CESM Run Fails: ORTE_ERROR_LOG: Bad parameter in pmix_server_gen.c (OpenMPI + UCX error)

SJHuang

Shaojian Huang
New Member
Hi CESM Community,

I'm setting up CESM2.1.3 on a custom HPC cluster and, as an initial test, running a compset that uses only stub components:

CESM Version: CESM2.1.3-rc.01

Compset: 2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV

Grid: a%1.9x2.5_l%1.9x2.5_oi%gx1v6_r%r05_g%gland4_w%ww3a_m%gx1v6

Machine: Custom machine file (mycluster)

MPI: OpenMPI 4.1.5 (compiled with UCX and OpenFabrics support)

PROBLEM:
When I run the test case (even with only stub components), CESM fails with the following MPI errors:
ORTE_ERROR_LOG: Bad parameter in file orted/pmix/pmix_server_gen.c at line 863
pml_ucx.c:176 Error: Failed to receive UCX worker address: Not found (-13)
UCX ERROR Error returned from open in attach. Permission denied. File name is: /proc/...
The model terminates with:
forrtl: error (78): process killed (SIGTERM)

Any suggestions or insights are greatly appreciated.

Thanks in advance,
SJ
 

jedwards

CSEG and Liaisons
Staff member
The latest version in the cesm2.1 series is cesm2.1.5; please begin by updating to that version and trying again.
If you are still having issues, include the compiler name and version in your response. Do you have a /proc filesystem?
If not, you will need to remove -DHAVE_SLASHPROC from your config_compilers.xml file.
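
A minimal sketch of how one might check both points above, in case it helps. It tests whether /proc is readable from the current process and lists the lines of a config_compilers.xml that mention -DHAVE_SLASHPROC. The helper names (`proc_is_usable`, `lines_with_flag`, `strip_flag`) and the file path in the usage section are hypothetical and not part of CESM or CIME; adjust the path to wherever your machine's config_compilers.xml actually lives.

```python
import re
from pathlib import Path

FLAG = "-DHAVE_SLASHPROC"


def proc_is_usable(proc_root="/proc"):
    """Return True if proc_root exists and this process's own status file can be read."""
    status = Path(proc_root) / "self" / "status"
    try:
        with open(status) as fh:
            fh.readline()
        return True
    except OSError:
        # Missing directory, missing file, or permission denied all land here.
        return False


def lines_with_flag(xml_text, flag=FLAG):
    """Return (line_number, stripped_line) pairs for every line that mentions flag."""
    return [
        (number, line.strip())
        for number, line in enumerate(xml_text.splitlines(), start=1)
        if flag in line
    ]


def strip_flag(xml_text, flag=FLAG):
    """Return xml_text with every occurrence of flag (and the blanks before it) removed."""
    return re.sub(r"[ \t]*" + re.escape(flag) + r"\b", "", xml_text)


if __name__ == "__main__":
    # Hypothetical location; replace with the real path on your system.
    config = Path("config_compilers.xml")
    print("/proc usable:", proc_is_usable())
    if config.is_file():
        for number, line in lines_with_flag(config.read_text()):
            print(f"{config}:{number}: {line}")
    else:
        print(f"{config} not found; set the path to your own file")
```

The script only reports matches; `strip_flag` returns the edited text without writing it, so you can review the result (and keep a backup) before changing the file. Run the /proc check on a compute node, since that is where the model executes and it may differ from the login node.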
 