
CESM Run Fails: ORTE_ERROR_LOG: Bad parameter in pmix_server_gen.c (OpenMPI + UCX error)

Shaojian Huang (SJHuang), New Member
Hi CESM Community,

I'm currently working on setting up and testing CESM2.1.3 on a custom HPC cluster using a compset with all stub components for initial testing:

CESM Version: CESM2.1.3-rc.01

Compset: 2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV

Grid: a%1.9x2.5_l%1.9x2.5_oi%gx1v6_r%r05_g%gland4_w%ww3a_m%gx1v6

Machine: Custom machine file (mycluster)

MPI: OpenMPI 4.1.5 (compiled with UCX and OpenFabrics support)

PROBLEM:
When I run the test case (even with only stub components), CESM fails with the following MPI errors:
ORTE_ERROR_LOG: Bad parameter in file orted/pmix/pmix_server_gen.c at line 863
pml_ucx.c:176 Error: Failed to receive UCX worker address: Not found (-13)
UCX ERROR Error returned from open in attach. Permission denied. File name is: /proc/...
The model terminates with:
forrtl: error (78): process killed (SIGTERM)
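When a run dies with several MPI messages like these, it usually helps to find which one appears first in the log. The earliest error is often, though not always, the root cause, and the later ones are often fallout from it. The script below is a minimal sketch for doing that. The signature patterns are copied from the messages quoted above. The names `SIGNATURES` and `first_mpi_errors` are hypothetical. It assumes you point it at the model's run log, which in CESM is typically a `cesm.log.*` file in the run directory.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Hypothetical signature table: each pattern is taken from a message quoted above.
SIGNATURES = {
    "orte": re.compile(r"ORTE_ERROR_LOG"),
    "pml_ucx": re.compile(r"pml_ucx\.c:\d+ Error"),
    "ucx": re.compile(r"UCX\s+ERROR"),
    "forrtl": re.compile(r"forrtl: error"),
}


def first_mpi_errors(lines: Iterable[str], limit: int = 10) -> List[Tuple[int, str, str]]:
    """Return up to `limit` (line_number, tag, text) tuples, in log order."""
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for tag, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((lineno, tag, line.rstrip()))
                break  # record each line once, under the first matching tag
        if len(hits) >= limit:
            break
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path to cesm.log file>
    with open(sys.argv[1], errors="replace") as fh:
        for lineno, tag, text in first_mpi_errors(fh):
            print(f"{lineno:6d} [{tag}] {text}")
```

The script name `scan_log.py` is arbitrary. Matching is case-sensitive and purely textual, so extend `SIGNATURES` if your log contains other error formats.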

Any suggestions or insights are greatly appreciated.

Thanks in advance,
SJ
 

jedwards (CSEG and Liaisons, Staff member)
The latest version in the cesm2.1 series is cesm2.1.5; please begin by updating to that version and trying again.
If you are still having issues, include the compiler name and version in your response. Do you have a /proc filesystem?
If not, you will need to remove -DHAVE_SLASHPROC from your config_compilers.xml file.
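The sketch below shows one way to answer the /proc question and to see what removing the flag amounts to. A `-D` flag defines a preprocessor macro, so deleting `-DHAVE_SLASHPROC` simply compiles out whatever code is guarded by that macro. Several parts of this are assumptions. `/proc/self/statm` is used as a representative per-process /proc file, not as a confirmed file that CESM reads. The helper names `have_slashproc` and `strip_define` are hypothetical. `strip_define` only edits a flags string, so you would still copy the result back into the relevant entry of config_compilers.xml (typically a CPPDEFS element) by hand. Run the check on a compute node through your batch system, since login nodes may be configured differently.

```python
import os


def have_slashproc(probe: str = "/proc/self/statm") -> bool:
    """Return True if /proc appears to be mounted and the probe file is readable."""
    if not os.path.isdir("/proc/self"):
        return False  # no per-process /proc entries at all
    try:
        with open(probe) as fh:
            fh.read()
    except OSError:
        return False  # missing file or permission denied
    return True


def strip_define(flags: str, name: str = "HAVE_SLASHPROC") -> str:
    """Drop -D<name> and -D<name>=value tokens from a whitespace-separated flags string."""
    kept = [tok for tok in flags.split()
            if tok != f"-D{name}" and not tok.startswith(f"-D{name}=")]
    return " ".join(kept)


if __name__ == "__main__":
    print("/proc available:", have_slashproc())
    # Illustrative flags string only; use the value from your own config_compilers.xml.
    print(strip_define("-DFORTRANUNDERSCORE -DHAVE_SLASHPROC -DNO_R16"))
```

If `have_slashproc()` prints `False` on a compute node, removing the flag is the change suggested above.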
 