
Run fails with Intel at higher MPI ranks

Shruti

Shruti Joshi
Member
Hello,

I am running CESM 2.1.2 built with the Intel compiler.
When I run compsets (A, X, or B1850) with a higher number of MPI ranks, such as 32 (i.e. mpirun -np 32), I get the following error:
"Assertion failed in file ch4_shm_coll.c at line 1477: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)"

I generally do not see this issue with the clang or gcc compilers.
Could you please suggest a solution?
 

jedwards

CSEG and Liaisons
Staff member
What version of the Intel compiler are you using, and which MPI library and version? We routinely run CESM with the Intel compiler using 20K+ tasks.
 

jedwards

CSEG and Liaisons
Staff member
I haven't seen this before, but the error is coming from the MPICH library and not the compiler. Did your tests with gnu and clang use the same MPICH library?
Perhaps you should check with the system administrators for your system. Is there any clue in the error log as to where in the code this error is generated?
You might also try setting DEBUG=TRUE, then rebuilding and rerunning, to see if that provides any further clues.
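For anyone looking for clues in the error log as suggested above, here is a minimal sketch in Python that pulls assertion messages of the form quoted in the first post out of log text. The function names, the `*.log*` glob, and the idea that the logs sit as plain text in a single run directory are assumptions made for illustration, not part of CESM; adjust them to your case.

```python
import re
from pathlib import Path

# Matches assertion lines shaped like the one quoted in this thread:
#   Assertion failed in file <file> at line <N>: <condition>
ASSERT_RE = re.compile(
    r"Assertion failed in file (?P<file>\S+) at line (?P<line>\d+): (?P<cond>.*)"
)


def find_assertions(text):
    """Return a list of (source_file, line_number, condition) tuples found in text."""
    hits = []
    for match in ASSERT_RE.finditer(text):
        # Drop surrounding whitespace and a trailing quote, if the line was quoted.
        condition = match.group("cond").strip().rstrip('"')
        hits.append((match.group("file"), int(match.group("line")), condition))
    return hits


def scan_logs(run_dir, pattern="*.log*"):
    """Scan plain-text logs in run_dir (hypothetical layout); map each path to its hits."""
    results = {}
    for path in sorted(Path(run_dir).glob(pattern)):
        if not path.is_file():
            continue
        hits = find_assertions(path.read_text(errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_logs` on the run directory returns a dictionary keyed by log path. For the message in this thread, the extracted file name is `ch4_shm_coll.c`, which is not a CESM source file, consistent with the point above that the failure originates in the MPI library. Compressed logs are not handled by this sketch.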
 