SCAM MPI_INIT issues

Hi, I am installing SCAM from cesm2.1.3-rc.01-0-g0596a97, using Open MPI (mpi/openmpi4.1.4-gcc12.2.0). Other configurations, including a coupled ocean-atmosphere system, run fine, but SCAM does not: it fails at startup with MPI_INIT errors. Any insight?

Thanks for your help,
--John Mejia


Below is the error message:
#######
Generating namelists for /project/jmejia/cesm/scratch/test_scam_mpace
Creating component namelists
Calling /home/jmejia/my_cesm_sandbox/components/cam//cime_config/buildnml
CAM namelist copy: file1 /project/jmejia/cesm/scratch/test_scam_mpace/Buildconf/camconf/atm_in file2 /project/jmejia/cesm/scratch/test_scam_mpace/run/atm_in
Calling /home/jmejia/my_cesm_sandbox/components/clm//cime_config/buildnml
WARNING: CLM is starting up from a cold state
Calling /home/jmejia/my_cesm_sandbox/components/cice//cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/components/data_comps/docn/cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/components/stub_comps/srof/cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/components/stub_comps/sglc/cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/components/stub_comps/swav/cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/components/stub_comps/sesp/cime_config/buildnml
Calling /home/jmejia/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildnml
Finished creating component namelists
-------------------------------------------------------------------------
- Prestage required restarts into /project/jmejia/cesm/scratch/test_scam_mpace/run
- Case input data directory (DIN_LOC_ROOT) is /project/jmejia/cesm/inputdata
- Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
2023-05-01 17:49:07 MODEL EXECUTION BEGINS HERE
run command is /project/jmejia/cesm/scratch/test_scam_mpace/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command ' /project/jmejia/cesm/scratch/test_scam_mpace/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /project/jmejia/cesm/scratch/test_scam_mpace/run/cesm.log.663.230501-174906


CESM LOG FILE FOLLOWS:

Invalid PIO rearranger comm max pend req (comp2io), 0


Resetting PIO rearranger comm max pend req (comp2io) to 64


PIO rearranger options:


comm type =p2p


comm fcd =2denable


max pend req (comp2io) = 0


enable_hs (comp2io) = T


enable_isend (comp2io) = F


max pend req (io2comp) = 64


enable_hs (io2comp) = F


enable_isend (io2comp) = T


(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 6 CPLATM ) join IDs = 2 5 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 3 ALLATMID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 4 CPLALLATMID ) join IDs = 2 3 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 9 LND ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 10 CPLLND ) join IDs = 2 9 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 7 ALLLNDID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 8 CPLALLLNDID ) join IDs = 2 7 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 13 ICE ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 14 CPLICE ) join IDs = 2 13 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 11 ALLICEID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 12 CPLALLICEID ) join IDs = 2 11 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 17 OCN ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 18 CPLOCN ) join IDs = 2 17 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 15 ALLOCNID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 16 CPLALLOCNID ) join IDs = 2 15 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 21 ROF ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 22 CPLROF ) join IDs = 2 21 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 19 ALLROFID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 20 CPLALLROFID ) join IDs = 2 19 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 25 GLC ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 26 CPLGLC ) join IDs = 2 25 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 23 ALLGLCID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 24 CPLALLGLCID ) join IDs = 2 23 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 29 WAV ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 30 CPLWAV ) join IDs = 2 29 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 27 ALLWAVID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 28 CPLALLWAVID ) join IDs = 2 27 ( npes = 1) ( nthreads = 1)


(seq_comm_setcomm) init ID ( 33 ESP ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)


(seq_comm_joincomm) init ID ( 34 CPLESP ) join IDs = 2 33 ( npes = 1) ( nthreads = 1)


(seq_comm_jcommarr) init ID ( 31 ALLESPID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)


(seq_comm_joincomm) init ID ( 32 CPLALLESPID ) join IDs = 2 31 ( npes = 1) ( nthreads = 1)


(seq_comm_printcomms) 1 0 1 1 GLOBAL:


(seq_comm_printcomms) 2 0 1 1 CPL:


(seq_comm_printcomms) 3 0 1 1 ALLATMID:


(seq_comm_printcomms) 4 0 1 1 CPLALLATMID:


(seq_comm_printcomms) 5 0 1 1 ATM:


(seq_comm_printcomms) 6 0 1 1 CPLATM:


(seq_comm_printcomms) 7 0 1 1 ALLLNDID:


(seq_comm_printcomms) 8 0 1 1 CPLALLLNDID:


(seq_comm_printcomms) 9 0 1 1 LND:


(seq_comm_printcomms) 10 0 1 1 CPLLND:


(seq_comm_printcomms) 11 0 1 1 ALLICEID:


(seq_comm_printcomms) 12 0 1 1 CPLALLICEID:


(seq_comm_printcomms) 13 0 1 1 ICE:


(seq_comm_printcomms) 14 0 1 1 CPLICE:


(seq_comm_printcomms) 15 0 1 1 ALLOCNID:


(seq_comm_printcomms) 16 0 1 1 CPLALLOCNID:


(seq_comm_printcomms) 17 0 1 1 OCN:


(seq_comm_printcomms) 18 0 1 1 CPLOCN:


(seq_comm_printcomms) 19 0 1 1 ALLROFID:


(seq_comm_printcomms) 20 0 1 1 CPLALLROFID:


(seq_comm_printcomms) 21 0 1 1 ROF:


(seq_comm_printcomms) 22 0 1 1 CPLROF:


(seq_comm_printcomms) 23 0 1 1 ALLGLCID:


(seq_comm_printcomms) 24 0 1 1 CPLALLGLCID:


(seq_comm_printcomms) 25 0 1 1 GLC:


(seq_comm_printcomms) 26 0 1 1 CPLGLC:


(seq_comm_printcomms) 27 0 1 1 ALLWAVID:


(seq_comm_printcomms) 28 0 1 1 CPLALLWAVID:


(seq_comm_printcomms) 29 0 1 1 WAV:


(seq_comm_printcomms) 30 0 1 1 CPLWAV:


(seq_comm_printcomms) 31 0 1 1 ALLESPID:


(seq_comm_printcomms) 32 0 1 1 CPLALLESPID:


(seq_comm_printcomms) 33 0 1 1 ESP:


(seq_comm_printcomms) 34 0 1 1 CPLESP:


*** The MPI_Comm_create_keyval() function was called before MPI_INIT was invoked.


*** This is disallowed by the MPI standard.


*** Your MPI job will now abort.


[compute1:1025095] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
######
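
Since the relevant message sits at the very end of a long log, a small helper like the one below can pull out just the lines that mention MPI_INIT. This is only a sketch: the function name `find_mpi_init_errors` is made up for illustration, and the log path is whatever the "See log file for details" line reports for your run.

```shell
#!/bin/sh
# Sketch only: list the lines of a CESM log that mention MPI_INIT.
# find_mpi_init_errors is a hypothetical helper name, not part of CIME.
find_mpi_init_errors() {
    # -n prefixes each match with its line number;
    # -F treats the pattern as a fixed string rather than a regex.
    grep -n -F 'MPI_INIT' "$1"
}

# When a log path is given on the command line, search it.
if [ -n "${1:-}" ]; then
    find_mpi_init_errors "$1"
fi
```

Saved as a script, it would be called with the path of the cesm.log.* file in the run directory as its only argument.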
 

jet

Member
Hi

Could you add the following parameter to your create_newcase command and try again?

--pecount 1

If that doesn't work, could you provide your CaseStatus and README.case files so I can see how everything is configured?
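
For illustration, a sketch of what the full command might look like with that flag added. The sandbox path and case name are taken from the log earlier in the thread; the compset (`FSCAM`) and resolution (`T42_T42`) are assumptions, because the original post does not show the create_newcase command that was used. The script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: assemble a create_newcase command that includes --pecount 1.
# CIME_SCRIPTS and CASE_NAME are based on paths shown in the posted log.
# COMPSET and RES are assumptions; substitute the values you actually used.
CIME_SCRIPTS="/home/jmejia/my_cesm_sandbox/cime/scripts"
CASE_NAME="test_scam_mpace"
COMPSET="FSCAM"      # assumed SCAM compset
RES="T42_T42"        # assumed resolution

# --pecount 1 asks for a single task for the whole case.
CMD="$CIME_SCRIPTS/create_newcase --case $CASE_NAME --compset $COMPSET --res $RES --pecount 1"

# Print the command for review instead of running it.
echo "$CMD"
```

Once the printed command matches your setup, run it by hand, then set up, build, and submit the new case as usual.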

jt
 

Christophe D

Christophe Durand
New Member
Good morning,
I get the same error with my CESM 2.1.5 installation when running the basic SCAM example from the camdoc documentation ("4. Atmospheric configurations (compsets)"). I have added the option "--pecount 1" as suggested.

I attached my CaseStatus and README.case files for you to check.
I will try to see if a debug session can help me, but my previous build in debug did not work, so I do not expect much from it.

Thank you for helping out.
 

Attachments

  • README_case_censored.txt
    2.8 KB · Views: 2
  • CaseStatus_censored.txt
    7.2 KB · Views: 3