
MPI run error message.

xliu

Jon
Member
Hi, can anyone check what this error from cesm.log means? There are also lots of warnings during building. Thanks.

User-specified PIO rearranger comm max pend req (comp2io), 0 (value will be reset as requested)
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 64
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 191 1 ( npes = 192) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 127 1 ( npes = 128) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 127 1 ( npes = 128) ( nthreads = 1)( suffix =)
.
.
(seq_comm_joincomm) init ID ( 38 CPLIAC ) join IDs = 2 37 ( npes = 128) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 35 ALLIACID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 36 CPLALLIACID ) join IDs = 2 35 ( npes = 128) ( nthreads = 1)
[w-cal-liu:14154] *** An error occurred in MPI_Irecv
[w-cal-liu:14154] *** reported by process [1716453377,0]
[w-cal-liu:14154] *** on communicator MPI COMMUNICATOR 4 DUP FROM 3
[w-cal-liu:14154] *** MPI_ERR_COUNT: invalid count argument
[w-cal-liu:14154] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[w-cal-liu:14154] *** and potentially your MPI job)

----------------------------------
run command is mpirun -np 1 /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/cesm.exe >> cesm.log.$LID 2>&1
 

jedwards

CSEG and Liaisons
Staff member
The cesm log shows that you asked for 128 tasks, but the mpirun command is only asking for 1. I suspect that there is an error in your
config_batch.xml file.
 

xliu

Jon
Member
The cesm log shows that you asked for 128 tasks, but the mpirun command is only asking for 1. I suspect that there is an error in your
config_batch.xml file.
Thanks. I didn't set up a batch system; I just used 'none' as the default. Could you send me an example of the key settings?

<batch_system type="none" >
  <batch_query args=""></batch_query>
  <batch_submit></batch_submit>
  <batch_redirect></batch_redirect>
  <batch_directive></batch_directive>
  <directives>
    <directive></directive>
  </directives>
</batch_system>
 

jedwards

CSEG and Liaisons
Staff member
In this line in config_machines.xml we set the number of MPI tasks using {{ total_tasks }}; this should be the same value as the XML variable TOTALPES in your case.
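For reference, a minimal sketch of the mpirun entry that line belongs to, assuming an OpenMPI installation (the executable name and arguments may differ on your machine):

<mpirun mpilib="openmpi">
  <executable> mpirun </executable>
  <arguments>
    <arg name="num_tasks"> -np {{ total_tasks }}</arg>
  </arguments>
</mpirun>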
 

xliu

Jon
Member
In this line in config_machines.xml we set the number of MPI tasks using {{ total_tasks }}; this should be the same value as the XML variable TOTALPES in your case.
Thanks.
I am using a personal Linux PC to test the code. Is there anything different about how the node settings should be set up?

<GMAKE_J>8</GMAKE_J>
<BATCH_SYSTEM>none</BATCH_SYSTEM>
<SUPPORTED_BY>__LIU__</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>8</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>8</MAX_MPITASKS_PER_NODE>
 

jedwards

CSEG and Liaisons
Staff member
That sets the maximum tasks per node, but you need to tell MPI how many tasks are needed for the given run.
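A minimal sketch of how this is typically checked and changed from the case directory (the value 16 is only an example):

# in the case directory
./xmlquery TOTALPES      # how many tasks the case is currently configured to use
./xmlchange NTASKS=16    # example: set every component to 16 tasks
./case.setup --reset     # regenerate the run scripts after changing the PE layout
./preview_run            # confirm the mpirun command that will actually be used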
 

xliu

Jon
Member
that sets the maximum tasks per node, but you need to tell mpi how many tasks are needed for the given run.
Which file do I use to set the number of tasks? Sorry for so many basic questions; I am fairly new to this.

I am testing with this command from the manual: ./create_newcase --case case1 --compset X --res f19_g16 --mach liu
 

xliu

Jon
Member
Like this in the machines file? And how do I find TOTALPES for a case? Thanks.

<mpirun mpilib="openmpi">
  <executable> mpirun </executable>
  <arguments>
    <arg name="num_tasks"> -np {{ total_tasks }}</arg>
  </arguments>
</mpirun>
 

xliu

Jon
Member
error log:
User-specified PIO rearranger comm max pend req (comp2io), 0 (value will be reset as requested)
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 64
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)( suffix =)
.
(seq_comm_jcommarr) init ID ( 35 ALLIACID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 36 CPLALLIACID ) join IDs = 2 35 ( npes = 16) ( nthreads = 1)
(seq_comm_printcomms) 1 0 16 1 GLOBAL:
(seq_comm_printcomms) 2 0 16 1 CPL:
(seq_comm_printcomms) 3 0 16 1 ALLATMID:
.
(seq_comm_printcomms) 36 0 16 1 CPLALLIACID:
(seq_comm_printcomms) 37 0 16 1 IAC:
(seq_comm_printcomms) 38 0 16 1 CPLIAC:
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
.
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
m_GlobalSegMap::max_local_segs: bad segment location error, stat =1
000.MCT(MPEU)::die.: from m_GlobalSegMap::max_local_segs()
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 

xliu

Jon
Member
When you run ./preview_run what does the -np value look like now? Should be 16.
Yes, it's 16. It looks like that is because I set MAX_MPITASKS_PER_NODE to 16.

nodes: 1
total tasks: 16
tasks per node: 16
thread count: 1

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
None

MPIRUN (job=case.run):
mpirun -np 16 /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/cesm.exe >> cesm.log.$LID 2>&1

FOR JOB: case.st_archive
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
None
 

xliu

Jon
Member
Just wondering if those warnings are normal when building. I am testing the configuration with this:
./create_newcase --case case1 --compset X --res f19_g16 --mach liu

Building esp with output to /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/esp.bldlog.210308-104903
Component iac build complete with 12 warnings
siac built in 0.422800 seconds
Component esp build complete with 12 warnings
sesp built in 0.433709 seconds
Component lnd build complete with 21 warnings
Component atm build complete with 21 warnings
Component ocn build complete with 21 warnings
xlnd built in 0.828020 seconds
xatm built in 0.831846 seconds
xocn built in 0.816502 seconds
Component rof build complete with 22 warnings
xrof built in 0.816468 seconds
Component ice build complete with 21 warnings
xice built in 0.839232 seconds
Component wav build complete with 21 warnings
xwav built in 0.827438 seconds
Component glc build complete with 21 warnings
xglc built in 0.837738 seconds
Building cesm from /home/liu/Projects/CESM_ucar_/CESM2.2.0/cime/src/drivers/mct/cime_config/buildexe with output to /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/cesm.bldlog.210308-104903
Component cesm exe build complete with 97 warnings
Time spent not building: 0.632064 sec
Time spent building: 44.529102 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
 

jedwards

CSEG and Liaisons
Staff member
Yes those are expected. Another thing to check is the stack size in your shell - you want to set it to unlimited or as large as possible.
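For example, in bash this can be done with the following (a sketch; the limit you are allowed to set depends on your system):

ulimit -s unlimited    # remove the per-process stack size limit for this shell session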
 

xliu

Jon
Member
Yes those are expected. Another thing to check is the stack size in your shell - you want to set it to unlimited or as large as possible.
That doesn't work. What does the segment error mean?

m_GlobalSegMap::max_local_segs: bad segment location error, stat =1
000.MCT(MPEU)::die.: from m_GlobalSegMap::max_local_segs()
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 

xliu

Jon
Member
Is the following output normal?

.
.
2021-03-08 11:06:17 cpl
Calling /home/liu/Projects/CESM_ucar_/CESM2.2.0/cime/src/drivers/mct/cime_config/buildnml
-------------------------------------------------------------------------
- Prestage required restarts into /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/run
- Case input data directory (DIN_LOC_ROOT) is /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/inputdata
- Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
run command is mpirun -np 16 /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/cesm.exe >> cesm.log.$LID 2>&1
Exception from case_run: ERROR: RUN FAIL: Command ' mpirun -np 16 /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/liu/Projects/CESM_ucar_/CESM2.2.0/projects/test1/run/cesm.log.210308-110617
Submit job case.st_archive
.
.
 