
case.run error

yangx2

xinyi yang
Member
Hi everyone,
After successfully submitting the case, I got the following error from case.run.
Here is the CaseState output first:
****************************************************
2020-09-10 05:30:53: case.build success
---------------------------------------------------
2020-09-10 05:33:35: case.submit starting
---------------------------------------------------
2020-09-10 06:43:20: case.submit success case.run:10801889, case.st_archive:10801890
---------------------------------------------------
2020-09-10 06:43:53: case.run starting
---------------------------------------------------
2020-09-10 06:44:07: model execution starting
---------------------------------------------------
2020-09-10 06:44:40: model execution success
---------------------------------------------------
2020-09-10 06:44:40: case.run error
ERROR: RUN FAIL: Command 'mpirun -np 120 /mnt/scratch/nfs_fs02/yangx2/b1850.test/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /mnt/scratch/nfs_fs02/yangx2/b1850.test/run/cesm.log.10801889.200910-064353

****************************************************

Below is part of "/mnt/scratch/nfs_fs02/yangx2/b1850.test/run/cesm.log.10801889.200910-064353":
***************************************************
Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 0
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 119 1 ( npes = 120) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 79 1 ( npes = 80) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 79 1 ( npes = 80) ( nthreads = 1)( suffix =)
.
.
.
.
[node-0204:25435] *** An error occurred in MPI_Comm_create_keyval
[node-0204:25435] *** reported by process [1384972289,47880295415809]
[node-0204:25435] *** on communicator MPI_COMM_WORLD
[node-0204:25435] *** MPI_ERR_ARG: invalid argument of some other kind
[node-0204:25435] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node-0204:25435] *** and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
.
.
.

***************************************************
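Since the interesting part of a cesm.log is usually buried, one generic triage step (not something from this thread) is to grep the log for common failure markers. The snippet below demonstrates the filter on a few sample lines taken from the log above; in practice you would point grep at the real log path instead:

```shell
# Filter for common failure markers; demonstrated here on inline sample lines.
# In practice: grep -i -E 'error|abort|sigterm' /path/to/cesm.log.<jobid>
printf '%s\n' \
  'Resetting PIO rearranger comm max pend req (comp2io) to 64' \
  '[node-0204:25435] *** An error occurred in MPI_Comm_create_keyval' \
  'forrtl: error (78): process killed (SIGTERM)' \
| grep -i -E 'error|sigterm'
```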

Any hints would be helpful! Thanks in advance!
Best,
Skylar
 

ntandon

Neil Tandon
Member
In my case, commenting out the lines below in config_machines.xml resolved the error. Apparently these settings don't need to be changed on some machines, and changing them can produce problems like this, as well as segmentation faults.

<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
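For reference, an RLIMIT_STACK of -1 requests an unlimited stack. Before touching config_machines.xml, it may help to check what your shell or batch environment already grants; this is a generic POSIX diagnostic, not a CESM command:

```shell
# Print the soft stack-size limit for the current shell/batch job;
# "unlimited" here corresponds to RLIMIT_STACK = -1 in config_machines.xml.
ulimit -s
# The hard limit is the ceiling the soft limit may be raised to:
ulimit -Hs
```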
 

xliu

Jon
Member
Has anyone gotten this kind of error? Thanks!

User-specified PIO rearranger comm max pend req (comp2io), 0 (value will be reset as requested)
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 64
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 167 1 ( npes = 168) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 111 1 ( npes = 112) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 111 1 ( npes = 112) ( nthreads = 1)( suffix =)
....
(seq_comm_joincomm) init ID ( 38 CPLIAC ) join IDs = 2 37 ( npes = 112) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 35 ALLIACID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 36 CPLALLIACID ) join IDs = 2 35 ( npes = 112) ( nthreads = 1)
[w-col-jliu:704] *** An error occurred in MPI_Irecv
[w-col-jliu:704] *** reported by process [1959526401,0]
[w-col-jliu:704] *** on communicator MPI COMMUNICATOR 4 DUP FROM 3
[w-col-jliu:704] *** MPI_ERR_COUNT: invalid count argument
[w-col-jliu:704] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[w-col-jliu:704] *** and potentially your MPI job)
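Since the log shows the rearranger value being reset at runtime, it may be cleaner to set it explicitly per case. In CIME-based CESM cases the PIO rearranger options normally live in env_run.xml; the variable name below follows CIME naming conventions but is my assumption here, so verify it with xmlquery in your own case directory before relying on it:

```shell
# Run inside the case directory. Check the current value first:
./xmlquery PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO
# Then set it to 64 explicitly instead of relying on the runtime reset:
./xmlchange PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO=64
```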
 

taoliu_tech

Tao Liu
Member
I got the same error, and I am still looking for solutions. Did you solve it?
 