case.run error

yangx2

xinyi yang
Member
Hi everyone,
After I successfully submitted the case, I got an error from case.run.
First, here is the CaseStatus output:
****************************************************
2020-09-10 05:30:53: case.build success
---------------------------------------------------
2020-09-10 05:33:35: case.submit starting
---------------------------------------------------
2020-09-10 06:43:20: case.submit success case.run:10801889, case.st_archive:10801890
---------------------------------------------------
2020-09-10 06:43:53: case.run starting
---------------------------------------------------
2020-09-10 06:44:07: model execution starting
---------------------------------------------------
2020-09-10 06:44:40: model execution success
---------------------------------------------------
2020-09-10 06:44:40: case.run error
ERROR: RUN FAIL: Command 'mpirun -np 120 /mnt/scratch/nfs_fs02/yangx2/b1850.test/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /mnt/scratch/nfs_fs02/yangx2/b1850.test/run/cesm.log.10801889.200910-064353

****************************************************

Below is part of "/mnt/scratch/nfs_fs02/yangx2/b1850.test/run/cesm.log.10801889.200910-064353":
***************************************************
Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 0
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 119 1 ( npes = 120) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 79 1 ( npes = 80) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 79 1 ( npes = 80) ( nthreads = 1)( suffix =)
.
.
.
.
[node-0204:25435] *** An error occurred in MPI_Comm_create_keyval
[node-0204:25435] *** reported by process [1384972289,47880295415809]
[node-0204:25435] *** on communicator MPI_COMM_WORLD
[node-0204:25435] *** MPI_ERR_ARG: invalid argument of some other kind
[node-0204:25435] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node-0204:25435] *** and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
.
.
.

***************************************************
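(Editor's note: the "Invalid PIO rearranger comm max pend req (comp2io), 0" lines near the top are only warnings; PIO resets the value to 64 and continues, so they are unlikely to be the cause of the crash. If you do want to control those settings explicitly, CIME exposes them as case XML variables. A sketch, assuming a recent CIME version; the exact variable names may differ by release, so check with `xmlquery --listall` first:)

```shell
# Run from the case directory. Variable names below are assumptions
# based on CIME's PIO settings; verify them on your version with:
#   ./xmlquery --listall | grep PIO_REARR
./xmlquery PIO_REARR_COMM_TYPE PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO
./xmlchange PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO=64
```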

Any hints would be helpful! Thanks in advance!
Best,
Skylar
 

ntandon

Neil Tandon
Member
In my case, commenting out the lines below in config_machines.xml resolved the error. Apparently these limits don't need to be changed on some machines, and changing them can produce errors like this, as well as segmentation faults.

<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
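For reference, a minimal sketch of the change, assuming the block sits inside your machine's entry in config_machines.xml (the surrounding elements vary by machine):

```xml
<!-- Commented out: on some machines an unlimited RLIMIT_STACK
     here can trigger MPI errors or segmentation faults. -->
<!--
<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
-->
```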
 

xliu

Jon
Member
Has anyone gotten this kind of error? Thanks!

User-specified PIO rearranger comm max pend req (comp2io), 0 (value will be reset as requested)
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 64
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 167 1 ( npes = 168) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 111 1 ( npes = 112) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 111 1 ( npes = 112) ( nthreads = 1)( suffix =)
....
(seq_comm_joincomm) init ID ( 38 CPLIAC ) join IDs = 2 37 ( npes = 112) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 35 ALLIACID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 36 CPLALLIACID ) join IDs = 2 35 ( npes = 112) ( nthreads = 1)
[w-col-jliu:704] *** An error occurred in MPI_Irecv
[w-col-jliu:704] *** reported by process [1959526401,0]
[w-col-jliu:704] *** on communicator MPI COMMUNICATOR 4 DUP FROM 3
[w-col-jliu:704] *** MPI_ERR_COUNT: invalid count argument
[w-col-jliu:704] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[w-col-jliu:704] *** and potentially your MPI job)
 

taoliu_tech

Tao Liu
Member
I got the same error, and I am still looking for solutions. Did you solve it?
 