zweina@umich_edu
New Member
Hi,

I got errors at the run step. Can anyone suggest how to fix this? Thank you so much!

The screen output is shown below:

ERROR: RUN FAIL: Command 'mpiexec -n 1 /scratch/ivanov_flux/shared/clm/cases/CLM_USRDAT.I2000Clm40SpCruGs.gnu/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /scratch/ivanov_flux/shared/clm/cases/CLM_USRDAT.I2000Clm40SpCruGs.gnu/run/cesm.log.32553894.nyx.arc-ts.umich.edu.190110-112154
The log file cesm.log.32553894.nyx.arc-ts.umich.edu.190110-112154 details:

Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
  comm type =p2p
  comm fcd =2denable
  max pend req (comp2io) = 0
  enable_hs (comp2io) = T
  enable_isend (comp2io) = F
  max pend req (io2comp) = 64
  enable_hs (io2comp) = F
  enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 6 CPLATM ) join IDs = 2 5 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 3 ALLATMID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 4 CPLALLATMID ) join IDs = 2 3 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 9 LND ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 10 CPLLND ) join IDs = 2 9 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 7 ALLLNDID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 8 CPLALLLNDID ) join IDs = 2 7 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 13 ICE ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 14 CPLICE ) join IDs = 2 13 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 11 ALLICEID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 12 CPLALLICEID ) join IDs = 2 11 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 17 OCN ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 18 CPLOCN ) join IDs = 2 17 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 15 ALLOCNID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 16 CPLALLOCNID ) join IDs = 2 15 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 21 ROF ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 22 CPLROF ) join IDs = 2 21 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 19 ALLROFID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 20 CPLALLROFID ) join IDs = 2 19 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 25 GLC ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 26 CPLGLC ) join IDs = 2 25 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 23 ALLGLCID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 24 CPLALLGLCID ) join IDs = 2 23 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 29 WAV ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 30 CPLWAV ) join IDs = 2 29 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 27 ALLWAVID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 28 CPLALLWAVID ) join IDs = 2 27 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 33 ESP ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 34 CPLESP ) join IDs = 2 33 ( npes = 1) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 31 ALLESPID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 32 CPLALLESPID ) join IDs = 2 31 ( npes = 1) ( nthreads = 1)
(seq_comm_printcomms) 1 0 1 1 GLOBAL:
(seq_comm_printcomms) 2 0 1 1 CPL:
(seq_comm_printcomms) 3 0 1 1 ALLATMID:
(seq_comm_printcomms) 4 0 1 1 CPLALLATMID:
(seq_comm_printcomms) 5 0 1 1 ATM:
(seq_comm_printcomms) 6 0 1 1 CPLATM:
(seq_comm_printcomms) 7 0 1 1 ALLLNDID:
(seq_comm_printcomms) 8 0 1 1 CPLALLLNDID:
(seq_comm_printcomms) 9 0 1 1 LND:
(seq_comm_printcomms) 10 0 1 1 CPLLND:
(seq_comm_printcomms) 11 0 1 1 ALLICEID:
(seq_comm_printcomms) 12 0 1 1 CPLALLICEID:
(seq_comm_printcomms) 13 0 1 1 ICE:
(seq_comm_printcomms) 14 0 1 1 CPLICE:
(seq_comm_printcomms) 15 0 1 1 ALLOCNID:
(seq_comm_printcomms) 16 0 1 1 CPLALLOCNID:
(seq_comm_printcomms) 17 0 1 1 OCN:
(seq_comm_printcomms) 18 0 1 1 CPLOCN:
(seq_comm_printcomms) 19 0 1 1 ALLROFID:
(seq_comm_printcomms) 20 0 1 1 CPLALLROFID:
(seq_comm_printcomms) 21 0 1 1 ROF:
(seq_comm_printcomms) 22 0 1 1 CPLROF:
(seq_comm_printcomms) 23 0 1 1 ALLGLCID:
(seq_comm_printcomms) 24 0 1 1 CPLALLGLCID:
(seq_comm_printcomms) 25 0 1 1 GLC:
(seq_comm_printcomms) 26 0 1 1 CPLGLC:
(seq_comm_printcomms) 27 0 1 1 ALLWAVID:
(seq_comm_printcomms) 28 0 1 1 CPLALLWAVID:
(seq_comm_printcomms) 29 0 1 1 WAV:
(seq_comm_printcomms) 30 0 1 1 CPLWAV:
(seq_comm_printcomms) 31 0 1 1 ALLESPID:
(seq_comm_printcomms) 32 0 1 1 CPLALLESPID:
(seq_comm_printcomms) 33 0 1 1 ESP:
(seq_comm_printcomms) 34 0 1 1 CPLESP:
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
(t_initf) profile_detail_limit= 2
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
1 pes participating in computation for CLM
-----------------------------------
NODE# NAME
( 0) nyx6149.arc-ts.umich.edu
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Variable not found
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Variable not found
NetCDF: Variable not found
NetCDF: Variable not found
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
NetCDF: Invalid dimension ID or name
ERROR: Unknown error submitted to shr_abort_abort.
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host: [[23746,1],0] (PID 18422)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
#0 0x2AECDDC37C37
#1 0x72288B in __shr_abort_mod_MOD_shr_abort_backtrace
#2 0x722AA6 in __shr_abort_mod_MOD_shr_abort_abort
#3 0x4A6074 in __abortutils_MOD_endrun_vanilla
#4 0x65F160 in __urbaninputmod_MOD_urbaninput
#5 0x4B199B in __clm_initializemod_MOD_initialize1
#6 0x4A2AC2 in __lnd_comp_mct_MOD_lnd_init_mct
#7 0x42228E in __component_mod_MOD_component_init_cc
#8 0x4139FC in __cime_comp_mod_MOD_cime_init
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1001.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

In addition, I have uploaded my compiler settings file. Some of my other settings are below:

COMPILER=gnu
RES="CLM_USRDAT"
COMPSET="I2000Clm40SpCruGs"

config_machines file: University of Michigan HPC .*.arc-ts.umich.edu LINUX gnu openmpi ivanov_flux scratch/ivanov_flux/shared/clm/cases/${COMPSET} /scratch/ivanov_flux/shared/clm/Clinton_project/cesm-inputdata /scratch/ivanov_flux/shared/clm/Clinton_project/cesm-inputdata/atm/datm7 $ENV{HOME}/projects/scratch/archive/$CASE $ENV{HOME}/projects/baselines $CIMEROOT/tools/cprnc/build/cprnc make 16 pbs zweina@umich.edu 72 72 TRUE mpiexec -n $TOTALPES

config_batch file: qstat qsub qdel -v #PBS ^(S+)$ -W depend=afterok:jobid -W depend=afterany:jobid : %H:%M:%S -M -m abe -N {{ job_id }} -M zweina@umich.edu -m abe -r {{ rerunnable }} -j oe -V -S {{ shell }} flux
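
A rough way to triage a log like the one above is to pull out the backtrace frames and tally the NetCDF messages. The sketch below is illustrative only: `summarize_log`, `ABORT_MODULES`, and the regular expressions are assumptions made for this example and are not part of CESM or CIME, and `SAMPLE` is a short excerpt of the log above.

```python
import re
from collections import Counter

# Matches gfortran-style frames such as
# "#4 0x65F160 in __urbaninputmod_MOD_urbaninput" (module, then procedure).
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9A-Fa-f]+ in __(\w+?)_MOD_(\w+)")

# Matches the two NetCDF library messages that appear in the log.
NETCDF_RE = re.compile(r"NetCDF: (Invalid dimension ID or name|Variable not found)")

# Assumption: frames from these modules belong to the abort machinery,
# not to the routine that actually detected the problem.
ABORT_MODULES = {"shr_abort_mod", "abortutils"}

# Short excerpt of the log, used only to exercise the function.
SAMPLE = """\
NetCDF: Invalid dimension ID or name
NetCDF: Variable not found
ERROR: Unknown error submitted to shr_abort_abort.
#0 0x2AECDDC37C37
#1 0x72288B in __shr_abort_mod_MOD_shr_abort_backtrace
#2 0x722AA6 in __shr_abort_mod_MOD_shr_abort_abort
#3 0x4A6074 in __abortutils_MOD_endrun_vanilla
#4 0x65F160 in __urbaninputmod_MOD_urbaninput
#5 0x4B199B in __clm_initializemod_MOD_initialize1
"""


def summarize_log(text):
    """Return (frames, netcdf_counts, origin) for a cesm.log-style text.

    frames        -- list of (frame number, module, procedure) tuples
    netcdf_counts -- Counter of NetCDF message texts
    origin        -- (module, procedure) of the first frame outside the
                     abort helpers, or None if no such frame is found
    """
    frames = [(int(n), mod, proc) for n, mod, proc in FRAME_RE.findall(text)]
    netcdf_counts = Counter(NETCDF_RE.findall(text))
    origin = next(
        ((mod, proc) for _, mod, proc in frames if mod not in ABORT_MODULES),
        None,
    )
    return frames, netcdf_counts, origin


if __name__ == "__main__":
    frames, netcdf_counts, origin = summarize_log(SAMPLE)
    print("first non-abort frame:", origin)
    print("NetCDF messages:", dict(netcdf_counts))
```

On the excerpt, the first frame outside the abort helpers is `urbaninput` in module `urbaninputmod`. Together with the NetCDF messages, this suggests the abort happens while CLM reads urban input data from a NetCDF file; that is only a reading of the backtrace, not a confirmed diagnosis.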