Cesm.exe running issue

xliu

Jon
Member
The build succeeded, but I get the error below. Thanks.

infile is /home/liu/Projects/CESM_ucar_/CESM1_2_2_1/projects/test1/Buildconf/cplconf/cesm_namelist
-----------
check_case OK
UNSET: Command not found.
ERROR: cesm_submit problem sourcing templar
 

dbailey

CSEG and Liaisons
Staff member
What machine is this on? It looks like cesm.submit does not understand the queuing system.
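(A hedged sketch, in case it helps with the port: on a CESM1.2-era case the submit script reads its queue commands from env_run.xml, and "UNSET" usually means the machine port never filled them in. The file names and the xmlchange -file/-id/-val form below are assumptions about this release, and the value shown for a no-queue Linux box is only an illustration.)

# From the case directory: see what the case thinks the batch commands are.
grep -E 'BATCHSUBMIT|BATCHQUERY' env_run.xml

# On an interactive machine with no queuing system, point the submit
# command at a plain shell rather than qsub/bsub (illustrative value only).
./xmlchange -file env_run.xml -id BATCHSUBMIT -val 'csh'

# The batch directives themselves come from the machine port, typically
# scripts/ccsm_utils/Machines/mkbatch.<machine> in this release.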
 

xliu

Jon
Member
This code is fairly old, and so I am not certain if it will work on your machine. Have you already followed the steps here?

I fixed the batch error, but now another error shows up, pasted below.
Does this mean it ran out of memory? Thanks


dead_setNewGrid decomp 2d 23 144 70 72 1 48
m_GlobalSegMap::max_local_segs: bad segment location error, stat =1
000.MCT(MPEU)::die.: from m_GlobalSegMap::max_local_segs()
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 

xliu

Jon
Member
I think @jedwards might have some insight on this.
(seq_comm_setcomm) initialize ID ( 1 GLOBAL ) pelist = 0 31 1 ( npes = 32) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 CPL ) pelist = 0 31 1 ( npes = 32) ( nthreads = 1)
.
(seq_comm_jcommarr) initialize ID ( 9 ALLWAVID ) join multiple comp IDs ( npes = 32) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 16 CPLALLWAVID ) join IDs = 2 9 ( npes = 32) ( nthreads = 1)
(seq_comm_printcomms) 1 0 32 1 GLOBAL:
(seq_comm_printcomms) 2 0 32 1 CPL:
(seq_comm_printcomms) 3 0 32 1 ALLATMID:
.
(seq_comm_printcomms) 29 0 32 1 WAV:
(seq_comm_printcomms) 30 0 32 1 CPLWAV:
(t_initf) Read in prof_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 987.36
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
.
Memory block size conversion in bytes is 978.61
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 978.61
seq_flds_mod: read seq_cplflds_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 950.23
seq_flds_mod: read seq_cplflds_userspec namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 945.09
8 MB memory alloc in MB is 8.00
.
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 960.23
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
.
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 994.38
seq_flds_mod: seq_flds_a2x_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:Sa_pslv
seq_flds_mod: seq_flds_a2x_fluxes=
Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_swnet:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4
seq_flds_mod: seq_flds_x2a_states=
Sf_lfrac:Sf_ifrac:Sf_ofrac:Sx_avsdr:Sx_anidr:Sx_avsdf:Sx_anidf:Sx_tref:Sx_qref:So_t:Sx_t:Sl_fv:Sl_ram1:Sl_snowh:Si_snowh:So_ssq:So_re:Sx_u10:So_ustar
seq_flds_mod: seq_flds_x2a_fluxes=
Faxx_taux:Faxx_tauy:Faxx_lat:Faxx_sen:Faxx_lwup:Faxx_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4
seq_flds_mod: seq_flds_l2x_states=
Sl_avsdr:Sl_anidr:Sl_avsdf:Sl_anidf:Sl_tref:Sl_qref:Sl_t:Sl_fv:Sl_ram1:Sl_snowh:Sl_u10
seq_flds_mod: seq_flds_l2x_fluxes=
Fall_swnet:Fall_taux:Fall_tauy:Fall_lat:Fall_sen:Fall_lwup:Fall_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4:Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_x2l_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Slrr_volr
seq_flds_mod: seq_flds_x2l_fluxes=
Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Flrr_flood
seq_flds_mod: seq_flds_i2x_states=
Si_avsdr:Si_anidr:Si_avsdf:Si_anidf:Si_tref:Si_qref:Si_t:Si_snowh:Si_u10:Si_ifrac
seq_flds_mod: seq_flds_i2x_fluxes=
Faii_swnet:Fioi_swpen:Faii_taux:Fioi_taux:Faii_tauy:Fioi_tauy:Faii_lat:Faii_sen:Faii_lwup:Faii_evap:Fioi_melth:Fioi_meltw:Fioi_salt
seq_flds_mod: seq_flds_x2i_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:So_t:So_s:So_u:So_v:So_dhdx:So_dhdy
seq_flds_mod: seq_flds_x2i_fluxes=
Faxa_rain:Faxa_snow:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Fioo_q
seq_flds_mod: seq_flds_o2x_states=
So_t:So_s:So_u:So_v:So_dhdx:So_dhdy:So_bldepth
seq_flds_mod: seq_flds_o2x_fluxes=
Fioo_q
seq_flds_mod: seq_flds_x2o_states=
Sa_pslv:So_duu10n:Si_ifrac:Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_x2o_fluxes=
Faxa_rain:Faxa_snow:Faxa_prec:Faxa_lwdn:Foxx_swnet:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Foxx_taux:Foxx_tauy:Foxx_lat:Foxx_sen:Foxx_lwup:Foxx_evap:Fioi_melth:Fioi_meltw:Fioi_salt:Forr_roff:Forr_ioff
seq_flds_mod: seq_flds_s2x_states=
Ss_tsrf01:Ss_topo01:Ss_tsrf02:Ss_topo02:Ss_tsrf03:Ss_topo03:Ss_tsrf04:Ss_topo04:Ss_tsrf05:Ss_topo05:Ss_tsrf06:Ss_topo06:Ss_tsrf07:Ss_topo07:Ss_tsrf08:Ss_topo08:Ss_tsrf09:Ss_topo09:Ss_tsrf10:Ss_topo10
seq_flds_mod: seq_flds_s2x_fluxes=
Fgss_qice01:Fgss_qice02:Fgss_qice03:Fgss_qice04:Fgss_qice05:Fgss_qice06:Fgss_qice07:Fgss_qice08:Fgss_qice09:Fgss_qice10
seq_flds_mod: seq_flds_x2s_states=
Sg_frac01:Sg_topo01:Sg_frac02:Sg_topo02:Sg_frac03:Sg_topo03:Sg_frac04:Sg_topo04:Sg_frac05:Sg_topo05:Sg_frac06:Sg_topo06:Sg_frac07:Sg_topo07:Sg_frac08:Sg_topo08:Sg_frac09:Sg_topo09:Sg_frac10:Sg_topo10
seq_flds_mod: seq_flds_x2s_fluxes=
Fsgg_rofi01:Fsgg_rofl01:Fsgg_hflx01:Fsgg_rofi02:Fsgg_rofl02:Fsgg_hflx02:Fsgg_rofi03:Fsgg_rofl03:Fsgg_hflx03:Fsgg_rofi04:Fsgg_rofl04:Fsgg_hflx04:Fsgg_rofi05:Fsgg_rofl05:Fsgg_hflx05:Fsgg_rofi06:Fsgg_rofl06:Fsgg_hflx06:Fsgg_rofi07:Fsgg_rofl07:Fsgg_hflx07:Fsgg_rofi08:Fsgg_rofl08:Fsgg_hflx08:Fsgg_rofi09:Fsgg_rofl09:Fsgg_hflx09:Fsgg_rofi10:Fsgg_rofl10:Fsgg_hflx10
seq_flds_mod: seq_flds_g2x_states=
Sg_frac01:Sg_topo01:Sg_frac02:Sg_topo02:Sg_frac03:Sg_topo03:Sg_frac04:Sg_topo04:Sg_frac05:Sg_topo05:Sg_frac06:Sg_topo06:Sg_frac07:Sg_topo07:Sg_frac08:Sg_topo08:Sg_frac09:Sg_topo09:Sg_frac10:Sg_topo10
seq_flds_mod: seq_flds_g2x_fluxes=
Fsgg_rofi01:Fsgg_rofl01:Fsgg_hflx01:Fsgg_rofi02:Fsgg_rofl02:Fsgg_hflx02:Fsgg_rofi03:Fsgg_rofl03:Fsgg_hflx03:Fsgg_rofi04:Fsgg_rofl04:Fsgg_hflx04:Fsgg_rofi05:Fsgg_rofl05:Fsgg_hflx05:Fsgg_rofi06:Fsgg_rofl06:Fsgg_hflx06:Fsgg_rofi07:Fsgg_rofl07:Fsgg_hflx07:Fsgg_rofi08:Fsgg_rofl08:Fsgg_hflx08:Fsgg_rofi09:Fsgg_rofl09:Fsgg_hflx09:Fsgg_rofi10:Fsgg_rofl10:Fsgg_hflx10
seq_flds_mod: seq_flds_x2g_states=
Ss_tsrf01:Ss_topo01:Ss_tsrf02:Ss_topo02:Ss_tsrf03:Ss_topo03:Ss_tsrf04:Ss_topo04:Ss_tsrf05:Ss_topo05:Ss_tsrf06:Ss_topo06:Ss_tsrf07:Ss_topo07:Ss_tsrf08:Ss_topo08:Ss_tsrf09:Ss_topo09:Ss_tsrf10:Ss_topo10
seq_flds_mod: seq_flds_x2g_fluxes=
Fgss_qice01:Fgss_qice02:Fgss_qice03:Fgss_qice04:Fgss_qice05:Fgss_qice06:Fgss_qice07:Fgss_qice08:Fgss_qice09:Fgss_qice10
seq_flds_mod: seq_flds_xao_states=
So_tref:So_qref:So_ssq:So_re:So_u10:So_duu10n:So_ustar
seq_flds_mod: seq_flds_xao_albedo=
So_avsdr:So_anidr:So_avsdf:So_anidf
seq_flds_mod: seq_flds_r2x_states=
Slrr_volr
seq_flds_mod: seq_flds_r2x_fluxes=
Forr_roff:Forr_ioff:Flrr_flood
seq_flds_mod: seq_flds_x2r_states=

seq_flds_mod: seq_flds_x2r_fluxes=
Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_w2x_states=
Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_w2x_fluxes=

seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=

At line 657 of file /home/liu/Projects/CESM_ucar_/CESM1_2_2_1/models/drv/shr/seq_infodata_mod.F90 (unit = 98)
Fortran runtime error: Cannot open file 'rpointer.drv': No such file or directory


Error termination. Backtrace:
#0 0x152a7013c32a
#1 0x152a7013ced5
#2 0x152a7013d69d
#3 0x152a702afecd
#4 0x152a702b0214
#5 0x55c526e269a9
#6 0x55c526d8d719
#7 0x55c526da1879
#8 0x152a6f9b3bf6
#9 0x55c526d7f299
#10 0xffffffffffffffff
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[43319,1],0]
Exit code: 2
--------------------------------------------------------------------------
Any idea about the error above? Thanks.
 

jedwards

CSEG and Liaisons
Staff member
You are running in a mode that expects restart files to start from - either CONTINUE_RUN is True or you are in HYBRID or BRANCH mode and the restart rpointer files (and probably also the restart files) are not present.
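(A minimal check from the case directory, assuming the CESM1.2-era layout where these settings live in the env_*.xml files:)

# A cold start should show RUN_TYPE=startup and CONTINUE_RUN=FALSE;
# only hybrid, branch, or continuation runs go looking for rpointer files.
grep -h -E 'RUN_TYPE|CONTINUE_RUN' env_*.xml

# If the case was flagged as a continuation by mistake, reset it:
./xmlchange -file env_run.xml -id CONTINUE_RUN -val FALSE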
 

xliu

Jon
Member
You are running in a mode that expects restart files to start from - either CONTINUE_RUN is True or you are in HYBRID or BRANCH mode and the restart rpointer files (and probably also the restart files) are not present.
Thanks Jim. I'm back to the same MPI error again; any clue? Is it a code build error or a batch environment setting issue?
(seq_comm_setcomm) initialize ID ( 1 GLOBAL ) pelist = 0 31 1 ( npes = 32) ( nthreads = 1)
.
(seq_comm_joincomm) initialize ID ( 16 CPLALLWAVID ) join IDs = 2 9 ( npes = 32) ( nthreads = 1)
(seq_comm_printcomms) 1 0 32 1 GLOBAL:
.
(seq_comm_printcomms) 29 0 32 1 WAV:
(seq_comm_printcomms) 30 0 32 1 CPLWAV:
(t_initf) Read in prof_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
.
Memory block size conversion in bytes is 945.94
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
.
Memory block size conversion in bytes is 985.50
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 959.36
seq_flds_mod: read seq_cplflds_userspec namelist from: drv_in
8 MB memory alloc in MB is 8.00
.
Memory block size conversion in bytes is 953.68
seq_flds_mod: seq_flds_a2x_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:Sa_pslv
seq_flds_mod: seq_flds_a2x_fluxes=
.
seq_flds_mod: seq_flds_i2x_states=
.
So_tref:So_qref:So_ssq:So_re:So_u10:So_duu10n:So_ustar
seq_flds_mod: seq_flds_xao_albedo=
So_avsdr:So_anidr:So_avsdf:So_anidf
seq_flds_mod: seq_flds_r2x_states=
Slrr_volr
seq_flds_mod: seq_flds_r2x_fluxes=
Forr_roff:Forr_ioff:Flrr_flood
seq_flds_mod: seq_flds_x2r_states=

seq_flds_mod: seq_flds_x2r_fluxes=
Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_w2x_states=
Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_w2x_fluxes=

seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=

dead_setNewGrid decomp 2d 3 144 1 72 7 8
.
dead_setNewGrid decomp 2d 23 72 1 72 38 38
dead_setNewGrid decomp seg 3 104 8
.
dead_setNewGrid decomp seg 29 364 28
dead_setNewGrid decomp 2d 13 348 44 46 1 116
..
dead_setNewGrid decomp 2d 29 92 67 68 1 46
m_GlobalSegMap::max_local_segs: bad segment location error, stat =1
000.MCT(MPEU)::die.: from m_GlobalSegMap::max_local_segs()
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 

jedwards

CSEG and Liaisons
Staff member
This looks like it may be a stack size issue; make sure that the user stack is set to unlimited or the largest allowed value.
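(A minimal sketch; which form applies depends on the shell the job actually runs under:)

# bash/sh: raise the stack limit in the shell that launches mpirun
ulimit -s unlimited

# csh/tcsh (the CESM1.x run scripts are csh)
limit stacksize unlimited

# if threading is ever enabled, the per-thread stack matters too (OpenMP)
export OMP_STACKSIZE=256M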
 

xliu

Jon
Member
This looks like it may be a stack size issue; make sure that the user stack is set to unlimited or the largest allowed value.
I set it to unlimited.
./create_newcase --case $case_path --compset X --res f45_g37 --mach liu
After building, it reports success, but when I try case.run,
echo "before cesm.exe, edited by J, 20210318"
mpirun -np 32 $EXEROOT/cesm.exe >&! cesm.log.$LID
echo "after cesm.exe, checking...."

the mpirun -np ... line did not actually run.
 

jedwards

CSEG and Liaisons
Staff member
I don't have any other suggestions. CESM 1.2.2 is very old, unsupported code; can you update to CESM2?
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
Can you post the tail of the cesm.log and cpl.log files for your simulation with CESM2? Also, what version of CESM2? I also suggest following the recommendations on debugging for CESM and cime, and running with DEBUG=TRUE to get more diagnostic warnings. I'd try setting up some out-of-the-box cases that are known to work and comparing them to the case that's failing for you. Set up simpler cases with fewer processors and/or lower resolution, and also try different machines or compilers, especially machines that CESM has already been ported to. I'd also recommend trying a simpler compset with fewer prognostic components, so you can ignore problems in the components you are least interested in; that might also help you figure out which component is having trouble.

Here's a link on porting for the latest cime...


We don't have the resources to debug problems for CESM users. We can help guide a bit, but users need to dig into how the model is running and find out what's going wrong. That said, if you can find a simple out-of-the-box setup on the latest CESM release that reproduces the problem, we will work on solving it.

I also recommend enlisting the help of the system administrators for the machine you are working on; the problem might be specific to that machine.
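(As a concrete sketch of the "simpler case" suggestion above, assuming a CESM2/cime case directory; the case path, machine name, resolution, and task count are placeholders:)

# An all-dead (X) compset on a handful of tasks exercises the port and the
# coupler without any prognostic components.
./create_newcase --case ~/cases/port_test_X --compset X --res f19_g16 --mach <your_machine>
cd ~/cases/port_test_X
./xmlchange DEBUG=TRUE   # extra runtime checks, more informative failures
./xmlchange NTASKS=8     # shrink the PE layout for all components
./case.setup
./pelayout               # confirm the task/thread layout before building
./case.build
./case.submit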
 

xliu

Jon
Member
Can you post the tail of the cesm.log and cpl.log files for your simulation with CESM2? Also, what version of CESM2? I also suggest following the recommendations on debugging for CESM and cime, and running with DEBUG=TRUE to get more diagnostic warnings. I'd try setting up some out-of-the-box cases that are known to work and comparing them to the case that's failing for you. Set up simpler cases with fewer processors and/or lower resolution, and also try different machines or compilers, especially machines that CESM has already been ported to. I'd also recommend trying a simpler compset with fewer prognostic components, so you can ignore problems in the components you are least interested in; that might also help you figure out which component is having trouble.

Here's a link on porting for the latest cime...


We don't have the resources to debug problems for CESM users. We can help guide a bit, but users need to dig into how the model is running and find out what's going wrong. That said, if you can find a simple out-of-the-box setup on the latest CESM release that reproduces the problem, we will work on solving it.

I also recommend enlisting the help of the system administrators for the machine you are working on; the problem might be specific to that machine.
Thanks Erik,

Currently I am testing whether the port is correct with both CESM 2.2.0 and CESM 1.2.2.1. The machine runs KDE Ubuntu 18.04 (64-bit) with a 56-core CPU and 64 GB of RAM. I am not sure whether it is an mpirun issue or a build issue; there are a lot of warnings, though.

For CESM 2.2.0, I did:
./create_newcase --case $case_path --compset X --res f19_g16 --mach liu
cd $case_path
./case.setup
./preview_run
./xmlchange DEBUG=TRUE
./case.build
./check_case
./check_input_data --download
./case.submit

cesm.log:
...
(seq_comm_printcomms) 37 0 16 1 IAC:
(seq_comm_printcomms) 38 0 16 1 CPLIAC:
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
(t_initf) profile_detail_limit= 2
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
m_GlobalSegMap::max_local_segs: bad segment location error, stat =1
000.MCT(MPEU)::die.: from m_GlobalSegMap::max_local_segs()
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

cpl.log:
...

(seq_timemgr_clockPrint) Alarm = 9 seq_timemgr_alarm_pause
(seq_timemgr_clockPrint) Prev Time = 00001201 00000
(seq_timemgr_clockPrint) Next Time = 99991201 00000
(seq_timemgr_clockPrint) Intervl yms = 9999 0 0

tfreeze_option is mushy

(seq_mct_drv) : Initialize each component: atm, lnd, rof, ocn, ice, glc, wav, esp, iac
(component_init_cc:mct) : Initialize component atm
(component_init_cc:mct) : Initialize component lnd
(component_init_cc:mct) : Initialize component rof
(component_init_cc:mct) : Initialize component ocn
(component_init_cc:mct) : Initialize component ice
(component_init_cc:mct) : Initialize component glc
(component_init_cc:mct) : Initialize component wav
(component_init_cc:mct) : Initialize component esp
(component_init_cc:mct) : Initialize component iac
 