
Errors when running CESM2.0.1

tckm@whu_edu_cn

New Member
Hi, I successfully built a case with the f19_f19_mg16 resolution and the FXHIST compset on a machine I ported myself, whuatm. I ran case.submit and case.run was submitted to the batch system, but the run aborts with an error. Can anyone help me solve this problem? Thanks very much.

Machine name: whuatm

The log file:

Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Generating namelists for /project/hp_home/kmtchen/CESM2.0.1/cime/scripts/gg
Creating component namelists
   Calling /project/hp_home/kmtchen/CESM2.0.1/components/cam//cime_config/buildnml
CAM namelist copy: file1 /project/hp_home/kmtchen/CESM2.0.1/cime/scripts/gg/Buildconf/camconf/atm_in file2 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run/atm_in
   Calling /project/hp_home/kmtchen/CESM2.0.1/components/clm//cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/components/cice//cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/data_comps/docn/cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/components/rtm//cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/sglc/cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/swav/cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/sesp/cime_config/buildnml
   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/drivers/mct/cime_config/buildnml
   NOTE: ignoring setting of rof2ocn_liq_rmapname=idmap in seq_maps.rc
   NOTE: ignoring setting of rof2ocn_ice_rmapname=idmap in seq_maps.rc
   NOTE: ignoring setting of rof2ocn_fmapname=idmap in seq_maps.rc
Finished creating component namelists
-------------------------------------------------------------------------
 - Prestage required restarts into /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run
 - Case input data directory (DIN_LOC_ROOT) is /project/kmtchen/cesmdata/inputdata
 - Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
2019-09-10 18:21:15 MODEL EXECUTION BEGINS HERE
run command is srun --ntasks=64 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/bld/cesm.exe  >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'srun --ntasks=64 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed
See log file for details: /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run/cesm.log.4918165.190910-182109
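
The output above only reports that the run command failed and names the cesm.log to look at. As a rough sketch (not part of CIME; the function name summarize_failure and both regular expressions are my own assumptions about the message format shown in this post), the failed command and the log path can be pulled out of such output like this, so the full cesm.log can be found and attached:

```python
import re

# Patterns are assumptions based on the two messages quoted in this post.
RUN_FAIL = re.compile(r"ERROR: RUN FAIL: Command '(?P<cmd>.+?)\s*' failed")
LOG_PATH = re.compile(r"See log file for details:\s*(?P<path>\S+)")


def summarize_failure(text):
    """Return the failed run command and the cesm.log path, or None if absent."""
    cmd = RUN_FAIL.search(text)
    log = LOG_PATH.search(text)
    return {
        "command": cmd.group("cmd") if cmd else None,
        "log": log.group("path") if log else None,
    }


if __name__ == "__main__":
    # Sample lines copied from the post above.
    sample = (
        "ERROR: RUN FAIL: Command 'srun --ntasks=64 "
        "/project/kmtchen/cesmdata/cesm2.0.1/gg/gg/bld/cesm.exe  "
        ">> cesm.log.$LID 2>&1 ' failed\n"
        "See log file for details: "
        "/project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run/"
        "cesm.log.4918165.190910-182109\n"
    )
    info = summarize_failure(sample)
    print("failed command:", info["command"])
    print("log to attach: ", info["log"])
```

The two patterns are searched independently, so the sketch still works when the messages are run together on one line, as they were in the pasted output.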
 

jedwards

CSEG and Liaisons
Staff member
This error is happening in mpi_init or even earlier. You should try something like a hello world program, or talk to your system administrator about the issue.
 

tckm@whu_edu_cn

New Member
I ran a hello world program successfully. Then I asked the administrator of the supercomputer service center; he told me there is nothing wrong with Intel MPI and that he could not help me solve the problem. Are there other possible causes of this error, or details of the Intel MPI setup that I have missed and should check? Thanks, and I look forward to your reply.

$ srun -n 2 ./hello_world
HELLO_MPI - Master process:
  FORTRAN90/MPI version
  Process        1 says "Hello, world!"n0065
  An MPI test program.
  The number of processes is        2
  Process        0 says "Hello, world!"n0065
 

jedwards

CSEG and Liaisons
Staff member
Are you posting the entire cesm.log or just the tail of it? There is a known problem in impi in the IO layer on Lustre file systems. If this describes your system, try changing to serial IO with ./xmlchange PIO_TYPENAME=netcdf
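
For anyone scripting this, here is a minimal sketch of applying the change from the case directory and confirming it before resubmitting. The ./xmlchange setting is the one given above; the ./xmlquery check, the helper names, and the placeholder case path are my own additions, and the default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def serial_io_commands():
    """Commands to switch a case to serial netCDF IO and resubmit it."""
    return [
        ["./xmlchange", "PIO_TYPENAME=netcdf"],  # setting suggested in the reply
        ["./xmlquery", "PIO_TYPENAME"],          # added here: confirm the new value
        ["./case.submit"],                       # resubmit the case
    ]


def run_in_case(case_dir, commands, dry_run=True):
    """Print each command; execute it inside case_dir unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails
            subprocess.run(cmd, cwd=Path(case_dir), check=True)
    return commands


if __name__ == "__main__":
    # "/path/to/case" is a placeholder, not a path from this thread.
    run_in_case("/path/to/case", serial_io_commands(), dry_run=True)
```

Pass dry_run=False only once the printed commands look right for your case.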
 

tckm@whu_edu_cn

New Member
Yes, I use impi on a Lustre file system. So sad. I then ran a case with a newly ported cesm2.1.1, switching to serial IO with ./xmlchange PIO_TYPENAME=netcdf before running case.submit, but it didn't work; the error still occurs. The cesm.log is attached below. Would switching to a different MPI, such as OpenMPI, solve this problem?
 

jedwards

CSEG and Liaisons
Staff member
I think the error you are getting happens even before the model starts. I notice you are using CPU binding; try turning that off, and also try running on a single node instead of the two you are using now. It might also help to start with a very simple case, such as compset X at resolution f19_g16. Have you followed the testing and porting guidelines in the CIME documentation? https://esmci.github.io/cime/users_guide/porting-cime.html
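
As a rough sketch of the simple test case mentioned above, the snippet below prints the usual create, setup, build, and submit sequence for compset X at resolution f19_g16. The case directory is a placeholder, the machine name is taken from the first post, and the helper name is hypothetical; it only prints the commands and runs nothing.

```python
import shlex


def simple_case_commands(case_dir, machine="whuatm"):
    """Return (working directory, argv) pairs for a minimal compset X case."""
    return [
        # create_newcase is run from cime/scripts in the CESM checkout
        ("cime/scripts", ["./create_newcase", "--case", case_dir,
                          "--compset", "X", "--res", "f19_g16",
                          "--machine", machine]),
        # the remaining steps are run from inside the new case directory
        (case_dir, ["./case.setup"]),
        (case_dir, ["./case.build"]),
        (case_dir, ["./case.submit"]),
    ]


if __name__ == "__main__":
    # "/path/to/xtest" is a placeholder case directory, not from this thread.
    for cwd, argv in simple_case_commands("/path/to/xtest"):
        print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
```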
 

tckm@whu_edu_cn

New Member
I turned CPU binding off and ran the case on a single node. Because of slow internet speeds I can't download the input data for compset X, so I still ran the case with compset FXHIST and resolution f19_f19_mg16. The same error happened.

I followed the steps in the porting guide when I ported CESM, but I did not validate the port in the way the guide describes. I created, set up, built, and submitted a new case directly, and there was no error until srun failed to execute cesm.exe. So I am now running create_test to validate the CESM port. Because create_test takes a long time to run, I am attaching only part of the log information for now; when the test finishes, I will attach all of it.

Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Generating namelists for /project/sygu/tckm/cesm2.1.1/cime/scripts/last
Creating component namelists
   Calling /project/sygu/tckm/cesm2.1.1/components/cam//cime_config/buildnml
CAM namelist copy: file1 /project/sygu/tckm/cesm2.1.1/cime/scripts/last/Buildconf/camconf/atm_in file2 /project/sygu/tckm/cesmdata/cesm2.1.1/last/run/atm_in
   Calling /project/sygu/tckm/cesm2.1.1/components/clm//cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/components/cice//cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/data_comps/docn/cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/components/rtm//cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/sglc/cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/swav/cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/sesp/cime_config/buildnml
   Calling /project/sygu/tckm/cesm2.1.1/cime/src/drivers/mct/cime_config/buildnml
   NOTE: ignoring setting of rof2ocn_liq_rmapname=idmap in seq_maps.rc
   NOTE: ignoring setting of rof2ocn_ice_rmapname=idmap in seq_maps.rc
Finished creating component namelists
-------------------------------------------------------------------------
 - Prestage required restarts into /project/sygu/tckm/cesmdata/cesm2.1.1/last/run
 - Case input data directory (DIN_LOC_ROOT) is /project/sygu/tckm/cesmdata/inputdata
 - Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
2019-09-12 00:27:18 MODEL EXECUTION BEGINS HERE
run command is srun -n 16 /project/sygu/tckm/cesmdata/cesm2.1.1/last/bld/cesm.exe  >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'srun -n 16 /project/sygu/tckm/cesmdata/cesm2.1.1/last/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed
See log file for details: /project/sygu/tckm/cesmdata/cesm2.1.1/last/run/cesm.log.4984658.190912-002715
 