Errors when running CESM2.0.1

tckm@...

Hi,

I successfully built a case with resolution f19_f19_mg16 and compset FXHIST on a machine I ported myself, whuatm. I ran case.submit, and case.run was submitted to the batch system, but an error aborts the run. Could someone help me solve this problem? Thanks very much.

Machine name: whuatm

The log output:

Setting resource.RLIMIT_STACK to -1 from (-1, -1)

Generating namelists for /project/hp_home/kmtchen/CESM2.0.1/cime/scripts/gg

Creating component namelists

   Calling /project/hp_home/kmtchen/CESM2.0.1/components/cam//cime_config/buildnml

CAM namelist copy: file1 /project/hp_home/kmtchen/CESM2.0.1/cime/scripts/gg/Buildconf/camconf/atm_in file2 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run/atm_in

   Calling /project/hp_home/kmtchen/CESM2.0.1/components/clm//cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/components/cice//cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/data_comps/docn/cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/components/rtm//cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/sglc/cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/swav/cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/components/stub_comps/sesp/cime_config/buildnml

   Calling /project/hp_home/kmtchen/CESM2.0.1/cime/src/drivers/mct/cime_config/buildnml

   NOTE: ignoring setting of rof2ocn_liq_rmapname=idmap in seq_maps.rc

   NOTE: ignoring setting of rof2ocn_ice_rmapname=idmap in seq_maps.rc

   NOTE: ignoring setting of rof2ocn_fmapname=idmap in seq_maps.rc

Finished creating component namelists

-------------------------------------------------------------------------

 - Prestage required restarts into /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run

 - Case input data directory (DIN_LOC_ROOT) is /project/kmtchen/cesmdata/inputdata

 - Checking for required input datasets in DIN_LOC_ROOT

-------------------------------------------------------------------------

2019-09-10 18:21:15 MODEL EXECUTION BEGINS HERE

run command is srun --ntasks=64 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/bld/cesm.exe  >> cesm.log.$LID 2>&1

ERROR: RUN FAIL: Command 'srun --ntasks=64 /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed

See log file for details: /project/kmtchen/cesmdata/cesm2.0.1/gg/gg/run/cesm.log.4918165.190910-182109
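
Since the run log only points at cesm.log, the useful detail is usually in the last lines of that file. The following is a minimal sketch of a helper that prints the tail of the log and picks out lines that look like failures. The function names and the keyword list are illustrative assumptions, not part of CESM or CIME.

```python
import sys
from collections import deque
from pathlib import Path

# Illustrative guess at words a failed launch might print;
# this is not an official list from CESM, Slurm, or Intel MPI.
SUSPECT_WORDS = ("error", "abort", "fatal", "segmentation", "srun:")


def tail_lines(path, count=40):
    """Return the last `count` lines of a text file, without trailing newlines."""
    with Path(path).open("r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def suspicious(lines, words=SUSPECT_WORDS):
    """Return the lines containing any keyword, compared case-insensitively."""
    return [line for line in lines if any(w in line.lower() for w in words)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the cesm.log.* path printed after "See log file for details".
    last = tail_lines(sys.argv[1])
    print("\n".join(last))
    print("--- lines worth a closer look ---")
    print("\n".join(suspicious(last)))
```

Posting the full tail, rather than only the matched lines, gives responders the context around the first failure message.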

 

jedwards

This error is happening in mpi_init or even earlier than that. You should try something like a hello world program, or talk to your system administrator about the issue.

CESM Software Engineer

tckm@...

I ran a hello world program successfully. I then asked the administrator of the supercomputer service center, who told me that there is nothing wrong with Intel MPI and that he could not help me solve the problem. Are there other possible causes of this error, or details I have missed when checking Intel MPI?

Thanks, and I look forward to your reply.

$ srun -n 2 ./hello_world 

  HELLO_MPI - Master process:

  FORTRAN90/MPI version

  Process        1 says "Hello, world!"n0065

  An MPI test program.

  The number of processes is        2

  Process        0 says "Hello, world!"n0065

jedwards

Are you posting the entire cesm.log or just the tail of it? There is a known problem in impi in the I/O layer on Lustre file systems. If this describes your system, try changing to serial I/O with ./xmlchange PIO_TYPENAME=netcdf

CESM Software Engineer
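
As a rough illustration of the suggestion above, the sketch below builds the commands to check the current PIO type, switch it to netcdf, and check it again from inside the case directory. ./xmlchange PIO_TYPENAME=netcdf comes from the post; ./xmlquery is the companion CIME script for reading a variable. The Python function names and the dry_run switch are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def pio_commands(typename="netcdf"):
    """Build the commands: show the current PIO type, set it, show it again."""
    return [
        ["./xmlquery", "PIO_TYPENAME"],
        ["./xmlchange", f"PIO_TYPENAME={typename}"],
        ["./xmlquery", "PIO_TYPENAME"],
    ]


def apply_pio_type(case_dir, typename="netcdf", dry_run=True):
    """Run the commands in the case directory; with dry_run, only return them."""
    commands = pio_commands(typename)
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(command, cwd=Path(case_dir), check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed; the case path is a placeholder.
    for command in apply_pio_type("/path/to/case"):
        print(" ".join(command))
```

Querying the value before and after the change makes it easy to confirm that the setting actually took effect before resubmitting with ./case.submit.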

tckm@...

Unfortunately, yes: I use impi on a Lustre file system. I then ran a case with a newly ported CESM2.1.1, switching to serial I/O with ./xmlchange PIO_TYPENAME=netcdf before running case.submit, but it didn't work; the error still occurs. The cesm.log is attached. Would switching to a different MPI, such as Open MPI, solve this problem?

jedwards

I think the error you are getting is happening even before the model starts. I notice you are using cpu-binding; try turning that off, and also try running on a single node instead of the two you are using now. It might also help to start with a very simple case, such as compset X at resolution f19_g16. Have you followed the testing and porting guidelines from the CIME documentation? https://esmci.github.io/cime/users_guide/porting-cime.html

CESM Software Engineer
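
For the simple test case mentioned above, a create_newcase invocation could be assembled roughly as follows. This is a sketch under assumptions: the case name x_smoke is hypothetical, the command is meant to be run from cime/scripts, and the optional --run-unsupported flag is included only because some CESM2 releases ask for it when a compset and grid combination is not scientifically supported.

```python
def simple_case_command(case_name, machine, compset="X", res="f19_g16",
                        run_unsupported=False):
    """Build a create_newcase command line for a small all-stub-style test case."""
    command = [
        "./create_newcase",
        "--case", case_name,
        "--compset", compset,
        "--res", res,
        "--machine", machine,
    ]
    if run_unsupported:
        # Only needed if create_newcase refuses the combination as untested.
        command.append("--run-unsupported")
    return command


if __name__ == "__main__":
    # "x_smoke" is a hypothetical case name; whuatm is the machine from the thread.
    print(" ".join(simple_case_command("x_smoke", "whuatm")))
```

After creating the case, the usual ./case.setup, ./case.build, and ./case.submit sequence applies, as with the FXHIST case in this thread.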

tckm@...

I turned cpu-binding off and ran the case on a single node. Because of slow internet speeds I can't download the input data for compset X, so I still ran the case with compset FXHIST and resolution f19_f19_mg16. The same error occurred.

I followed the porting steps in the porting guide when I ported CESM, but I didn't validate the port the way the guide describes. I created, set up, built, and submitted a new case directly, and there was no error until srun failed to execute cesm.exe. So I am now running create_test to validate the CESM port. Because create_test takes a long time to run, I am attaching only part of the test log for now. When the test finishes, I will attach all the log output.

 

Setting resource.RLIMIT_STACK to -1 from (-1, -1)

Generating namelists for /project/sygu/tckm/cesm2.1.1/cime/scripts/last

Creating component namelists

   Calling /project/sygu/tckm/cesm2.1.1/components/cam//cime_config/buildnml

CAM namelist copy: file1 /project/sygu/tckm/cesm2.1.1/cime/scripts/last/Buildconf/camconf/atm_in file2 /project/sygu/tckm/cesmdata/cesm2.1.1/last/run/atm_in

   Calling /project/sygu/tckm/cesm2.1.1/components/clm//cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/components/cice//cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/data_comps/docn/cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/components/rtm//cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/sglc/cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/swav/cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/cime/src/components/stub_comps/sesp/cime_config/buildnml

   Calling /project/sygu/tckm/cesm2.1.1/cime/src/drivers/mct/cime_config/buildnml

   NOTE: ignoring setting of rof2ocn_liq_rmapname=idmap in seq_maps.rc

   NOTE: ignoring setting of rof2ocn_ice_rmapname=idmap in seq_maps.rc

Finished creating component namelists

-------------------------------------------------------------------------

- Prestage required restarts into /project/sygu/tckm/cesmdata/cesm2.1.1/last/run

 - Case input data directory (DIN_LOC_ROOT) is /project/sygu/tckm/cesmdata/inputdata

 - Checking for required input datasets in DIN_LOC_ROOT

-------------------------------------------------------------------------

2019-09-12 00:27:18 MODEL EXECUTION BEGINS HERE

run command is srun -n 16 /project/sygu/tckm/cesmdata/cesm2.1.1/last/bld/cesm.exe  >> cesm.log.$LID 2>&1

ERROR: RUN FAIL: Command 'srun -n 16 /project/sygu/tckm/cesmdata/cesm2.1.1/last/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed

 

See log file for details: /project/sygu/tckm/cesmdata/cesm2.1.1/last/run/cesm.log.4984658.190912-002715
