Issue with Initialize component atm

xidazhi

FDX
New Member
What version of the code are you using?
2.1.5


Have you made any changes to files in the source tree?
I ported the model to our university's platform by editing the config_machines.xml, config_compilers.xml, and config_batch.xml files.


Describe every step you took leading up to the problem:
To create the case I ran ./create_newcase --case /lustre1/g/esd_dxi/dazhixi/models/IntelCESMfit/models/CESM215/cases/b1850.basics --compset B1850 --res f19_g17 --mach hku
Then I set up the case with ./case.setup
Then I ran ./case.build
Then I wrote a Slurm script to submit the job.
The job appears to be running: it is still in the queue, it has not been killed, and it has started all the MPI processes I requested. However, the run hangs, as described below.

If this is a port to a new machine: Please attach any files you added or changed for the machine port (e.g., config_compilers.xml, config_machines.xml, and config_batch.xml) and tell us the compiler version you are using on this machine.
Please attach any log files showing error messages or other useful information.

I build with the Intel compilers and Intel MPI (impi), both version 2021. There are no errors during the build, although I had to remove -lnetcdf and keep only -lnetcdff in Macros.make for the model to build (otherwise ld reports that it cannot find -lnetcdf).
The problem occurs at run time, and the run creates only cpl.log and cesm.log. Both are attached.


Describe your problem or question:
The model starts to run but hangs at the following point (in cpl.log):

(seq_mct_drv) : Initialize each component: atm, lnd, rof, ocn, ice, glc, wav, esp
(component_init_cc:mct) : Initialize component atm
There is no further output after that.

Thank you for your help.

Best,
Dazhi
 

Attachments

  • files.zip
    35.2 KB · Views: 1

xidazhi

FDX
New Member
Hello Dazhi,

Could you try rebuilding the model in DEBUG and increasing the debug printout level to see if there is more information about where the run is stuck?

Hi Haipeng,

I increased the debug printout level, but the runtime output is the same. The run hangs soon after
(component_init_cc:mct) : Initialize component atm and produces no further output.
I built the model with self-installed dependencies (HDF5, netCDF, etc.). Do you think this is the issue?
I attached cpl.log and the build logs. Thank you very much.

Best,
Dazhi
 

Attachments

  • building_blogs.zip
    63.8 KB · Views: 0
  • cpl.log.1934870.zip
    5.4 KB · Views: 0

hplin

Haipeng Lin
Moderator
Staff member
Thanks for writing. Have you tried ./xmlchange DEBUG=true as well? You will need a clean rebuild (./case.build --clean-all) once DEBUG is changed.

I also saw you mentioned -lnetcdf was removed and you only kept -lnetcdff. I wonder if that could be the issue. netCDF-Fortran was split off from netCDF-C some years ago, but the build process should be able to find both. Perhaps netCDF-Fortran libraries/includes could be put in the same folder as netCDF-C so the compiler can find both, as opposed to simply removing -lnetcdf.
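
A minimal sketch of how one might check the suggestion above, i.e. whether a single library directory holds both the netCDF-C (libnetcdf) and netCDF-Fortran (libnetcdff) libraries. The function name `netcdf_libs_present` and the throwaway demo directory are hypothetical and only for illustration; on a real system you would pass the lib directory of your own netCDF install.

```python
import tempfile
from pathlib import Path


def netcdf_libs_present(libdir):
    """Report whether netCDF-C and netCDF-Fortran libraries exist in libdir."""
    names = [p.name for p in Path(libdir).iterdir()]
    return {
        # "libnetcdf." (with the dot) matches libnetcdf.so / libnetcdf.a
        # but not libnetcdff.*, which is the Fortran library.
        "netcdf": any(n.startswith("libnetcdf.") for n in names),
        "netcdff": any(n.startswith("libnetcdff.") for n in names),
    }


# Demo: a temporary directory that mimics a Fortran-only install
# (hypothetical layout, not taken from the machine in this thread).
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "libnetcdff.so").touch()
    demo = netcdf_libs_present(d)

print(demo)
```

If "netcdf" comes back False for the directory passed to the linker with -L, that would be consistent with ld failing to find -lnetcdf.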
 

xidazhi

FDX
New Member
Hi Haipeng,

I rebuilt the libraries, and now I can keep both -lnetcdf and -lnetcdff and build the model successfully. I also set DEBUG correctly and now get more debug information; the relevant part of cesm.log is below. It appears that the model stops running after reading regrid_vegetation.nc. I increased the stack size to 1024M in config_machines.xml and set ulimit -s unlimited in my Slurm script, but neither helps. Please let me know if you know how to fix this. Thanks.

32
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/emis/CMIP6_emissions_2000climo/emiss
ions-cmip6_so4_a1_contvolcano_vertical_2000climo_0.9x1.25_c20170724.nc
32
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/emis/CMIP6_emissions_2000climo/emiss
ions-cmip6_so4_a2_contvolcano_vertical_2000climo_0.9x1.25_c20170724.nc
33
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/emis/CMIP6_emissions_2000climo/emiss
ions-cmip6_so4_a2_contvolcano_vertical_2000climo_0.9x1.25_c20170724.nc
33
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/emis/elev/H2O_emission_CH4_oxidation
x2_elev_3Dmonthly_L70_2000climo_c180511.nc 34
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/emis/elev/H2O_emission_CH4_oxidation
x2_elev_3Dmonthly_L70_2000climo_c180511.nc 34
Opened existing file
/scr/u/dazhixi/cesm/inputdata/atm/cam/chem/trop_mozart/dvel/regrid_vegetation.n
c 35
forrtl: severe (168): Program Exception - illegal instruction
Image PC Routine Line Source
libpthread-2.28.s 000014D9529E3D10 Unknown Unknown Unknown
libhdf5.so.310.5. 000014D9600E7ABD H5T__init_native_ Unknown Unknown
libhdf5.so.310.5. 000014D96005ABD7 H5T_init Unknown Unknown
libhdf5.so.310.5. 000014D960107980 H5VL_init_phase2 Unknown Unknown
libhdf5.so.310.5. 000014D95FDF7949 H5_init_library Unknown Unknown
libhdf5.so.310.5. 000014D95FFF8A11 H5PLsize Unknown Unknown
libnetcdf.so.22.1 000014D9614CDDD6 NC4_hdf5_plugin_p Unknown Unknown
libnetcdf.so.22.1 000014D961457E4A nc_plugin_path_in Unknown Unknown
libnetcdf.so.22.1 000014D96150A706 NC4_initialize Unknown Unknown
libnetcdf.so.22.1 000014D961423162 nc_initialize Unknown Unknown
libnetcdf.so.22.1 000014D96142819E NC_open Unknown Unknown
libnetcdf.so.22.1 000014D961427E87 nc_open Unknown Unknown
libnetcdff.so.7.2 000014D960F679B2 nf_open_ Unknown Unknown
libnetcdff.so.7.2 000014D960FACAB9 netcdf_mp_nf90_op Unknown Unknown
cesm.exe 0000000003C92748 mo_jlong_mp_get_x 168 mo_jlong.F90
cesm.exe 0000000003C91017 mo_jlong_mp_jlong 91 mo_jlong.F90
cesm.exe 0000000002DBCD51 mo_photo_mp_photo 306 mo_photo.F90
cesm.exe 0000000002BC5F7F mo_chemini_mp_che 215 mo_chemini.F90
cesm.exe 00000000024AD32C chemistry_mp_chem 809 chemistry.F90
cesm.exe 000000000100782A physpkg_mp_phys_i 825 physpkg.F90
cesm.exe 000000000081A388 cam_comp_mp_cam_i 201 cam_comp.F90
cesm.exe 00000000007E59A7 atm_comp_mct_mp_a 209 atm_comp_mct.F90
cesm.exe 000000000046051C component_mod_mp_ 267 component_mod.F90
cesm.exe 00000000004277B7 cime_comp_mod_mp_ 1231 cime_comp_mod.F90
cesm.exe 00000000004566FB MAIN__ 114 cime_driver.F90
cesm.exe 000000000041592D Unknown Unknown Unknown
libc-2.28.so 000014D9526357E5 __libc_start_main Unknown Unknown
cesm.exe 000000000041584E Unknown Unknown Unknown
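
One way to read the traceback above is to find the innermost frame with a resolved routine name and the innermost frame with a source line. The sketch below does that for a few frames copied from the log (the rest are omitted for brevity). The helper names `parse_frames`, `first_known_routine`, and `first_source_frame` are hypothetical. It assumes the five-column layout printed by the Intel Fortran runtime: image, PC, routine, line, source.

```python
import re

# A subset of frames copied from the traceback above.
TRACE = """\
libpthread-2.28.s 000014D9529E3D10 Unknown Unknown Unknown
libhdf5.so.310.5. 000014D9600E7ABD H5T__init_native_ Unknown Unknown
libnetcdf.so.22.1 000014D96142819E NC_open Unknown Unknown
libnetcdff.so.7.2 000014D960F679B2 nf_open_ Unknown Unknown
cesm.exe 0000000003C92748 mo_jlong_mp_get_x 168 mo_jlong.F90
cesm.exe 0000000003C91017 mo_jlong_mp_jlong 91 mo_jlong.F90
"""

# image, 16-digit hex PC, routine, line, source
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_frames(text):
    """Return (image, routine, line, source) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            image, _pc, routine, lineno, source = m.groups()
            frames.append((image, routine, lineno, source))
    return frames


def first_known_routine(frames):
    """Innermost frame whose routine name was resolved."""
    return next((f for f in frames if f[1] != "Unknown"), None)


def first_source_frame(frames):
    """Innermost frame that carries a source file and line number."""
    return next((f for f in frames if f[3] != "Unknown"), None)


frames = parse_frames(TRACE)
print(first_known_routine(frames))
print(first_source_frame(frames))
```

For these frames, the first helper points at H5T__init_native_ in libhdf5 and the second at mo_jlong.F90 line 168. This suggests the exception is raised while HDF5 initializes during an nf90 open call, which would point at the HDF5 build rather than at the contents of regrid_vegetation.nc.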
 

xidazhi

FDX
New Member
Hi Haipeng,

Thanks. I ran make test for HDF5, and it hangs in test_swmr.sh. The log is attached. I had to interrupt the test because it had been stuck for 5 hours. I suspect this has something to do with the Lustre file system, but I am not sure. Does CESM use the SWMR feature of HDF5? If not, I may try building HDF5 without it to avoid the problem.
 