
CESM 2.2 runtime error

I have an otherwise working installation of CESM 2.2 that hits a runtime error when running a simple case:

./create_newcase --case /home/Sean.M.Davis/CESM-cases/cesm2.2/f2000climobase1deg --compset F2000climo --mach hera --res f09_f09_mg17

This is a snippet of the error from the cesm.log file:

157 h9c31
158 h9c31
159 h9c31
Opened existing file
/scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-
01_0.9x1.25_L32_c141031.nc 0
Opened existing file
/scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Ns
w042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc 1
pio_support::pio_die:: myrank= -1 : ERROR: ionf_mod.F90: 235 :
Unknown error in file operation
pio_support::pio_die:: myrank= -1 : ERROR: ionf_mod.F90: 235 :
Unknown error in file operation
pio_support::pio_die:: myrank= -1 : ERROR: ionf_mod.F90: 235 :
Unknown error in file operation
pio_support::pio_die:: myrank= -1 : ERROR: ionf_mod.F90: 235 :
Unknown error in file operation
Image PC Routine Line Source
cesm.exe 0000000002AE4A76 Unknown Unknown Unknown
cesm.exe 00000000028D9241 pio_support_mp_pi 118 pio_support.F90
cesm.exe 00000000028D7305 pio_utils_mp_chec 59 pio_utils.F90
cesm.exe 00000000029DCE30 ionf_mod_mp_open_ 235 ionf_mod.F90
cesm.exe 00000000028C8760 piolib_mod_mp_pio 2831 piolib_mod.F90
cesm.exe 000000000053DEAB cam_pio_utils_mp_ 1135 cam_pio_utils.F90
cesm.exe 0000000000711BD1 prescribed_strata 190 prescribed_strataero.F90
cesm.exe 00000000006E6739 physpkg_mp_phys_r 270 physpkg.F90
cesm.exe 00000000004F3680 cam_comp_mp_cam_i 173 cam_comp.F90
cesm.exe 00000000004E9BFC atm_comp_mct_mp_a 253 atm_comp_mct.F90
cesm.exe 0000000000436BB6 component_mod_mp_ 257 component_mod.F90
cesm.exe 00000000004259E1 cime_comp_mod_mp_ 1347 cime_comp_mod.F90
cesm.exe 0000000000433C59 MAIN__ 122 cime_driver.F90
cesm.exe 0000000000414E5E Unknown Unknown Unknown
libc-2.17.so 00002B737DB83555 __libc_start_main Unknown Unknown
cesm.exe 0000000000414D69 Unknown Unknown Unknown
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
slurmstepd: error: *** STEP 38360441.0 ON h9c18 CANCELLED AT 2022-11-29T16:52:50 ***

Various xml and log files are attached.
 

Attachments

  • describe_version.txt
    7.8 KB · Views: 1
  • config_batch.xml.txt
    25.3 KB · Views: 0
  • config_compilers.xml.txt
    43.9 KB · Views: 2
  • config_machines.xml.txt
    7 KB · Views: 3
  • atm.log.38360441.221129-165242.txt
    25.9 KB · Views: 1
  • cesm.log.38360441.221129-165242.txt
    16.1 KB · Views: 3
  • cpl.log.38360441.221129-165242.txt
    44.8 KB · Views: 0

jedwards

CSEG and Liaisons
Staff member
Check the md5sum of that file; if it doesn't match the value below, remove the file and download it again.
md5sum /glade/p/cesmdata/cseg/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-01_0.9x1.25_L32_c141031.nc
2041a4f583648705c0aab2ae12a0065d /glade/p/cesmdata/cseg/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-01_0.9x1.25_L32_c141031.nc
 
md5sum /scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-01_0.9x1.25_L32_c141031.nc
2041a4f583648705c0aab2ae12a0065d /scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-01_0.9x1.25_L32_c141031.nc

It looks like this file is fine. Here is the md5sum of the other netcdf file listed there.

md5sum /scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Nsw042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc
7e3c697b63118c4e7a0ba951a45a6308 /scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Nsw042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc
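
When many input files need checking, the manual md5sum comparison above could be scripted. The following is a minimal Python sketch using only the standard library. The function names `file_md5` and `matches_checksum` are illustrative and not part of CESM or CIME, and the expected checksum has to come from a trusted copy of the file, such as the one quoted earlier in this thread.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file exists and its MD5 equals the expected hex digest."""
    path = Path(path)
    return path.is_file() and file_md5(path) == expected_md5.strip().lower()
```

Calling `matches_checksum(path_to_file, "2041a4f583648705c0aab2ae12a0065d")` on a local copy of the cami-mam3 initial-conditions file would then reproduce the comparison made by hand above.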
 
This seems to be something related to my netcdf paths/libraries. In config_compilers.xml I changed from
<NETCDF_PATH>/apps/netcdf/4.7.0/intel/18.0.5.274/</NETCDF_PATH>
to
<NETCDF_PATH>$(NETCDF)</NETCDF_PATH>

and now I get what looks like a slightly different error:

Opened existing file
/scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/inic/fv/cami-mam3_0000-01-
01_0.9x1.25_L32_c141031.nc 0
Opened existing file
/scratch1/BMC/chimera/CESM-central/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Ns
w042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc 1
NetCDF: Not a valid ID
NetCDF: Not a valid ID
NetCDF: Not a valid ID
pio_support::pio_die:: myrank= -1 : ERROR: ionf_mod.F90: 235 :
NetCDF: Not a valid ID
NetCDF: Not a valid ID
Image PC Routine Line Source
cesm.exe 0000000002AE9556 Unknown Unknown Unknown
cesm.exe 00000000028D9401 pio_support_mp_pi 118 pio_support.F90
 

Attachments

  • cpl.log.38363634.221129-173757.txt
    44.8 KB · Views: 1
  • cesm.log.38363634.221129-173757.txt
    16.4 KB · Views: 5
  • atm.log.38363634.221129-173757.txt
    25.9 KB · Views: 2

jedwards

CSEG and Liaisons
Staff member
Can you do an ncdump -h on that file? Is the filesystem mounted on the compute nodes that you are using?
Can you try with pnetcdf?
 
ncdump -h works for both files. Yes, that file system is mounted on the compute nodes. How would I try with pnetcdf? That module is included in my config_machines.xml, i.e.

<modules mpilib="impi">
<command name="load">impi/2018.0.4</command>
<command name="load">netcdf-hdf5parallel/4.7.0</command>
<command name="load">pnetcdf/1.7.0</command>

ncdump output is attached
 

Attachments

  • ncdump-output.txt
    11.4 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
I think that your version of pnetcdf may be too old. Check the pio.bldlog to see if it was used.
If you can use ncdump to read the file, and that ncdump comes from the same netcdf module you are using in the model, then I don't know what the problem might be.
 
I figured out the problem. We had a file system corruption a while back, and some of the inputdata files were corrupted. The problem file was named in atm.log, though, not in cesm.log where I had been looking.
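
Since the offending file turned up in atm.log rather than cesm.log, it may help to collect every NetCDF path mentioned in any component log before checking files one by one. The following is a rough Python sketch under some assumptions: the logs are plain text in one run directory, they are named like `atm.log.<jobid>.<timestamp>`, and paths are absolute and end in `.nc`. The function names and the glob pattern are illustrative, not part of CESM.

```python
import re
from collections import defaultdict
from pathlib import Path

# Absolute path without whitespace, ending in ".nc" (assumed log convention).
NC_PATH = re.compile(r"/\S+?\.nc\b")


def netcdf_paths_in_text(text):
    """Return the sorted, de-duplicated NetCDF paths found in a block of text."""
    return sorted(set(NC_PATH.findall(text)))


def netcdf_paths_by_log(run_dir, pattern="*.log.*"):
    """Map each NetCDF path to the sorted names of the logs that mention it."""
    found = defaultdict(set)
    for log in sorted(Path(run_dir).glob(pattern)):
        # Compressed logs are skipped in this sketch; decompress them first.
        if not log.is_file() or log.suffix == ".gz":
            continue
        text = log.read_text(errors="replace")
        for nc_path in netcdf_paths_in_text(text):
            found[nc_path].add(log.name)
    return {path: sorted(logs) for path, logs in found.items()}
```

Each path in the result can then be checked with md5sum or ncdump. One limitation: paths that a log wraps across two lines, as in the cesm.log excerpt at the top of this thread, will not match this pattern.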
 
BTW, I'm not really sure how to interpret the pio.bldlog, but here it is. Does it help in telling whether pnetcdf is too old? Thanks!
 

Attachments

  • pio.bldlog.221201-222100.txt
    77.2 KB · Views: 5

tresamt

Tresa Mary
Member
Hi,
I ran into the same error with this case:
./create_newcase --case /groups/carnegie_poc/Clab_CESM/cases/Test3 --compset F2000climo --res f09_f09_mg17 --machine CaltechHPC
I am attaching the log files. Do you think it is a pnetcdf issue?
Thank you in advance.
 

Attachments

  • pio.bldlog.230711-144930.gz
    4.7 KB · Views: 2

tresamt

Tresa Mary
Member
 

Attachments

  • atm.log.35675943.230712-054004.txt
    25.5 KB · Views: 0
  • cpl.log.35675943.230712-054004.txt
    44.6 KB · Views: 0

tresamt

Tresa Mary
Member
I am attaching the CESM log file as well.
I remember having a similar error with CESM 2.1.3, and changing the grid resolution to f19_f19 solved it. Do you think the grid resolution might be the issue here?
Thank you
Tresa
 

Attachments

  • cesm.log.35675943.230712-054004.txt
    10.3 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
What is the format of the file
/central/groups/carnegie_poc/Clab_CESM/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Nsw042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc?

Use ncdump -k filename to get the format.
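
If ncdump is not at hand, or you want a check that does not depend on the NetCDF library at all, the first bytes of the file give a rough answer. The following is a minimal Python sketch, not a replacement for ncdump -k. The function name `guess_netcdf_kind` is illustrative. It assumes the HDF5 signature sits at byte 0, which is the usual case but is not guaranteed by the HDF5 format, and it cannot tell netCDF-4 from the netCDF-4 classic model.

```python
# Leading bytes of the classic NetCDF variants.
MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "cdf5",
}
# netCDF-4 files are HDF5 files and start with the HDF5 signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_netcdf_kind(path):
    """Return a format label from the file header, or None if unrecognized."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5-based (netCDF-4)"
    return MAGIC.get(header[:4])
```

A return value of `None` means the header matches none of these signatures, which is what a truncated download or a placeholder file would typically produce.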
 

tresamt

Tresa Mary
Member
Hi jedwards,
I reinstalled the NetCDF packages and tried to create the same case again. The following error occurred while downloading the input files:

'Model cam missing file mode_defs for ncl_a1 = '/central/groups/carnegie_poc/CESM/inputdata/atm/cam/physprops/ssam_rrtmg_c100508.nc'
Trying to download file: 'atm/cam/physprops/ssam_rrtmg_c100508.nc' to path '/central/groups/carnegie_poc/CESM/inputdata/atm/cam/physprops/ssam_rrtmg_c100508.nc' using GridFTP protocol.
FAIL: GridFTP repo 'ftp://gridanon.cgd.ucar.edu:2811/cesm/inputdata/' does not have file 'atm/cam/physprops/ssam_rrtmg_c100508.nc' error=error: globus_ftp_client: the server responded with an error
500 Command failed. : Path not allowed.'

Proxy files are being downloaded that cannot be read, and their md5sum is different too.

What do you think might be the issue?
Thank you
Tresa
 

jedwards

CSEG and Liaisons
Staff member
It looks like the GridFTP server is down; please remove that entry from config_inputdata.xml and try again.
 

tresamt

Tresa Mary
Member
Thank you, jedwards.
Just to confirm, are you asking me to remove the following lines from the attached file?

<server>
  <comment>grid ftp requires the globus-url-copy tool on the client side </comment>
  <protocol>gftp</protocol>
  <address>ftp://gridanon.cgd.ucar.edu:2811/cesm/inputdata/</address>
  <checksum>../inputdata_checksum.dat</checksum>
</server>

Or should I move the svn entry to the top instead?

<server>
  <protocol>svn</protocol>
  <address>https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata</address>
</server>
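
For anyone who would rather script the removal than edit config_inputdata.xml by hand, here is a minimal Python sketch of the first option (dropping the gftp entry). It is a sketch under assumptions, not the CIME way of doing this: the function name `drop_servers` is illustrative, and it relies only on the `<server>` and `<protocol>` elements quoted in this thread. The result is written to a separate file so the original stays intact.

```python
import xml.etree.ElementTree as ET


def drop_servers(src, dst, protocol="gftp"):
    """Copy src to dst without <server> entries using the given protocol.

    Returns the number of entries removed.
    """
    tree = ET.parse(src)
    removed = 0
    # Snapshot the elements first so removal does not disturb iteration.
    for parent in list(tree.getroot().iter()):
        for server in parent.findall("server"):
            if (server.findtext("protocol") or "").strip() == protocol:
                parent.remove(server)
                removed += 1
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return removed
```

Note that ElementTree's default parser drops XML comments of the `<!-- ... -->` kind and does not preserve the original whitespace exactly, so diff the output against the original before replacing it.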
 

Attachments

  • config_inputdata.txt
    1.5 KB · Views: 0