Inputdata can’t be read correctly in CESM2.1.4: NetCDF: Variable not found/Invalid dimension ID or name/Attribute not found

Wilma Wang
New Member
Hi everyone, I am still working on the issue described in my previous thread (CESM2.1.4 case_run error: without any hist output nc files).
I suspect something is wrong with the input data used by F2000climo, because the cesm.log (also attached) contains many messages like the following:

Code:
/home/bozk/Models/inputdata/atm/waccm/lb/LBC_2000climo_CMIP6_0p5degLat_c180227.nc      196608
 NetCDF: Variable not found
 NetCDF: Variable not found
 NetCDF: Variable not found
 NetCDF: Variable not found
 NetCDF: Variable not found
 WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
 WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
 WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
 WARNING: Rearr optional argument is a pio2 feature, ignored in pio1

Should I update NetCDF on my machine (currently NetCDF 4.4.1 and NetCDF-Fortran 4.4.4) and try again?
I look forward to any useful information.


Best regards,

Wilma W.
 

Attachments

  • cesm.log.21268.qxc-cluster.jinan.com.250719-002459.txt
    66.9 KB

Wilma Wang
New Member
I also found a similar post in the forum, quoted below:

Hi Jesse,

Thank you for your useful response. I had a similar suspicion about the libraries (OpenMPI) used in the current HPC setup, because the model seems to run well but does not write the final restart pointer and files. I have explained the issue to the HPC staff and they are looking into it.

Meanwhile, regarding your other questions.

1.) What happens if you use the CESM2.1.5 code base instead?

- I tried the same setup at f19_g16 resolution (roughly 2 degrees), but it fails while initializing the model. I posted that issue on the forum and am waiting for a reply: CLM: NaN Value while running slab ocean (ETEST) compset

Meanwhile, I was advised to try a newer version, so I tried the same setup on cesm2.2.2.

On cesm2.2.2, I ran the same experiment at f19_g16 resolution before switching to f09_g16. It requests 4 nodes on the machine and runs successfully without any issue. I am attaching the log file (cesm222_f19_Ecmp.tar).

2.) What happens if you run an out-of-the-box F-case, like F2000climo?

- I also tried an F compset case at f09_g16 resolution:

./create_newcase --case /scratch/rs9552/cesm2.2.2_T2/F_CAM62k_CLM5BGCCrop_T1 --res f09_g16 --compset 2000_CAM60_CLM50%BGC-CROP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV --machine greene --run-unsupported



It failed with the same issue; I am attaching its log too.

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
Stack trace terminated abnormally.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.


Please check.

The HPC staff are also looking into this. Any hint would be a great help if you notice something we have missed.

Thank You.

-Ram

Any comments are welcome!
 

dbailey

CSEG and Liaisons
Staff member
This is telling you that you have PIO version 1 installed on your machine. You will need PIO version 2. Moving this to porting.
 

jedwards

CSEG and Liaisons
Staff member
PIO version 1 is fine for cesm2.1, but you should update to the latest release, 2.1.5. The "NetCDF: Variable not found" messages are normal and expected as the model searches for a configuration; they are not associated with problems in inputdata. I don't see any error in the cesm.log file you have provided, but it is trying to write, not read, a file. Are you sure that you have adequate disk space?
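One way to act on the disk-space question is to check the free space on the filesystem that holds the run directory. The sketch below is a hypothetical helper (`has_free_space` is not part of CESM or CIME), assuming a POSIX shell and a `df` that supports the `-P` and `-k` options.

```shell
#!/bin/sh
# Hypothetical helper (not part of CESM/CIME): succeed if the filesystem holding
# DIR has at least MIN_KB kilobytes available.
# -P forces one line per filesystem; -k reports 1024-byte blocks, so column 4 is "Available".
has_free_space() {
    dir=$1
    min_kb=$2
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    # A missing or unreadable directory yields no number and counts as "not enough space".
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

From the case directory, `has_free_space "$(./xmlquery --value RUNDIR)" 2097152 || echo "Less than 2 GB free under RUNDIR"` would flag a nearly full run directory; the 2 GB threshold is only an assumption, chosen because a single CAM restart file in this thread is about 1 GB.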
 

Wilma Wang
New Member
Hi jedwards,
I believe I had adequate disk space for this run. The model is trying to write the CAM restart file (*.cam.r.0001-01-06-00000.nc, about 1 GB).
Unfortunately, I didn't get the history output (e.g. *.cam.h0.0001-01-06-00000.nc).
With this CESM2.1.4 installation, I have run the X and FSCAM compsets successfully and obtained correct history output files.

Have you encountered this situation before? I have been stuck on this problem for a week. Any comments are welcome!
 

Wilma Wang
New Member
Quoting a reply: "I already suggested that you should update cesm to the 2.1.5 version - have you done that?"
I have updated to CESM2.1.5 and run the F2000climo compset again, with the same result (again no history output). Attached is the cesm.log from the CESM2.1.5 run. I am very confused right now.
 

Attachments

  • cesm.log.21289.qxc-cluster.jinan.com.250723-182301.txt
    74.8 KB

Wilma Wang
New Member
Hi jedwards,
Today, I followed the guide (CESM Tutorial, "1: Control case: F2000climo") and customized the CAM history files as follows:
"Run for 5 days, with 3-hourly instantaneous output of the variables: TS, PS, Z500, U850, U200, T850, T500, T200, CLDLOW, PRECT, LHFLX, SHFLX, FLNT, FLNS. You are also welcome to output your own variables."
Code:
echo "nhtfrq(2) = -3">> user_nl_cam   
echo "mfilt(2) = 240">> user_nl_cam
echo "fincl2 = 'TS:I','PS:I', 'U850:I','T850:I','PRECT:I','LHFLX:I','SHFLX:I','FLNT:I','FLNS:I'">> user_nl_cam
echo "">> user_nl_cam
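As a sanity check on these settings: in CAM, a negative `nhtfrq` is an output interval in hours and `mfilt` caps the number of time samples per history file, so a 5-day run with `nhtfrq(2) = -3` should give roughly 5 × 24 / 3 = 40 samples, well under `mfilt(2) = 240`, i.e. everything in a single h1 file. The helpers below are hypothetical (not part of CESM or CIME) and only do that arithmetic; they ignore whether an extra sample is also written at the initial time.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CESM/CIME) for the nhtfrq/mfilt arithmetic.
# expected_samples DAYS HOURS: samples in a DAYS-long run written every HOURS hours
# (i.e. nhtfrq = -HOURS); any extra sample at the initial time is not counted.
expected_samples() {
    echo $(( $1 * 24 / $2 ))
}

# fits_in_one_file DAYS HOURS MFILT: succeed if all samples fit in one history file.
fits_in_one_file() {
    [ "$(expected_samples "$1" "$2")" -le "$3" ]
}
```

For the namelist above, `expected_samples 5 3` prints 40 and `fits_in_one_file 5 3 240` succeeds.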

The problem happened again: I set walltimemax="03:00:00", but the log files in the run directory stopped updating after 1.5 hours, and the PBS job kept running until it was killed at the 3-hour walltime limit.
I could see the h1 file in the run directory, but it is empty.
How can I fix this problem? Please help!
Attached are the cesm.log and atm.log.
 

Attachments

  • cesm.log.21327.qxc-cluster.jinan.com.250731-215712.txt
    67.8 KB
  • atm.log.21327.qxc-cluster.jinan.com.250731-215712.txt
    408.7 KB

jedwards

CSEG and Liaisons
Staff member
Does it begin to write an output file or does it fail without creating anything? What is the value of PIO_TYPENAME?
The cesm log suggests a low level system IO error.
 

Wilma Wang
New Member
Hi Jedwards,

I apologize for not replying sooner; the time difference delayed me.

It did create an output file (F2000climo19.cam.h1.0001-01-01-00000.nc), but it contains no data. Here is the header:
Code:
[bozk@qxc-cluster F2000climo19]$ ncdump -h /home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo19/run/F2000climo19.cam.h1.0001-01-01-00000.nc
netcdf F2000climo19.cam.h1.0001-01-01-00000 {
dimensions:
    lat = 192 ;
    lon = 288 ;
    time = UNLIMITED ; // (0 currently)
    nbnd = 2 ;
    chars = 8 ;
    lev = 32 ;
    ilev = 33 ;
variables:
    double lat(lat) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
    double lon(lon) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    double gw(lat) ;
        gw:long_name = "latitude weights" ;
    double lev(lev) ;
        lev:long_name = "hybrid level at midpoints (1000*(A+B))" ;
        lev:units = "hPa" ;
        lev:positive = "down" ;
        lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        lev:formula_terms = "a: hyam b: hybm p0: P0 ps: PS" ;
    double hyam(lev) ;
        hyam:long_name = "hybrid A coefficient at layer midpoints" ;
    double hybm(lev) ;
        hybm:long_name = "hybrid B coefficient at layer midpoints" ;
    double P0 ;
        P0:long_name = "reference pressure" ;
        P0:units = "Pa" ;
    double ilev(ilev) ;
        ilev:long_name = "hybrid level at interfaces (1000*(A+B))" ;
        ilev:units = "hPa" ;
        ilev:positive = "down" ;
        ilev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        ilev:formula_terms = "a: hyai b: hybi p0: P0 ps: PS" ;
    double hyai(ilev) ;
        hyai:long_name = "hybrid A coefficient at layer interfaces" ;
    double hybi(ilev) ;
        hybi:long_name = "hybrid B coefficient at layer interfaces" ;
    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 0001-01-01 00:00:00" ;
        time:calendar = "noleap" ;
        time:bounds = "time_bnds" ;
    int date(time) ;
        date:long_name = "current date (YYYYMMDD)" ;
    int datesec(time) ;
        datesec:long_name = "current seconds of current date" ;
    double time_bnds(time, nbnd) ;
        time_bnds:long_name = "time interval endpoints" ;
    char date_written(time, chars) ;
    char time_written(time, chars) ;
    int ndbase ;
        ndbase:long_name = "base day" ;
    int nsbase ;
        nsbase:long_name = "seconds of base day" ;
    int nbdate ;
        nbdate:long_name = "base date (YYYYMMDD)" ;
    int nbsec ;
        nbsec:long_name = "seconds of base date" ;
    int mdt ;
        mdt:long_name = "timestep" ;
        mdt:units = "s" ;
    int ndcur(time) ;
        ndcur:long_name = "current day (from base day)" ;
    int nscur(time) ;
        nscur:long_name = "current seconds of current day" ;
    double co2vmr(time) ;
        co2vmr:long_name = "co2 volume mixing ratio" ;
    double ch4vmr(time) ;
        ch4vmr:long_name = "ch4 volume mixing ratio" ;
    double n2ovmr(time) ;
        n2ovmr:long_name = "n2o volume mixing ratio" ;
    double f11vmr(time) ;
        f11vmr:long_name = "f11 volume mixing ratio" ;
    double f12vmr(time) ;
        f12vmr:long_name = "f12 volume mixing ratio" ;
    double sol_tsi(time) ;
        sol_tsi:long_name = "total solar irradiance" ;
        sol_tsi:units = "W/m2" ;
    int nsteph(time) ;
        nsteph:long_name = "current timestep" ;
    float FLNS(time, lat, lon) ;
        FLNS:Sampling_Sequence = "rad_lwsw" ;
        FLNS:units = "W/m2" ;
        FLNS:long_name = "Net longwave flux at surface" ;
    float FLNT(time, lat, lon) ;
        FLNT:Sampling_Sequence = "rad_lwsw" ;
        FLNT:units = "W/m2" ;
        FLNT:long_name = "Net longwave flux at top of model" ;
    float LHFLX(time, lat, lon) ;
        LHFLX:units = "W/m2" ;
        LHFLX:long_name = "Surface latent heat flux" ;
    float PRECT(time, lat, lon) ;
        PRECT:units = "m/s" ;
        PRECT:long_name = "Total (convective and large-scale) precipitation rate (liq + ice)" ;
    float PS(time, lat, lon) ;
        PS:units = "Pa" ;
        PS:long_name = "Surface pressure" ;
    float SHFLX(time, lat, lon) ;
        SHFLX:units = "W/m2" ;
        SHFLX:long_name = "Surface sensible heat flux" ;
    float T850(time, lat, lon) ;
        T850:units = "K" ;
        T850:long_name = "Temperature at 850 mbar pressure surface" ;
    float TS(time, lat, lon) ;
        TS:units = "K" ;
        TS:long_name = "Surface temperature (radiative)" ;
    float U850(time, lat, lon) ;
        U850:units = "m/s" ;
        U850:long_name = "Zonal wind at 850 mbar pressure surface" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :source = "CAM" ;
        :case = "F2000climo19" ;
        :logname = "bozk" ;
        :host = "" ;
        :initial_file = "f.e20.FHIST.f09_f09.cesm2_1.001_v2.cam.i.2000-01-01-00000.nc" ;
        :topography_file = "/home/bozk/Models/inputdata/atm/cam/topo/fv_0.9x1.25_nc3000_Nsw042_Nrs008_Co060_Fi001_ZR_sgh30_24km_GRNL_c170103.nc" ;
        :model_doi_url = "https://doi.org/10.5065/D67H1H0V" ;
        :time_period_freq = "hour_3" ;
}
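The key line in this header is `time = UNLIMITED ; // (0 currently)`: the file's metadata was defined, but no time sample had been written (or flushed to disk) when ncdump read it. A minimal sketch for checking this across runs, assuming `ncdump` is on the PATH; `time_records` is a hypothetical helper (not part of CESM, CIME, or NetCDF) that reads `ncdump -h` output on standard input and prints the current length of the unlimited `time` dimension.

```shell
#!/bin/sh
# Hypothetical helper (not part of CESM/CIME/NetCDF): print the current length of
# the UNLIMITED "time" dimension from `ncdump -h` output read on stdin.
# ncdump annotates unlimited dimensions as: time = UNLIMITED ; // (N currently)
time_records() {
    sed -n 's|^[[:space:]]*time = UNLIMITED ; // (\([0-9][0-9]*\) currently).*|\1|p'
}
```

For example, `ncdump -h F2000climo19.cam.h1.0001-01-01-00000.nc | time_records` would print `0` for the file above.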

I checked PIO_TYPENAME; the result is:
Code:
[bozk@qxc-cluster F2000climo19]$ ./xmlquery PIO_TYPENAME
    PIO_TYPENAME: ['CPL:netcdf', 'ATM:netcdf', 'LND:netcdf', 'ICE:netcdf', 'OCN:netcdf', 'ROF:netcdf', 'GLC:netcdf', 'WAV:netcdf', 'ESP:netcdf']
I have also run the FSCAM compset successfully before and obtained the correct output file.

Any suggestions are welcome! Thanks a lot!

Wilma
 

Wilma Wang
New Member
I just ran the same compset with PIO_TYPENAME=pnetcdf using PIO version 1. It also created an output file (F2000climoP.cam.h1.0001-01-01-00000.nc), but all variable values are 0. The PBS job is still running, but the cesm.log file is no longer being updated. Attached are the cesm.log and atm.log.
 

Attachments

  • atm.log.21338.qxc-cluster.jinan.com.250801-174231.txt
    407.2 KB
  • cesm.log.21338.qxc-cluster.jinan.com.250801-174231.txt
    56.8 KB

Wilma Wang
New Member
I also tried the same compset with PIO_TYPENAME=pnetcdf using PIO version 2. The run fails with this error:

Error:component_mod:check_fields NaN found in ATM instance: 1 field Sa_z 1d global

Then I tried PIO_TYPENAME=netcdf with PIO version 2, and the run failed with a different error:
Abort with message NetCDF: Start+count exceeds dimension bound in file /home/bozk/Models/CESM2/cesm_2.1.5/cime/src/externals/pio2/src/clib/pio_darray_int.c at line 1243
Obtained 10 stack frames.
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x29b0d79]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x29ecc08]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x29e8473]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x29af953]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x1a4f73d]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x1a52260]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x18e6c51]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x18e9e57]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x17e6b19]
/home/bozk/Models/CESM2/cesm_2.1.5/cime/output/F2000climo_pio2_netcdf/bld/cesm.exe() [0x42d600]
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
I hope this information is useful.
 