
Model execution failed in Derecho _CAM6

dharmendraks841

Dharmendra Kumar Singh
Member
Hi CAM users,
Could you please help me resolve the following error?
I went through the log file, and the problem might be related to "csgteam".

2024-01-29 21:25:46: case.submit success 2963626.desched1
---------------------------------------------------

2024-01-29 21:26:22: case.run starting 2963625.desched1

---------------------------------------------------

2024-01-29 21:27:21: model execution starting 2963625.desched1

---------------------------------------------------

2024-01-29 21:27:38: model execution error

ERROR: Command: 'mpiexec --label --line-buffer -n 512 /glade/derecho/scratch/dksingh/mbspinup_33/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/glade/derecho/scratch/dksingh/mbspinup_33/run'

---------------------------------------------------

2024-01-29 21:27:38: case.run error

ERROR: RUN FAIL: Command 'mpiexec --label --line-buffer -n 512 /glade/derecho/scratch/dksingh/mbspinup_33/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed

See log file for details: /glade/derecho/scratch/dksingh/mbspinup_33/run/cesm.log.2963625.desched1.240129-212622
 

dharmendraks841

Dharmendra Kumar Singh
Member
The error is as follows:
dec1781.hsn.de.hpc.ucar.edu 190: Abort with message NetCDF: Variable not found in file /glade/derecho/scratch/csgteam/temp/spack/derecho/23.06/builds/spack-stage-parallelio-2.6.0-r2glyxbtqxylkeks6jqmns4s6fn3cmge/spack-src/src/clib/pio_nc.c at line 1164
 

peverley

Courtney Peverley
Moderator
Staff member
Hi, looking at your atm.log file, it seems that one or more variables may be missing from this file: /glade/derecho/scratch/dksingh/met_back2033_34_35_BB/output_file_date_rename_33//bssp245smbb_hybrid33.cam.h1.20330101.nc

I've never worked with these met files myself, so I don't know what could be missing. My first advice would be to turn on debug mode to see if you get any additional information:

./xmlchange DEBUG=true
 

dharmendraks841

Dharmendra Kumar Singh
Member
Thanks.
I have now set:
./xmlchange DEBUG=true
./xmlchange INFO_DBUG=2

The Error is as follows:



2024-02-02 19:45:44: model execution error

ERROR: Command: 'mpiexec --label --line-buffer -n 512 /glade/derecho/scratch/dksingh/mbspinup_33/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/glade/derecho/scratch/dksingh/mbspinup_33/run'

---------------------------------------------------

2024-02-02 19:45:44: case.run error

ERROR: RUN FAIL: Command 'mpiexec --label --line-buffer -n 512 /glade/derecho/scratch/dksingh/mbspinup_33/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed

See log file for details: /glade/derecho/scratch/dksingh/mbspinup_33/run/cesm.log.2996651.desched1.240202-194502

---------------------------------------------------



The additional information in the above log file



dec2370.hsn.de.hpc.ucar.edu 143: MPICH ERROR [Rank 143] [job id e38cdbaf-f3f8-4f7c-b0b8-474aacdb09bb] [Fri Feb 2 19:45:43 2024] [dec2370] - Abort(-1) (rank 143 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 143


dec2373.hsn.de.hpc.ucar.edu 277: Abort with message NetCDF: Variable not found in file /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/spack-src/src/clib/pio_nc.c at line 1164


dec2374.hsn.de.hpc.ucar.edu 396: /var/run/palsd/e38cdbaf-f3f8-4f7c-b0b8-474aacdb09bb/files/cesm.exe() [0xead2ab]


dec2369.hsn.de.hpc.ucar.edu 72: /glade/u/apps/cseg/derecho/23.06/spack/opt/spack/linux-sles15-x86_64_v3/oneapi-2023.0.0/parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/lib/libpioc.so(print_trace+0x36) [0x14c977bb79d6]


dec2370.hsn.de.hpc.ucar.edu 143:


dec2373.hsn.de.hpc.ucar.edu 277: Obtained 10 stack frames.


dec2374.hsn.de.hpc.ucar.edu 396: /var/run/palsd/e38cdbaf-f3f8-4f7c-b0b8-474aacdb09bb/files/cesm.exe() [0xe0713a]


dec2369.hsn.de.hpc.ucar.edu 72: /glade/u/apps/cseg/derecho/23.06/spack/opt/spack/linux-sles15-x86_64_v3/oneapi-2023.0.0/parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/lib/libpioc.so(piodie+0xa6) [0x14c977bb7b06]


dec2370.hsn.de.hpc.ucar.edu 143: aborting job:


dec2373.hsn.de.hpc.ucar.edu 277: /glade/u/apps/cseg/derecho/23.06/spack/opt/spack/linux-sles15-x86_64_v3/oneapi-2023.0.0/parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/lib/libpioc.so(print_trace+0x36) [0x14af9b6519d6]


dec2374.hsn.de.hpc.ucar.edu 396: /var/run/palsd/e38cdbaf-f3f8-4f7c-b0b8-474aacdb09bb/files/cesm.exe() [0xc8b077]


dec2369.hsn.de.hpc.ucar.edu 109: Abort with message NetCDF: Variable not found in file /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/spack-src/src/clib/pio_nc.c at line 1164


dec2370.hsn.de.hpc.ucar.edu 154: Abort with message NetCDF: Variable not found in file /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.0-awmwo6a2ami3vpi3yscykbhbdb4umlr3/spack-src/src/clib/pio_nc.c at line 1164
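
Because `mpiexec --label` prefixes every line with a host name and rank, output from different ranks is interleaved in cesm.log, which makes the actual abort reason hard to spot. A minimal sketch for grouping such lines by rank and pulling out the distinct abort reasons is shown here. The function names and the abbreviated sample lines are illustrative only; the line format is assumed from the excerpt above.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log excerpt: "<host> <rank>: <message>".
LABEL_RE = re.compile(r"^(?P<host>\S+)\s+(?P<rank>\d+):\s?(?P<msg>.*)$")


def group_by_rank(lines):
    """Return {rank: [messages]}, keeping the order of messages within each rank."""
    by_rank = defaultdict(list)
    for line in lines:
        m = LABEL_RE.match(line.strip())
        # Skip lines that do not match and labeled lines with an empty message.
        if m and m.group("msg"):
            by_rank[int(m.group("rank"))].append(m.group("msg"))
    return dict(by_rank)


def abort_messages(lines):
    """Return the distinct 'Abort with message ...' reasons, without the source path."""
    prefix = "Abort with message "
    reasons = set()
    for msgs in group_by_rank(lines).values():
        for msg in msgs:
            if msg.startswith(prefix):
                reasons.add(msg[len(prefix):].split(" in file ")[0])
    return sorted(reasons)


# Abbreviated sample lines (the source path is shortened for illustration).
sample = [
    "dec2373.hsn.de.hpc.ucar.edu 277: Obtained 10 stack frames.",
    "dec2370.hsn.de.hpc.ucar.edu 143: aborting job:",
    "dec2370.hsn.de.hpc.ucar.edu 143:",
    "dec2369.hsn.de.hpc.ucar.edu 109: Abort with message NetCDF: "
    "Variable not found in file /path/to/pio_nc.c at line 1164",
    "dec2370.hsn.de.hpc.ucar.edu 154: Abort with message NetCDF: "
    "Variable not found in file /path/to/pio_nc.c at line 1164",
]

print(abort_messages(sample))
```

In this log every rank that aborts reports the same reason, so the stack frames add little; the useful fact is that PIO could not find a variable it was asked to read.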
 

peverley

Courtney Peverley
Moderator
Staff member
Hi,

I'm not an expert on these met files, but I would recommend looking at this file: $CAM/src/dynamics/fv/metdata.F90

For example, I am seeing a conditional that is looking for the variable 'ASDIR' if met_srf_rad = .true. - I am not sure that that variable is present on the file in question.

Courtney
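
One way to test this suggestion is to compare the variables declared in the met file's header against the names that metdata.F90 reads. The sketch below parses the text printed by `ncdump -h <file>` and reports which required names are absent. It is only an illustration: the helper names and the sample header are made up, and the required list must be taken from metdata.F90 itself (only ASDIR is named in this thread; T and PS are placeholders).

```python
import re

# Matches CDL variable declarations such as "float T(time, lat) ;" or "int n ;".
# Attribute lines ("T:units = ...") do not match because of the colon.
VAR_DECL_RE = re.compile(r"^\s*\w+\s+(\w+)\s*(?:\([^)]*\))?\s*;")


def variables_in_header(cdl_text):
    """Collect variable names from the 'variables:' section of `ncdump -h` output."""
    names = set()
    in_variables = False
    for line in cdl_text.splitlines():
        stripped = line.strip()
        if stripped == "variables:":
            in_variables = True
            continue
        if stripped.startswith(("// global attributes", "data:", "}")):
            in_variables = False
        if in_variables:
            m = VAR_DECL_RE.match(line)
            if m:
                names.add(m.group(1))
    return names


def missing_variables(cdl_text, required):
    """Return the required names that are not declared in the header, in order."""
    present = variables_in_header(cdl_text)
    return [name for name in required if name not in present]


# Hypothetical header, shaped like `ncdump -h` output; not taken from the real file.
sample_header = """netcdf example {
dimensions:
    time = UNLIMITED ; // (4 currently)
    lat = 192 ;
variables:
    double time(time) ;
        time:units = "days" ;
    float T(time, lat) ;
        T:units = "K" ;
    float PS(time, lat) ;

// global attributes:
        :source = "CAM" ;
}
"""

# Placeholder list: replace with the names metdata.F90 actually requests.
required = ["T", "PS", "ASDIR"]
print(missing_variables(sample_header, required))
```

For a real file, the header text can be obtained by running `ncdump -h` on the met file and passing its standard output to `missing_variables`.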
 

dharmendraks841

Dharmendra Kumar Singh
Member
Thank you for bringing this to my attention.

ASDIR was not present in the B compset, located at /glade/campaign/cesm/development/cvcwg/cvwg/b.e21.BSSP245smbb.f09_g17/.
I extracted 6-hourly datasets for the 2035 meteorological background to drive the F compset using the CAM6_03_128 model. Acquiring all the required variables for this specific study of future meteorological background is indeed a significant challenge.
You are correct: met_srf_rad = .true., and the file I used for the run lacks ASDIR and several other necessary variables.
Please suggest additional steps, if possible. I consistently value your substantial assistance, and I have learned from you.

Kind regards
Dharmendra


For additional information, my user_nl_cam is as follows:
namelist_var = new_namelist_value
&metdata_nl
met_nudge_temp = .true.
met_data_file =
met_data_path =
met_filenames_list =
met_fix_mass = .true.
met_qflx_factor = 1.0
met_rlx_time = 24.
met_srf_land = .false.
met_srf_land_scale = .true.
met_srf_nudge_flux = .false.
met_srf_rad = .true.
met_srf_refs = .true.
met_srf_sst = .true.
met_srf_tau = .true.
/
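
Since the met_srf_rad switch is what triggers the lookup of ASDIR, it may help to list every met_srf_* switch that is currently enabled and check each one against metdata.F90. The sketch below does that for the namelist text above. It is a rough illustration with made-up helper names; whether the other switches also require extra variables in the met file is an assumption that has to be confirmed in the source.

```python
import re

# Matches simple "name = value" namelist assignments, one per line.
ASSIGN_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")

TRUE_VALUES = {".true.", ".t.", "t"}
FALSE_VALUES = {".false.", ".f.", "f"}


def parse_logical_flags(namelist_text, prefix="met_srf_"):
    """Return {name: bool} for logical assignments whose name starts with prefix."""
    flags = {}
    for line in namelist_text.splitlines():
        m = ASSIGN_RE.match(line)
        if not m:
            continue
        name, value = m.group(1), m.group(2).lower()
        if not name.startswith(prefix):
            continue
        if value in TRUE_VALUES:
            flags[name] = True
        elif value in FALSE_VALUES:
            flags[name] = False
    return flags


def enabled_flags(namelist_text, prefix="met_srf_"):
    """Return the sorted names of the switches that are set to true."""
    flags = parse_logical_flags(namelist_text, prefix)
    return sorted(name for name, on in flags.items() if on)


# The met_srf_* settings copied from the user_nl_cam above.
user_nl_cam = """&metdata_nl
met_nudge_temp = .true.
met_srf_land = .false.
met_srf_land_scale = .true.
met_srf_nudge_flux = .false.
met_srf_rad = .true.
met_srf_refs = .true.
met_srf_sst = .true.
met_srf_tau = .true.
/
"""

for name in enabled_flags(user_nl_cam):
    print(name)
```

Each name printed is a switch worth searching for in $CAM/src/dynamics/fv/metdata.F90 to see which fields it reads from the met file.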
 