
Error on CESM start up

bahls@arsc_edu

New Member
We are attempting to run a CESM case (res=T31_T31 / compset=F) and the code is crashing a few seconds after starting with the following error:

14 n65
15 n65
pio_support::pio_die:: myrank= 1 : ERROR: ionf_mod.F90:
211 : Invalid argument

The error comes from the "atm" component and seems to be netCDF-related, but we're unsure what "Invalid argument" is telling us.

Using strace, we tracked down the netCDF file that is being read when the error occurs:

${DIN_LOC_ROOT_CSMDATA}/data/atm/cam/inic/gaus/cami_0000-01-01_48x96_L26_c091218.nc

This file appears to have valid header information.
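
One quick header check is the file's leading magic bytes, which distinguish the netCDF classic and 64-bit offset formats (both readable by netCDF 3.6.x) from the HDF5-based netCDF-4 format (not readable by a netCDF-3 library). The sketch below is illustrative only: the helper name `netcdf_format` is hypothetical, and it inspects only the first four bytes, so it says nothing about whether the rest of the file is intact or whether the format is actually the cause of the error.

```python
import sys

# Leading bytes that identify each on-disk format.
MAGIC = {
    b"CDF\x01": "netCDF classic (CDF-1)",
    b"CDF\x02": "netCDF 64-bit offset (CDF-2)",
    b"\x89HDF": "HDF5 (netCDF-4)",
}


def netcdf_format(path):
    """Return a format label based on the file's first four bytes.

    Hypothetical helper for illustration; it does not validate the file.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC.get(head, "unknown")


if __name__ == "__main__":
    # Pass one or more file paths on the command line.
    for path in sys.argv[1:]:
        print(path, "->", netcdf_format(path))
```

Running this against the initial-conditions file named above would show which of these formats it is stored in.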

We're using PGI-9.0.4, OpenMPI-1.4.1 and netcdf-3.6.3 (compiled with PGI), with the "midnight" machine as a template.

If you have any thoughts on how we might resolve this issue, we'd appreciate any input.

Don
 

eaton

CSEG and Liaisons
It may be an MPI-related problem. One way to check is to run in serial mode. I don't think the CESM scripts work properly for a serial run, so the easiest way to do this is with the CAM standalone scripts. Try using $cesm_root/models/atm/cam/bld/run-pc.csh as a template script. That script is set up for pure MPI execution, so to run in serial mode you need to make the following changes:

Comment out the lines that set ntasks at the top of the script. They would probably make the script fail when run from an interactive login, since they assume that the environment variable PBS_NODEFILE is defined.

Replace the configure command by:

$cfgdir/configure $cfg_string -dyn fv -hgrid 48x96 -nosmp -nospmd || echo "configure failed" && exit 1

Note that the hgrid value 48x96 is for the T31 grid (48 lats, 96 lons).

Replace the run command by:

$blddir/cam || echo "CAM run failed" && exit 1

You should be able to run this script from an interactive login shell.
 

eaton

CSEG and Liaisons
I just saw a mistake in my previous post. The configure command line needs to specify the Eulerian dycore, not FV, so it should read:

$cfgdir/configure $cfg_string -dyn eul -hgrid 48x96 -nosmp -nospmd || echo "configure failed" && exit 1
 
Thanks for the idea.

After modifying the run-pc.csh script, we are still getting the same error:

(GETFIL): attempting to find local file cami_0000-01-01_48x96_L30_
c100426.nc
(GETFIL): using
$WORKDIR/data/atm/cam/inic/gaus/cami_0000-01-01_48x96_L30_c100426.nc
pio_support::pio_die:: myrank= 0 : ERROR: ionf_mod.F90:
212 : Invalid argument

For some reason, the script also required the file cami_0000-01-01_48x96_L30_c100426.nc instead of cami_0000-01-01_48x96_L26_c091218.nc.
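
The two file names differ in more than their trailing stamp. The sketch below splits such a name into its fields, assuming the pattern `cami_<date>_<nlat>x<nlon>_L<nlev>_c<created>.nc`. Reading the `L` field as the number of vertical levels and the `c` field as a creation stamp is an assumption about the naming convention, not something stated in this thread; the function name `parse_cami_name` is hypothetical.

```python
import re

# Assumed layout: cami_<date>_<nlat>x<nlon>_L<nlev>_c<created>.nc
PATTERN = re.compile(
    r"cami_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<nlat>\d+)x(?P<nlon>\d+)"
    r"_L(?P<nlev>\d+)"
    r"_c(?P<created>\d{6})\.nc$"
)


def parse_cami_name(name):
    """Split an initial-conditions file name into its fields.

    Hypothetical helper; raises ValueError if the name does not match
    the assumed pattern.
    """
    m = PATTERN.search(name)
    if m is None:
        raise ValueError("unrecognized file name: " + name)
    return {
        "date": m.group("date"),
        "nlat": int(m.group("nlat")),
        "nlon": int(m.group("nlon")),
        "nlev": int(m.group("nlev")),  # assumed: vertical level count
        "created": m.group("created"),  # assumed: creation stamp
    }


if __name__ == "__main__":
    for name in (
        "cami_0000-01-01_48x96_L26_c091218.nc",
        "cami_0000-01-01_48x96_L30_c100426.nc",
    ):
        print(parse_cami_name(name))
```

Under that assumption, both files are on the same 48x96 grid and differ in the `L` field (26 versus 30), which suggests the standalone build was configured for a different vertical resolution than the CESM case.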

Any other suggestions?
 