
./case.submit errors with vanilla container

jrvb

Rob von Behren
New Member
Greetings!

I'm trying to use the vanilla escomp/cesm-2.2:latest container for some simple tests, but my commands crash out shortly after running ./case.submit. Here are the last few lines from the cesm.log file:

Code:
TASK#  NAME
  0  42b3f74e3a4d
  1  42b3f74e3a4d
  2  42b3f74e3a4d
  3  42b3f74e3a4d
WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
WARNING: Rearr optional argument is a pio2 feature, ignored in pio1
pio_support::pio_die:: myrank=          -1 : ERROR: ionf_mod.F90:         235 : Unknown error in file operation
pio_support::pio_die:: myrank=          -1 : ERROR: ionf_mod.F90:         235 : Unknown error in file operation
pio_support::pio_die:: myrank=          -1 : ERROR: ionf_mod.F90:         235 : Unknown error in file operation
pio_support::pio_die:: myrank=          -1 : ERROR: ionf_mod.F90:         235 : Unknown error in file operation
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3


Here's how I set things up:

Code:
host$ docker pull escomp/cesm-2.2:latest
host$ docker run -it -v /tmp:/host_tmp escomp/cesm-2.2:latest /bin/bash
[user@cesm2.2 ~]$  create_newcase --case test1 --compset F2000climo --res f19_g17 --run-unsupported
[user@cesm2.2 ~]$  cd test1
[user@cesm2.2 test1]$  ./xmlchange DIN_LOC_ROOT=/host_tmp/cesm_inputdata
[user@cesm2.2 test1]$  ./xmlchange NTASKS=4
[user@cesm2.2 test1]$  ./case.setup && ./case.build && ./case.submit

Note these are just the vanilla instructions from the container readme with the addition of shunting the input data to a location outside the container.

My best guess is that there is some weird interaction between the pio library and some software on the host OS. I've hit the same error using both the Container-Optimized OS in Google Cloud and a fairly standard Debian distribution.

Anyone have suggestions or pointers on where to look to debug this further? I've attached the log files from the case.submit, in case that helps.

Thanks,

-Rob
 

Attachments

  • cesm-container-errors.tar.gz
    14.1 KB

Vru

Vru
New Member
I had a look at escomp/cesm-2.2:latest, and this Docker container is configured to use 256 CPUs, take inputdata from /home/user/inputdata, and archive the outputs in /home/user/archive.

I therefore suggest that you create these folders on your machine (for instance in $HOME) and bind them to the corresponding folders in the container.

Also, set the task count to the number of cores you have on your machine, since it is unlikely to have 256 (the default in the container).

As for the compset and resolution you tried, these are quite demanding in terms of memory. Starting with a simplified compset like FKESSLER and a coarse resolution (like T31_g37) would be easier to set up and quicker to run.

In summary:

cd $HOME
mkdir cases inputdata archive

docker run -it -v /tmp:/host_tmp -v $HOME/cases:/home/user/cases -v $HOME/inputdata:/home/user/inputdata -v $HOME/archive:/home/user/archive escomp/cesm-2.2:latest /bin/bash

create_newcase --case /home/user/cases/fkessler --compset FKESSLER --res T31_g37 --run-unsupported

cd /home/user/cases/fkessler
./xmlchange --file env_mach_pes.xml --id NTASKS --val 8   # if you have 8 cores on your laptop
./xmlchange STOP_N=1   # run the model for 1 day
./case.setup
./case.build
./case.submit
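
The advice above about matching the task count to the cores on your machine can be sketched as a small shell helper. This is only an illustration: pick_ntasks is a hypothetical name, not part of CESM or CIME, and it assumes nproc (GNU coreutils) or getconf is available wherever you run it.

```shell
# Hypothetical helper (not part of CESM/CIME): cap a requested MPI task
# count at the number of cores that are actually available.
pick_ntasks() {
    requested="$1"
    # An optional second argument overrides core detection; otherwise ask
    # nproc, falling back to getconf if nproc is missing.
    available="${2:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)}"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}
```

Inside the case directory it could be used as ./xmlchange NTASKS=$(pick_ntasks 8), so that a request for 8 tasks is reduced on a machine with fewer cores. Note that the count reported inside a container depends on how many CPUs Docker exposes to it.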
 

jrvb

Rob von Behren
New Member
Hi Vru -

Thanks a bunch for the suggestions! The inputdata mapping I had previously set up worked fine, and I had set NTASKS to 4, so that shouldn't have been an issue. Switching to FKESSLER I was able to get things to run without errors, however, so that's a step in the right direction. :)

Would the lower resolution with my F2000climo attempt cause "Unknown error in file operation"? I tried doing a case with --compset F2000climo --res T31_g37 but that failed during case.build with

Code:
Generating component namelists as part of build
Creating component namelists
  2021-01-13 22:41:11 atm
   Calling /opt/ncar/cesm2/components/cam//cime_config/buildnml
     ...calling cam buildcpp to set build time options
ERROR: Command /opt/ncar/cesm2/components/cam/bld/build-namelist -ntasks 8 -csmdata /host_tmp/cesm_inputdata -infile /home/user/test2/Buildconf/camconf/namelist -start_ymd 00010101 -ignore_ic_year -use_case 2000_cam6 -inputdata /home/user/test2/Buildconf/cam.input_data_list -namelist " &atmexp /"  failed rc=255
out=CAM build-namelist - ERROR: No default value found for ncdata
user defined attributes:
key=ic_md  val=00010101
err=Smartmatch is experimental at /opt/ncar/cesm2/components/cam/bld/perl5lib/Build/ChemNamelist.pm line 274.
Died at /opt/ncar/cesm2/components/cam/bld/build-namelist line 3969.

... so I take it the compset and the resolution need to match in some way?

-Rob
 

Vru

Vru
New Member
Input data has to be available for the compset and resolution you are trying to use; not all combinations are possible.
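
The build failure quoted earlier in the thread names the namelist variable (ncdata) for which CAM found no default at that resolution. As a rough sketch, assuming the error text keeps the "No default value found for ..." wording shown in that log, a small filter can pull those variable names out of saved build output. missing_defaults is a hypothetical name, not a CESM tool.

```shell
# Hypothetical helper (not part of CESM/CIME): read log text on stdin and
# print each namelist variable reported as having no default value.
missing_defaults() {
    # Keep only the variable name that follows the error phrase, then
    # collapse repeats so each variable is listed once.
    sed -n 's/.*No default value found for \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}
```

For example, after saving the output with ./case.build 2>&1 | tee build.log (build.log is just an illustrative filename), running missing_defaults < build.log would list the variables to look into for that compset and resolution.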