WACCM-D test

Rgh

New Member
Hi all,
I am trying to run a test on a local cluster where we have CESM2.
These are the steps that I followed:
create_newcase --compset FWmadSD --res f09_f09_mg17 --case=~/cases/test
cd ~/cases/test/
./case.setup
./case.build
The build step finishes successfully.
Then:
./case.submit
but the job crashes after a few minutes, and the log files report convergence failures along with:
NetCDF: Invalid dimension ID or name
NetCDF: Variable not found

I am really new to CESM2 and would appreciate any help.
In my scratch directory I have bld and run directories. If all goes well, where should I find the output? I do not have an archive directory at the moment.
What about required inputs? If any input data were missing, the build would abort, right?

Thanks in advance
 

mmills

CSEG and Liaisons
Staff member
If you are new to running CESM2, your first step should be to try to run a compset for CAM instead of WACCM. For example, try creating, building, and running a case with the FHIST compset. If that causes you trouble, please post in the Infrastructure forum here, where others can help you with porting:


Have others run CESM2 on your local cluster? If you succeed in running a CAM compset, but still can't run WACCM, please attach the cesm.log and atm.log files from your run directory in a reply on this thread. I will take a look at them and try to give you suggestions.

Also, if you are interested in creating useful names for your CESM cases, you may choose to follow the CESM case naming convention as described here:


In this case, your test case above might be called f.e21.FWmadSD.f09_f09_mg17.001, or something similar.
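
As a rough sketch of how such a name is put together, the shell function below joins the fields with dots and zero-pads the run number. The function name make_case_name and the labels given to its arguments are hypothetical and chosen only for illustration; the naming convention itself remains the authority on what each field means.

```shell
# Hypothetical helper (not part of CESM): join case-name fields with dots.
# Arguments: prefix, code base tag, compset, resolution, run number.
make_case_name() {
    printf '%s.%s.%s.%s.%03d\n' "$1" "$2" "$3" "$4" "$5"
}

# Prints: f.e21.FWmadSD.f09_f09_mg17.001
make_case_name f e21 FWmadSD f09_f09_mg17 1
```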
 

Rgh

New Member
Thank you so much for the reply.
No one has run CESM2 on this cluster before.
I am now trying a CAM run to see what happens.
What is the expected output if it goes well?
 

mmills

CSEG and Liaisons
Staff member
For guidance, see this quickstart:


To check that a run completed successfully, check the last several lines of the cpl.log file for the string “SUCCESSFUL TERMINATION OF CPL7-cesm”.
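
One way to script that check is sketched below. The function name cpl_run_succeeded and the 20-line window are assumptions made for illustration, not anything defined by CESM.

```shell
# Sketch: succeed (exit status 0) if the success banner appears near the
# end of a coupler log. The 20-line window is an arbitrary choice.
# Argument: path to a single, uncompressed cpl.log file.
cpl_run_succeeded() {
    tail -n 20 "$1" | grep -q 'SUCCESSFUL TERMINATION OF CPL7-cesm'
}
```

Call it with the path to one cpl.log file and test the exit status, for example in an if statement. If the log has been compressed, it would need to be decompressed first.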

If the short-term archiver ran after the model successfully completed, the log files will be in the short-term archive, which is defined by $DOUT_S_ROOT in your env_run.xml.
  • $DOUT_S_ROOT refers to the short term archive path location on local disk. This path is used by the case.st_archive script when $DOUT_S = TRUE. See CESM Model Output File Locations for details regarding the component model output filenames and locations.
    $DOUT_S_ROOT/$CASE is the short term archive directory for this case. If $DOUT_S is FALSE, then no archive directory should exist. If $DOUT_S is TRUE, then log, history, and restart files should have been copied into a directory tree here.
  • $DOUT_S_ROOT/$CASE/logs
    The log files should have been copied into this directory if the run completed successfully and the short-term archiver is turned on with $DOUT_S = TRUE. Otherwise, the log files are in the $RUNDIR.
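
The rule above can be sketched as a small shell function. The name log_dir_for_case is hypothetical, and the four values are assumed to have been read from the case beforehand (for example from env_run.xml). The existence of the logs directory is used here as a stand-in for "the run completed and the archiver ran", which is a simplification.

```shell
# Sketch: print the directory where the log files are expected.
# Arguments: DOUT_S, DOUT_S_ROOT, CASE, RUNDIR (values taken from the case).
log_dir_for_case() {
    archive_logs="$2/$3/logs"
    if [ "$1" = "TRUE" ] && [ -d "$archive_logs" ]; then
        # Short-term archiving is on and the archive tree exists.
        printf '%s\n' "$archive_logs"
    else
        # Otherwise the logs stay in the run directory.
        printf '%s\n' "$4"
    fi
}
```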
 

Rgh

New Member
I succeeded in running a CAM compset but still have trouble with WACCM.
The atm.log is attached. I could not attach the cesm.log because of its size, but here is the Dropbox link:

Thanks
 

Attachments

  • atm.log.txt
    917 KB · Views: 4

cmcully

Christopher Michael Cully
New Member
Hi. I'm another user on the same cluster as the original poster. It looks like our problem is a missing MERRA2 input file:
.../inputdata/atm/cam/met/MERRA2/0.9x1.25/2005/MERRA2_0.9x1.25_20050105.nc
The files for dates 20050101 through 20050104 are automatically downloaded from:
ftp://ftp.cgd.ucar.edu/cesm/inputdata/atm/cam/met/MERRA2/0.9x1.25/2005
but those 4 days (plus 19800101) look like the only ones available from that server.

If I shorten the length of the run to only include the 4 available days, the run completes successfully, with a "SUCCESSFUL TERMINATION OF CPL7-cesm" in the cpl.log file. Is there somewhere else we could find the appropriate MERRA2 input files?
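
For anyone checking their own inputdata directory, a minimal sketch of how to list which daily files are absent before submitting a run is shown below. The function name list_missing_met_files is hypothetical, the file name pattern is copied from the path quoted above, and the sketch only covers days within a single month.

```shell
# Sketch: print the expected daily files that are absent from a directory.
# Arguments: directory, year-month (YYYYMM), first day, last day.
# Days must be plain integers without leading zeros (1, not 01).
list_missing_met_files() {
    day="$3"
    while [ "$day" -le "$4" ]; do
        file=$(printf '%s/MERRA2_0.9x1.25_%s%02d.nc' "$1" "$2" "$day")
        [ -e "$file" ] || printf '%s\n' "$file"
        day=$((day + 1))
    done
}
```

With only the files for 20050101 through 20050104 present, calling the function for days 1 to 5 of 200501 would print the path ending in MERRA2_0.9x1.25_20050105.nc.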
 

mmills

CSEG and Liaisons
Staff member
NCAR does not distribute MERRA2 met fields. They are available for download from NASA. If you need more info on how to obtain the MERRA2 files please contact Simone Tilmes (tilmes@ucar.edu).
 

mmills

CSEG and Liaisons
Staff member
Please see 6. Input Datasets — camdoc documentation

6.5. Meteorological data sets
For specified dynamics model simulations, meteorological analyses from the Goddard Earth Observing System Model, Version 5 (GEOS5) and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) have been prepared to run CESM and WRF simulations. They are available in 3 resolutions on the Research Data Archive:

GEOS5 2005-present (currently only 1.9x2.5 degree horizontal resolution): CISL RDA: GEOS5 Global Atmosphere Forcing Data

and

MERRA2 1980-close to present (1.9x2.5, 0.9x1.25, and 0.5x0.63 degrees horizontal resolution): CISL RDA: MERRA2 Global Atmosphere Forcing Data

These datasets and additional resolutions for GEOS5, MERRA, and MERRA2 can be found on repository, and on HSI.
 

cmcully

Christopher Michael Cully
New Member
With your help, we've made good progress in validating our machine setup for WACCM. Thank you.

There's one thing that emerged from the testing so far that concerns me, and I'd like to ask your advice. I've been running a series of tests using the create_test script, including (among others):
SMS_D_Ln9.f19_f19_mg16.FW4madSD
SMS_D_Ln9.f19_f19_mg17.FWmadHIST
ERP_Ld3.f09_f09_mg17.FWmadHIST
SMS_Ld5.f09_f09_mg17.FWmadSD
Our machine passed all of the tests, but those involving the FWmadHIST and FWmadSD compsets had a very large number of messages (hundreds of thousands) like the following in the cesm.log file:
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 1701 1 0
The FW4 compsets (e.g. FW4madSD) did not cause this warning. I've attached the cesm and atm log files here (I abridged the cesm.log file due to the large number of similar warnings).

Just to be clear, I'm running the tests straight out of the box:
cd /global/software/src/cesm/cesm211/cime/scripts
./create_test --machine arcgnu730openmpi402opa --compiler gnu --wait --queue apophis-bf ERP_Ld3.f19_f19_mg16.FW4madHIST

Is this expected behavior? Should I be concerned about using the FWmad compsets on my machine?

Thanks,
Chris Cully
 

Attachments

  • cesm.log.5028130.200309-224019_abridged.txt
    80.5 KB · Views: 6
  • atm.log.5028130.200309-224019.gz
    122.6 KB · Views: 2

mmills

CSEG and Liaisons
Staff member
I looked at your atm log, and I don't think you need to be concerned. That message is common, and it just indicates some instability in the chemistry. In your case, these all appear before the first model time step is complete, indicating that the chemistry is stable by the end of the first time step. It is typical for the chemistry to be unstable at the start of a new run when you are porting to a new machine. In some cases, these messages can continue for days or longer, but that does not appear to be the case here. In general, these messages can be ignored if the model does not crash.
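
To see for yourself whether the messages die out after the start of a run, one option is to count them per model step. The sketch below assumes that nstep is the last whitespace-separated field on each line, as in the message quoted earlier in this thread; the function name convergence_warnings_by_step is hypothetical.

```shell
# Sketch: count "failed to converge" messages per model step (nstep).
# Assumes nstep is the last whitespace-separated field on each line.
# Argument: path to an uncompressed cesm.log file.
convergence_warnings_by_step() {
    awk '/failed to converge/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }' "$1" | sort -n
}
```

Each output line is a step number followed by the number of warnings at that step. If the warnings are confined to the first step or two, that matches the benign pattern described above.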
 
Hello, I failed to run the default WACCM-SD case (1980-01-03), FX2000, and FXSD. I then ran a default FHIST case (1979-01-01), which also crashed. I have no idea how to fix these problems. Could you help me?
The attached files are the FHIST logs.

Thanks,
Jeff
 

Attachments

  • atm.log.6595.NS1.201002-001751.txt
    101.4 KB · Views: 1
  • cesm.log.6595.NS1.201002-001751.txt
    168 KB · Views: 3