
Error restarting run after initial 4 year startup

I ran WACCM for an initial 4 years to spin up my F run. I started at year 0 and the model ran successfully over the four years. I then went into env_run.xml, changed continue_run to true, and changed resubmit to a positive number so that the run resubmits automatically each time. When I submitted the run, I got an error immediately in the cesm log file. It told me this:

 Opened existing file waccm_1850.cam.r.0004-01-01-00000.nc          11
 Opened existing file waccm_1850.cam.h0.0003-12.nc          12
 pio_support::pio_die:: myrank=          -1 : ERROR: nf_mod.F90:         674 :
 NetCDF: Variable not found
MPT: Global rank 1 is aborting with error code 1.
     Process ID: 43061, Host: r301i0n5, Program: /nobackupp8/akren/waccm_1850/bld/cesm.exe

I looked into what this might be but was not able to find anything wrong with the nc files it is trying to read; they appear to be fine. I also tried changing my start date in env_run.xml to year 040101, since that is where the model left off, but that didn't help either. If anyone can provide any help as to how to fix this, it would be appreciated. I am not sure what is going wrong.

Thanks,
 

mmills

CSEG and Liaisons
Staff member
Andrew,

If you can open up read permissions in your run and case directories, I can take a look on Pleiades. Try:

chmod -R a+rX /nobackupp8/akren/waccm_1850

Make sure to use a capital "X", which is required to open directories. A lower-case "x" will make all of your files executable, so don't do that.

Then do the same for your case directory.

Or just pop across the hall next time you are in FL0 and I can take a look on your screen.

Cheers,
Mike
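To illustrate the capital "X" versus lower-case "x" distinction described above, here is a minimal Python sketch that models how the two symbolic modes change permission bits. The function name `apply_a_plus_r` and the example modes are hypothetical, chosen only for illustration; the sketch mimics the documented chmod behavior rather than calling chmod itself.

```python
import stat

READ_ALL = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
EXEC_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def apply_a_plus_r(mode: int, is_dir: bool, capital_x: bool) -> int:
    """Model `chmod a+rX` (capital_x=True) or `chmod a+rx` (capital_x=False)."""
    new_mode = mode | READ_ALL
    if capital_x:
        # "X" adds execute only to directories, or to files that are
        # already executable by at least one of user/group/other.
        if is_dir or (mode & EXEC_ALL):
            new_mode |= EXEC_ALL
    else:
        # "x" adds execute to everything, including plain data files.
        new_mode |= EXEC_ALL
    return new_mode


# Hypothetical starting modes: a private data file and a private directory.
data_file = 0o600
run_dir = 0o700

print(oct(apply_a_plus_r(data_file, is_dir=False, capital_x=True)))   # 0o644
print(oct(apply_a_plus_r(data_file, is_dir=False, capital_x=False)))  # 0o755
print(oct(apply_a_plus_r(run_dir, is_dir=True, capital_x=True)))      # 0o755
```

With the capital "X", the data file becomes readable but stays non-executable, while the directory becomes both readable and searchable.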
 

mmills

CSEG and Liaisons
Staff member
Sorry, I will need you to open up the parent directories:

chmod a+rX /nobackupp8/akren
chmod a+rX ~akren

Also, where is your case directory?
 
OK, did that. My case directory is ~/waccm_1850

I am running this model with a data ocean dataset that I created from a 200-yr B run with interactive ocean. Ethan noticed that my rpointer.ocn was pointing to a .nc file that did not exist. We are checking whether changing my rpointer.ocn to point to my ocean file will fix it. Could that be the problem?
 

mmills

CSEG and Liaisons
Staff member
Andrew,

F-compsets (data ocean) should use an ocean grid that is the same as the atmosphere. See here for details. The grid you have used to set up your cases is f19_g16, which should only be used for B-compsets (full ocean). You should create your cases again using f19_f19 as your grid, i.e.:

create_newcase -case /home6/akren/waccm_1850 -compset F_1850_WACCM -res f19_f19 -mach pleiades-san

You should delete the old case, run, and archive directories.

Mike
 
Mike,

But it ran 4 years successfully. I made my resolution f19_g16 because I created a sstice_ts.nc file from a B compset run. The ocean data file I have is thus a higher-resolution file than f19_f19, so I had to use f19_g16. Do you mean this is causing the error?
 

santos

Member
Hmm. Did you change anything about your history outputs? Once you start a run, you cannot add any fields to the output. If you start a run and realize that you need to change the output, you need to either start over completely, or create a new case that's a branch of the previous one.

As for docn, I can't tell off the top of my head whether or not running on g16 is a problem. I remember being told to run f19_f19, but I can't remember if the idea was to get docn to do the interpolation, or that it should be done offline for some reason.
 
Hi Sean,

I may have changed something about my history outputs. I actually think I may have found the problem. I ran a test case without modifying my user_nl_cam, ran it for 3 days, then restarted it and ran for another 3 days. It worked fine. Then I tried to run my original case again with my user_nl_cam specifications. After running for 3 days, I restarted it for another 3 days, and it crashed with the same error about NETCDF: variable not found. I was thinking then that maybe the reason it didn't work is that I didn't have the default variables in my fincl0 that are default in atm_in. So I ran my case again with both my new variables and the default variables in fincl0, and it ran successfully for 3 days after a 3-day startup. Could that have been the reason?
 

santos

Member
Hmm. If you change fincl1 between setting up a case and restarting it, we expect an error. But if you have a smaller fincl1 before the very first run, and you don't change it before restarting, there shouldn't be any problem. If you are doing the latter and still getting a restart error, that is very puzzling.
 
Hi Sean. You are right, there shouldn't be a problem. I just restarted my run after again running it for the 4-year startup. It started running but stopped almost immediately with that same error about NETCDF: variable not found. Do you know how I can troubleshoot this issue? I don't think it is a resolution problem, since the initial run completed without errors. For some reason it just won't restart after an initial run. This time I definitely did not change my namelist, so it shouldn't be having a problem with not finding a variable. Do you know what would cause this error? Is there any way to find out which variable it is looking for?
 

santos

Member
I can't think of any mechanism that would cause this, and I doubt that I will be able to figure this out without reproducing the issue myself. Let me see if I can get a similar error...
 

santos

Member
Hmm, this might be the problem:

'LCWAT&IC'

The "&IC" is a signal that the variable should be written to the initial condition file; it's sort of an internal-use-only sort of thing, which you're not supposed to set with fincl. Did you mean to just put 'LCWAT'?
 

santos

Member
Yes. I think that allowing users to put this in fincl1 might be a bit of a hack, to support developers of new configurations in trying to create new kinds of initial conditions files.

But doing this certainly seems to be incompatible with the restart mechanism, and therefore is not a good idea for typical users and/or science runs. In general, you never want to have a variable that ends in "&IC" in any fincl list.

(If this seems non-obvious, well, it wasn't obvious to me either. I learned something today!)
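As a rough way to check a namelist for this before starting a run, the rule above can be sketched as a small Python scan. This is only a sketch under stated assumptions: the function name `find_ic_fields` is hypothetical, the sample text stands in for the contents of a user_nl_cam file, the fields other than 'LCWAT&IC' are illustrative, and the parser handles only simple quoted lists with `!` comments, not the full Fortran namelist syntax.

```python
import re

# Matches "name = rest-of-line" at the start of a namelist line.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=(.*)$")
# Matches single- or double-quoted strings.
QUOTED = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def find_ic_fields(namelist_text):
    """Return (variable, field) pairs for fincl entries ending in '&IC'."""
    hits = []
    current = None  # namelist variable whose value list we are inside
    for line in namelist_text.splitlines():
        line = line.split("!", 1)[0]  # drop trailing namelist comments
        match = ASSIGNMENT.match(line)
        if match:
            current = match.group(1).lower()
            line = match.group(2)
        if current is None or not current.startswith("fincl"):
            continue
        for single, double in QUOTED.findall(line):
            field = (single or double).strip()
            # Ignore an optional averaging flag such as ':I' after the name.
            name = field.split(":", 1)[0]
            if name.upper().endswith("&IC"):
                hits.append((current, field))
    return hits


# Illustrative stand-in for a user_nl_cam file.
sample = """
nhtfrq = 0
fincl1 = 'T', 'U', 'LCWAT&IC',
         'Q'
"""

print(find_ic_fields(sample))  # [('fincl1', 'LCWAT&IC')]
```

An empty result means no fincl entry in the scanned text ends in "&IC"; it does not rule out other causes of a restart failure.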
 
Hi Sean,

Just posting my result to your suggestion to remove the '&IC' in my namelist. I did that and re-ran WACCM for the 4 years to start it up. Then I did the continue run and it has not come up with that error since. Not to say it won't come up again, but for now it looks like it is fixed and was due to that "&IC" in my namelist. Thanks for your help!
 