
POP error restarting CESM piControl run

bmduran

Brandon Duran
New Member
Hi,

I am trying to do a simple hybrid run using the CESM2 piControl runs. I picked a random year (1021) from the restart files available at /glade/campaign/collections/cmip/CMIP6/restarts/b.e21.B1850.f09_g17.CMIP6-piControl.001/1001-1201/ . The model builds and starts running, but I quickly encounter the following error in the POP log files:

Requested sp_Fe_lim_Cweight_avg_100m
(request_tavg_field) FATAL ERROR: requested field unknown
POP aborting...
FATAL ERROR: requested field unknown


I am unsure if this is related to missing information in the restart files I am using, improper setup of my hybrid case, or a conflicting compset (I'm using the standard B1850 setup). Any suggestions would be appreciated!
Case dir: /glade/work/bduran/test-picontrol-branch-3
run dir: /glade/derecho/scratch/bduran/test-picontrol-branch-3

What version of the code are you using?
cesm2.1.5
./create_newcase --case /glade/work/bduran/test-picontrol-branch-3 --compset B1850 --res f09_g17


Describe every step you took leading up to the problem:
./xmlchange GET_REFCASE=FALSE,RUN_REFCASE=b.e21.B1850.f09_g17.CMIP6-piControl.001,RUN_TYPE=hybrid,RUN_REFDATE=1021-01-01
removed the setting of init_interp_method= 'use_finidat_areas' in env_run.xml as per prior guidance
Prestaged restarts into run directory
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
There is a somewhat complicated procedure for telling POP what diagnostics MARBL will be providing so that POP can add fields to the history files, and it looks like something is going wrong in that step. If you look in /glade/work/bduran/test-picontrol-branch-3/Buildconf/popconf, there are several files involved:

1. gx1v7_tavg_contents is the list of diagnostics that POP will write to files, and it includes all of the MARBL diagnostics (that's good)
2. ecosys_diagnostics is a file that helps differentiate between diagnostics that POP knows how to compute and diagnostics that MARBL will provide; for some reason, the section MARBL-generated diagnostics is empty (that's bad)
3. marbl_diagnostics_operators is just a list of the MARBL-generated diagnostics, and it helps POP define local buffers for storing the MARBL diagnostics. It is also empty, I think because it is generated in the same process as ecosys_diagnostics

So I'm confused about why ecosys_diagnostics is missing MARBL diagnostics that are included in gx1v7_tavg_contents. If you go to your case directory and run ./preview_namelists, are there any helpful messages written to the screen? Or do these files get updated?
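If it helps to automate that check, a short script along these lines could flag an empty MARBL-generated diagnostics section. This is only a sketch: it assumes the section starts at a comment line containing the phrase "MARBL-generated diagnostics" and runs until the next comment header, which may not match the exact layout of ecosys_diagnostics.

```python
# Sketch: detect an empty "MARBL-generated diagnostics" section in
# Buildconf/popconf/ecosys_diagnostics. Assumes the section begins at a
# comment line containing the phrase below and ends at the next comment
# header -- the real file layout may differ slightly.

def marbl_section_entries(lines):
    """Return the non-comment, non-blank lines inside the MARBL section."""
    in_section = False
    entries = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            if "MARBL-generated diagnostics" in stripped:
                in_section = True
            elif in_section:
                break  # next comment header closes the section
            continue
        if in_section and stripped:
            entries.append(stripped)
    return entries

# Example on a made-up fragment of the file:
sample = [
    "# POP-computed diagnostics\n",
    "TEMP : monthly\n",
    "# MARBL-generated diagnostics\n",
    "sp_Fe_lim_Cweight_avg_100m : monthly\n",
]
print(len(marbl_section_entries(sample)))  # 1 entry; 0 would signal the bug
```

Run against the real file (e.g. `marbl_section_entries(open("Buildconf/popconf/ecosys_diagnostics").readlines())`), an empty result would confirm the problem before a full build.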
 

bmduran

Brandon Duran
New Member
Hi Michael,

Thanks for your explanation! That does seem slightly complicated.

After I run ./preview_namelists, it appears that these files get updated. Specifically, there is now a list of diagnostics under the MARBL-generated diagnostics section in ecosys_diagnostics (in prior runs this was empty). Output attached below.

However, marbl_diagnostics_operators remains empty.

I'll try to rebuild and submit the case and see if it generates a new error. Thanks for providing some direction!
 

Attachments

  • ecosys_diagnostics.txt
    29.4 KB · Views: 1

bmduran

Brandon Duran
New Member
Hi Michael,

I continue to hit this error when trying to run a case. In the interim, I have tried to use several different compsets (BHIST, B1850) and several different run types (startup, hybrid, branch) with a variety of different restart files (from the piControl sim, LENS, SMYLE). No matter what combination I've tried, I encounter this POP error.

In some cases, if I only run ./preview_namelists, only gx1v7_tavg_contents contains the correct MARBL components (both ecosys_diagnostics and marbl_diagnostics_operators are empty / do not contain the MARBL diagnostics).
CaseDir: /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest/

In other cases, ./preview_namelists leaves gx1v7_tavg_contents with the MARBL diagnostics, and the MARBL diagnostics section in ecosys_diagnostics is filled in. However, marbl_diagnostics_operators remains empty.
CaseDir: /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_LENS_try_hybrid_clm_fix/

Do you have any other thoughts? I'm surprised to be running into this issue across so many configurations and with a default release of CESM (2.1.5). Thanks!

-Brandon
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
@bmduran -- sorry for the late reply, I just got back from a week off. Let me see if I can recreate this issue myself, and then I can try to track things down in my own case instead of asking you to try a bunch of different things :)
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
@bmduran What version of python are you using? Can you run python --version and which python and report the output? I created a case out of my default environment (python 3.10.16) and my marbl_diagnostics_operators looks fine:

Code:
$ cd /glade/work/mlevy/codes/CESM/cesm2.1.5/cases/b.e21.B1850.f09_g17.test_MARBL_diags
$ wc -l Buildconf/popconf/*_diag*
 1034 Buildconf/popconf/ecosys_diagnostics
  336 Buildconf/popconf/marbl_diagnostics_list
  338 Buildconf/popconf/marbl_diagnostics_operators
 1708 total

whereas in your case, the operators file is too small:

Code:
$ cd /glade/work/bduran/test-picontrol-branch-3
$ wc -l Buildconf/popconf/*_diag*
 1034 Buildconf/popconf/ecosys_diagnostics
  336 Buildconf/popconf/marbl_diagnostics_list
    2 Buildconf/popconf/marbl_diagnostics_operators
 1372 total

Also, the Buildconf/popconf/marbl_diagnostics_operators file is created by the MARBL_diags_to_tavg.py script, which is called from POP's input_templates/ocn.ecosys.tavg.csh script. So one possibility would be to add an echo statement to that csh script to see what arguments are being passed to the python script:

Code:
# Add POP-based ecosys diagnostics to tavg_contents
if ( -f $CASEROOT/SourceMods/src.pop/ecosys_diagnostics ) then
  set MARBL_args_filename = "-i $CASEROOT/SourceMods/src.pop/ecosys_diagnostics"
else
  set MARBL_args_filename = "-i $CASEBUILD/popconf/ecosys_diagnostics"
endif
+ echo "TEST DEBUG $MARBL_args $MARBL_args_filename"
$POPROOT/MARBL_scripts/MARBL_diags_to_tavg.py $MARBL_args $MARBL_args_filename

And make sure the inputs are all correct.
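Since the symptom above is a nearly empty operators file next to a full diagnostics list (336 vs 2 lines), a quick comparison of the two line counts can flag the truncated case automatically. This is a sketch under the assumption, based on the wc -l output above, that the two files normally have roughly the same number of lines; `operators_look_truncated` is a hypothetical helper, not part of CESM.

```python
# Sketch: flag a marbl_diagnostics_operators file that is far shorter than
# marbl_diagnostics_list. In the healthy case above the two files have
# roughly equal line counts (336 vs 338); ~2 lines means the operators
# were never written out.

from pathlib import Path

def operators_look_truncated(popconf_dir, min_ratio=0.5):
    """Return True if the operators file has far fewer lines than the list."""
    popconf = Path(popconf_dir)
    n_list = len((popconf / "marbl_diagnostics_list").read_text().splitlines())
    n_ops = len((popconf / "marbl_diagnostics_operators").read_text().splitlines())
    return n_ops < min_ratio * n_list
```

For example, `operators_look_truncated("Buildconf/popconf")` run from the case directory after ./preview_namelists would catch the bad state before building.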
 

bmduran

Brandon Duran
New Member
Hi Michael,

Thanks for doing some digging and getting back to me on this!

When I run python --version in my default base environment, I get Python 3.9.18. I set up a new environment with a Python version matching yours, and am still hitting the same error.

Adding the echo statement yields this, which to my eyes seems fine?:

TEST DEBUG -t /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_mlevy5/Buildconf/popconf/ecosys_tavg_contents -d /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_mlevy5/Buildconf/popconf/marbl_diagnostics_list -o /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_mlevy5/Buildconf/popconf/marbl_diagnostics_operators --low_frequency_stream 5 --medium_frequency_stream 1 --high_frequency_stream 4 --lMARBL_tavg_alt_co2 True -i /glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_mlevy5/Buildconf/popconf/ecosys_diagnostics

If it's helpful, I've been using /glade/u/home/bduran/LaunchScripts/Tests/CESM2_DerechoTest_LAUNCH.txt to generate these cases.

Interestingly, I was able to run a case (/glade/work/bduran/CESM2_runs/Experiments/CESM2_DerechoTest_mlevy6/) successfully using your CESM code instead of mine. It looks to me like we are running the exact same version of 2.1.5, and I don't have any changes in my codebase (only the ones to POP that you just suggested above). This is a clean copy of the codebase that I checked out only recently for this testing, so I am confused by this. That's as much testing as I've done on my end so far, and I will keep posting to this thread if I make any more progress.

Thanks again for the help!
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
It looks to me like we are running the exact same version of 2.1.5, and I don't have any changes in my codebase (only the ones to POP that you just suggested above). This is a clean copy of the codebase that I checked out only recently for this testing, so I am confused by this. That's as much testing as I've done on my end so far, and I will keep posting to this thread if I make any more progress.

I created a case out of your sandbox /glade/derecho/scratch/mlevy/b.e21.B1850.f09_g17.test_MARBL_diags_bduran, and it looks like I get the full Buildconf/popconf/marbl_diagnostics_operators. Can you try cloning CESM 2.1.5 a second time, and creating a case out of the new sandbox? I don't like not knowing what is going wrong, but if switching to a new copy of the same version of the model works that seems like a good enough work-around... Sorry we can't point to something and say "that's the problem right there!"
 

bmduran

Brandon Duran
New Member
I tried what you suggested (a fresh out-of-the-box cesm2.1.5 checkout, new case) and I still hit the same error! It has also been hit or miss whether cases built from your codebase end up running (the ones that fail hit the same error I keep encountering). Frustratingly, a run may succeed, but then I am unable to reproduce it despite not changing anything about my environment (see /glade/derecho/scratch/bduran/retry_other_codes_ncarenv2309 and /glade/derecho/scratch/bduran/retry2, which were created identically and in the same environment; the former ran successfully, the latter failed with the above error).

Is there any way I can reproduce your environment to test that element of this puzzle? It seems like we've controlled for the more obvious culprits and are still hitting roadblocks.

Thanks for the continued debugging support!
 