Several errors when running


Update: I built the CAM and POP runs after loading python/2.7.14

Now I am trying to add the metadata and am getting a file not found error. Is the documentation on Python Tools | Community Earth System Model outdated? There isn't even a "run" directory in the case directory that was created.

This is the command I am using to try to add the metadata:
$ ./ --caseroot /blue/gerber/cdevaneprugh/earth_model_output/cime_output_root/ect_runs/case.cesm_tag.uf.000 --histfile /blue/gerber/cdevaneprugh/earth_model_output/cime_output_root/ect_runs/case.cesm_tag.uf.000/run/


If there is no run directory under that case directory, go to the case directory and run
./xmlquery RUNDIR
That should provide the path to the run directory.


Okay I found the run directory and .nc files. Using the same command as above I tried adding metadata to the files but got the following errors.

The first was an error from mv stating that /blue/gerber/cdevaneprugh/earth_model_output/cime_output_root/ect_runs/case.cesm_tag.uf.000/run/ did not exist.

I fixed this by creating an empty file with that name.

The other error I am getting seems to be related to ncks saying:
ncks: unrecognized option '--glb'

We have nco versions 4.2.1 and 4.4.3 on our machine.

Do I need to update nco to a newer version?


I created a conda environment with Python 3.8 and a newer version of nco (5.2.2) and was able to add the metadata without an issue. However, when I uploaded the .nc files to be verified, I got an error telling me my version of CESM is not supported (see the screenshot). I don't understand what went wrong, as I am using CESM 2.1.5, which the website says is supported. Did I upload the wrong files or add the metadata incorrectly? Has anyone else run into this?



I think that there is a problem on our end. I'll report the issue and hopefully have it solved soon.


Thanks Jim.

While the three UF-CAM-ECT tests all appear to have built and run successfully, I noticed that the POP-ECT test did not. I got a notification from our scheduler that it timed out after 2 hours. Additionally, when I look at the CaseStatus file in $CIME_OUTPUT_ROOT/popcase.cesm_tag.000/ it shows that and case.submit was successful, and that the model execution started but did not finish. What error logs should I be looking at to diagnose this?


I would first try to determine whether it just needs more time or whether it was deadlocked. Usually you can gather this from the timestamps on the log files: if the logs other than cesm.log are much older, then it was deadlocked; if all of the logs are within a minute or two of the timeout time, then you probably just need to increase the wallclock time.
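The timestamp check described above can be scripted. The following is only a rough sketch of that heuristic, not a CESM tool: the function name `classify_timeout`, the 120-second window, and the assumption that every log file has ".log" in its name are all illustrative choices, and it assumes the run directory holds logs from a single run only.

```python
from pathlib import Path


def classify_timeout(run_dir, timeout_epoch, window_seconds=120.0,
                     main_log_prefix="cesm.log"):
    """Compare component-log modification times with the job timeout time.

    run_dir         -- path to the case run directory
    timeout_epoch   -- time the scheduler killed the job, in seconds since epoch
    window_seconds  -- how close to the timeout a log must be to count as
                       "still being written" (assumed value, tune as needed)
    main_log_prefix -- logs starting with this prefix are skipped, since the
                       heuristic looks at the logs other than cesm.log
    """
    gaps = []
    for path in Path(run_dir).iterdir():
        # Assumption: log files carry ".log" somewhere in their file name.
        if not path.is_file() or ".log" not in path.name:
            continue
        if path.name.startswith(main_log_prefix):
            continue
        # Seconds between the last write to this log and the timeout.
        gaps.append(timeout_epoch - path.stat().st_mtime)

    if not gaps:
        return "no_component_logs"
    if all(gap <= window_seconds for gap in gaps):
        # Every component log was written shortly before the timeout.
        return "needs_more_time"
    if all(gap > window_seconds for gap in gaps):
        # Every component log went quiet long before the timeout.
        return "possible_deadlock"
    return "unclear"
```

Calling it with the path reported by ./xmlquery RUNDIR and the timeout time from the scheduler returns one of the four strings. A result of "unclear" means some component logs are recent and some are old, in which case it is worth reading the logs by hand.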


All the logs were last modified within seconds of the timeout time reported by our scheduler, so I'll increase the wallclock time.

Considering that the case was built successfully and it just timed out, would something like the following be fine?
$ ./xmlchange JOB_WALLCLOCK_TIME=06:00:00
$ ./case.submit

Additionally, I poked around the cesm log in /popcase.cesm_tag.000/run and saw many lines towards the end of the file saying:

NetCDF: Invalid dimension ID or name
NetCDF: Variable not found
NETCDF: Attribute not found

Does this mean my netcdf linking or install is still not successful?


Yes, that wallclock should do it. NetCDF prints messages like this when you inquire for a variable, dimension, or attribute in a file. It may be that the model is just inquiring about optional variables and these messages can be ignored.


I was able to successfully upload and test the files but am unsure of how to interpret the results and where to go from here. This is the output:

CESM Version Tested: CESM 2.1.5
Metadata retrieved from:

PCA Test Results

Summary: 1 PC scores failed at least 2 runs: [8]

These runs PASSED according to our testing criterion.
PC 2: failed 1 runs [3]
PC 5: failed 1 runs [3]
PC 8: failed 2 runs [2, 3]
PC 30: failed 1 runs [1]
PC 43: failed 1 runs [3]
PC 47: failed 1 runs [1]

Run 1: 2 PC scores failed [30, 47]
Run 2: 1 PC scores failed [8]
Run 3: 4 PC scores failed [2, 5, 8, 43]

Testing complete.


These runs PASSED according to our testing criterion. There were a couple of principal components out of spec in each run, but the same PC is not out of spec in all three runs and the number out of spec is within the tolerance of the test. I'm not sure what the difficulty in interpreting the results could be?
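To make the pass/fail logic concrete, here is a small sketch that rebuilds the summary from the per-run lines in the output above. The "failed at least 2 runs" cutoff comes from the Summary line of that output. The `min_pc_fail=3` threshold is an assumption for illustration, since the thread only says the number out of spec was within tolerance, and `summarize_pca_failures` is a made-up name, not part of the ECT tools.

```python
from collections import Counter


def summarize_pca_failures(failed_pcs_by_run, min_run_fail=2, min_pc_fail=3):
    """Tally PC failures across runs and apply a pass/fail threshold.

    failed_pcs_by_run -- dict mapping run number to the list of failed PCs
    min_run_fail      -- a PC counts as a repeated failure if it fails at
                         least this many runs (2, per the Summary line)
    min_pc_fail       -- ASSUMED: the test fails overall only if at least
                         this many PCs are repeated failures
    """
    # Count, for each PC, how many runs it failed in.
    counts = Counter(pc for pcs in failed_pcs_by_run.values() for pc in set(pcs))
    repeated = sorted(pc for pc, n in counts.items() if n >= min_run_fail)
    return {
        "runs_failed_per_pc": dict(sorted(counts.items())),
        "repeated_failures": repeated,
        "passed": len(repeated) < min_pc_fail,
    }


# Per-run failures copied from the test output in this thread.
runs = {1: [30, 47], 2: [8], 3: [2, 5, 8, 43]}
result = summarize_pca_failures(runs)
print(result["repeated_failures"])  # [8]
print(result["passed"])             # True
```

Only PC 8 fails in two or more runs, so a single repeated failure stays under the assumed threshold and the result is a pass, even though every run has some individual PC failures.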


Thank you for the clarification. It's not clear from the results that the principal component failures were within an acceptable tolerance. I wasn't sure if I needed to get the tests to pass with zero PC failures.


Hey Jim, is there a way to download the summary files and validate my ECT runs locally? I am having an issue with the CESM website timing out before anything can validate now.


Not easily. Do you have a poor network connection, or does the timeout error appear to be on our end?


I think the timeout error is on your end. I've tried doing this from several reliable internet connections and the results are always the same. The files upload just fine, but the verification step times out.


Have you tried today? The systems people worked on it yesterday and think that they've solved the issue.