
perturbation growth test

I ran a perturbation growth test for the linux cluster I would like to run CAM3.0 on, and the RMS difference quickly jumps up to 10^-2 within 0.1 days. This does not seem good. Is there another test I could run, or does this mean that the model will not give good climate results on this cluster no matter what?

I also did a year-long parallel climate simulation on this linux cluster and compared it to a control run I did on my own linux machine (that seems to be a pretty good match to the NCAR-provided control runs at first glance), and the average temperatures from the cluster run are 10-12 degrees warmer at TOA than in my initial control run. And the differences get larger the longer the model runs (although I only ran it for that one year). Average RH gives a similar situation.

Does this sound hopeless? What can I do to make the climate better when running CAM on the cluster?

Thanks,
Cathy
 

eaton

CSEG and Liaisons
In my experience a jump of the magnitude you report in the perturbation growth test has always been associated with a real problem. I think your 1 year run validates that.
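
For readers unfamiliar with the quantity being plotted, here is a minimal sketch of an RMS difference between a control run and a perturbed run at one time sample. It assumes a plain unweighted RMS over matching grid points; the actual test scripts may weight by area or mass. The function name and the sample temperatures are made up for illustration.

```python
import math


def rms_difference(control, perturbed):
    """Unweighted root-mean-square difference between two equally sized fields."""
    if len(control) != len(perturbed):
        raise ValueError("fields must have the same number of points")
    if not control:
        raise ValueError("fields must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(control, perturbed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical temperatures (K) at four grid points from two runs.
control = [250.0, 260.0, 270.0, 280.0]
perturbed = [250.0, 260.0, 270.0, 280.02]

# One point differing by 0.02 K out of four gives an RMS of about 10^-2.
print(rms_difference(control, perturbed))
```

Repeating this at each output time and plotting the result on a log scale gives a growth curve of the kind discussed in this thread.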

There are a couple of things I'd try. One is to redo the perturbation growth test with debug flags on. If the test then passes, that points to a problem with the compiler optimizations. You can then experiment with reducing the optimization until you hit the level where the test passes.

Another test to try is to make sure that you are getting identical results independent of the number of mpi tasks being used. If this test fails it would indicate that there are problems with the mpi installation.
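
A rough sketch of what "identical" means in that check: the values from the two runs should match bit for bit, not just to within a tolerance. The helper below is hypothetical and assumes the fields have already been read into lists of Python floats; reading the model history files is not shown.

```python
import struct


def bit_identical(a, b):
    """True only if both sequences hold exactly the same 64-bit float patterns."""
    if len(a) != len(b):
        return False
    return all(
        struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b)
    )


# Hypothetical values from runs with different numbers of MPI tasks.
run_with_4_tasks = [250.0, 260.5, 270.25]
run_with_8_tasks = [250.0, 260.5, 270.25]
print(bit_identical(run_with_4_tasks, run_with_8_tasks))
```

Comparing the packed bytes rather than using `==` on the floats means that even a difference in the sign of zero is reported.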
 
Thanks for the suggestions. I know it is not a problem with the MPI installation, since CAM gives the same bad climate results whether it is run with MPI or not.

I re-did the perturbation growth test with debug flags on, and got a much better difference plot. The difference is only about a half-order of magnitude apart from the perturbation, and towards the end of the second day it jumps up to between 10^-6 and 10^-7. Definitely much better than 10^-2, as I saw before. So this means there is a problem with the compiler optimizations? I am not sure what these are, or how to reduce the optimization. What do I need to do? Tweak the optimizations somehow and run the pergro test again without debug flags on until the test passes?

Thanks,
Cathy
 

eaton

CSEG and Liaisons
The Fortran optimization is controlled by the -fopt option to configure. You'll need to look at the documentation for the particular compiler you're using to see how to change the optimization level. The default optimization used by CAM can be found in the Makefile template ($camroot/models/atm/cam/bld/Makefile) in the macro FORTRAN_OPTIMIZATION in the appropriate compiler-specific section. For example, in CAM-3.0 the default optimization for pgf90 is -fast. To reduce this optimization to -O2 you would issue the command "configure -fopt -O2"

Yes, I would tweak the optimization flags and rerun the pergro test until it passes. Then it's still necessary to run a full climate simulation for the final validation.
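
The procedure amounts to a simple search from the most aggressive setting downward. The sketch below only illustrates that loop: `passes_pergro` is a hypothetical stand-in for "reconfigure with this flag, rebuild, rerun the test, and inspect the difference curve", which in practice is a manual step, and the list of flags is just an example ordering.

```python
def first_passing_level(levels, passes):
    """Return the first flag in `levels` for which `passes(flag)` is true, else None."""
    for flag in levels:
        if passes(flag):
            return flag
    return None


# Candidate settings, ordered from most to least aggressive (example only).
levels = ["-fast", "-O2", "-O1", "-O0"]


def passes_pergro(flag):
    """Hypothetical stand-in: pretend only the two lowest levels pass."""
    return flag in {"-O1", "-O0"}


print(first_passing_level(levels, passes_pergro))
```

Whatever level this lands on still needs the full climate simulation as the final validation.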
 

pjr

Member
To elaborate on Brian's response:

You don't say which compiler you are using, or which version of CAM3.

I am pretty sure there are known problems with optimization flags and certain versions of the Portland Group compiler. The default settings of the first version of CAM3 broke with versions of the compiler that came out after CAM's first release. In the "bugfix" release we changed the compiler flags to avoid that problem. The short version of this story is that the "-fast" flag (which is shorthand for a bunch of other flags) was redefined at one point. If you are using the original CAM release you should at least check out the build procedure in the bugfix release and incorporate the new compiler flag settings.

Phil
 
We are using the bug fix version of CAM3.0 and PGI compilers version 5.2-1.

In the cam1/models/atm/cam/bld/Makefile, in the linux section, there are the lines:

ifeq ($(DEBUG),TRUE)
# 4/10/04 - bounds checking temporarily disabled
#FFLAGS += -g -Ktrap=fp -Mbounds
FFLAGS += -g -Ktrap=fp -Mrecursive
SPEC_FFLAGS := $(FFLAGS)
else
SPEC_FFLAGS := $(FFLAGS)
# Check for override of default Fortran compiler optimizations
ifeq ($(F_OPTIMIZATION_OVERRIDE),$(null))
FORTRAN_OPTIMIZATION := -O2
endif
FFLAGS += $(FORTRAN_OPTIMIZATION)
endif

So it seems the default FORTRAN_OPTIMIZATION is already -O2, if I'm looking in the right place. How would I reduce this further? Could I use the DEBUG FFLAGS?

Cathy
 

jmccaa

New Member
Cathy's problem is exactly what I was seeing using the pgi 5.1-6 compiler on NCAR's clusters (bangkok and calgary) both with "-fast" and with "-O2". The only runs in which I did not see the anomalous warming at the top of the model were runs in which I took out all optimization flags. I believe this corresponds to "-O1", but I am not sure.

Using the Portland Group compiler, with no optimization flags, I was able to verify the climate against the control runs (using the oracle method, not the statistical test). This was consistent with George Carr's more formal evaluation of the coupled system. With optimization turned on, the temperature problem was so glaring that there was no point in trying to verify those runs.

So... at present I could not IMHO recommend using the PGI compiler to run CAM with any level of optimization.

Jim
 
I did a perturbation growth test without any optimizations to PGF90 and the test seems to have passed. The difference curve is almost identical to the curve from the DEBUG option run, with the max difference getting to 10^-7 towards the end of the second day of the run.

Is it okay to run CAM3.0 with no compiler optimizations (as long as the climate verifies)? What do the optimizations do?

Thanks,
Cathy
 

pjr

Member
When you say "without any optimization" do you mean

1) deleting any reference to optimization flags (e.g. -Ox, where x is a number, and -fast)

or

2) explicitly disabling the optimization using "-O0" (minus followed by the letter "O" followed by the number "zero").

Option 2 is the preferred way. Option 1 is basically saying "take the default optimization" (which is generally not the same as disabling optimization).

The model will run significantly slower (frequently 20-100% slower) without optimization.

Phil
 
I tried both options, and they both gave the same results for the perturbation growth test. So this fix seems to have worked, although I haven't run a full climate simulation yet. I will go with option two for the climate simulation and hopefully the output will look okay.

Cathy
 

jmccaa

New Member
For both pgf90 v5.1-6 and pgf90 v5.2-4, the man page reads:
"If -O is not specified, then the default level is 1 if -g is not specified, and 0 if -g is specified."

For 5.1-6, it seems that -O0 and -O1 are ok, while -O2 is not.
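
The man page rule quoted above can be written out as a small function, which makes it easier to see what "no optimization flags" actually selects. This is only a sketch of that one sentence: it looks at `-g` and `-O<digit>` and ignores every other flag (including `-fast` and a bare `-O`), and the function name is made up.

```python
import re


def effective_opt_level(flags):
    """Optimization level implied by the quoted rule for a list of compiler flags."""
    level = None
    for flag in flags:
        match = re.fullmatch(r"-O(\d)", flag)
        if match:
            level = int(match.group(1))  # an explicit -O<digit> wins; last one counts
    if level is not None:
        return level
    # No explicit -O: default is 0 when -g is present, otherwise 1.
    return 0 if "-g" in flags else 1


print(effective_opt_level([]))              # no flags at all
print(effective_opt_level(["-g"]))          # debug build
print(effective_opt_level(["-O0"]))         # optimization explicitly disabled
```

Under this rule, removing all optimization flags from a non-debug build gives level 1, while a debug build with `-g` and an explicit `-O0` both give level 0.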

It would be nice to see if the 5.2-4 version, which solved several CAM-related problems, produces a reasonable climate with -O2.

Jim
 