Abnormal values generated on a new machine when activating sea ice skeletal layer biogeochemistry

ZiqiYin · Apr 30, 2021

Hi everyone,

I ran some G (e.g. G1850ECO) and B (e.g. B1850) compsets with CESM2.1.3 to simulate sea ice biogeochemistry. To do this, I activated the skeletal layer biogeochemistry in CICE by setting ‘skl_bgc=.true.’ in the namelist. I also made some small modifications to the default settings for reading in the WOA ocean surface N/Si forcing files. The setup has already run successfully on two machines and produced reasonable output. However, I recently moved to a new supercomputer and ran into some problems:

(1) (Partly solved) First, ‘BGC error3’ was reported in the log file because some negative skeletal layer nitrate concentrations were generated, e.g. -4.17E-321, which is extremely small. (According to the relevant equations in ice_algae.F90, no negative values should normally be generated.) I suspect such tiny values were treated as zero on the other machines but are not handled the same way on this new machine. Since the negative values are so small, I assume they won’t affect the results. After bypassing the abort triggered by this error so that the model could continue, I found that the output was reasonable and broadly the same as the earlier successful runs on the other two machines.
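For context, -4.17E-321 is far below the smallest normal double-precision value (about 2.2E-308), so it is a subnormal (denormal) number: nonzero, but stored with reduced precision. Whether such values are kept or flushed to zero depends on compiler and floating-point settings, which would be consistent with the machine dependence described above. A minimal Python check, for illustration only (this is not CICE code, and `is_subnormal` is a hypothetical helper):

```python
import math
import sys


def is_subnormal(x: float) -> bool:
    """True if x is a nonzero, finite double whose magnitude is below the smallest normal double."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min


value = -4.17e-321  # the skeletal layer nitrate value reported in the log

print(sys.float_info.min)   # 2.2250738585072014e-308 (smallest positive normal double)
print(is_subnormal(value))  # True
print(value < 0.0)          # True: still strictly negative, so any "x < 0" test catches it
```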

(2) Another problem, which I am still stuck on, is that irregularly distributed abnormal values of some ice BGC tracers appear in my B compset runs. These tracers include the mixed layer nitrate/silicate and skeletal layer nitrate/silicate concentrations. For example, Figure 1 shows the normal January mixed layer nitrate field, which is simply the nitrate concentration from the WOA forcing that the model reads in. Figure 2 shows the problematic version, plotted with the same colorbar scale as Figure 1. Its maximum value actually exceeds 1E8, visible as the red spots in the tropical and subtropical oceans. There are also spots of smaller abnormal values at high latitudes, some of them negative. Because these mixed layer nutrient fields are wrong, the skeletal layer nutrient concentrations calculated from them are also incorrect.
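Rather than relying only on the colorbar, one way to quantify this is to scan the field for points outside a physically plausible range and list where they occur. A minimal sketch in plain Python, assuming the field has already been loaded (e.g. from a history file) as a 2-D list, with any fill value used for land points masked out first. The upper bound is a hypothetical placeholder, not a CICE or WOA constant; a real bound should come from the range of the WOA forcing itself:

```python
import math


def find_abnormal(field, lower=0.0, upper=1.0e3):
    """Return (j, i, value) for every point that is NaN/inf, below `lower`, or above `upper`.

    `upper` is an illustrative bound chosen for this sketch only.
    """
    bad = []
    for j, row in enumerate(field):
        for i, v in enumerate(row):
            if not math.isfinite(v) or v < lower or v > upper:
                bad.append((j, i, v))
    return bad


# Toy 3x3 "mixed layer nitrate" field with one huge value and one negative value
nitrate = [
    [5.0, 12.3, 1.2e8],
    [0.4, 30.1, 8.7],
    [-2.0, 0.0, 15.5],
]
for j, i, v in find_abnormal(nitrate):
    print(f"j={j} i={i} value={v:g}")
```

Listing the grid indices this way makes it easier to see whether the bad points form a regular pattern (for example, along processor block boundaries), which would point toward a decomposition or communication problem rather than the physics.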

So I would like to ask what might be causing this problem. I don’t believe it is caused by the setup, because:
(1) the setup runs normally and produces reasonable results on the other machines (one of which also uses CESM2.1.3, like this new machine);
(2) G compset runs with the same setup on this new machine do not show the second problem and produce reasonable output;
(3) on the new machine, once the first error is bypassed, the setup builds and runs successfully (though the output is wrong).

Also, on the new machine, default G and B compsets without skeletal layer biogeochemistry run normally, so the problem only affects the skeletal layer BGC. Could the coupler or the remapping be malfunctioning? Could it be related to the compiler? One difference I found between this new machine and the other two is that this new machine uses ‘gcc4.9.4’, while the other two use ‘intel’.

Changing the PE layout (e.g. switching from a parallel to a serial run, or decreasing the number of processors) does not solve the problem. Neither does setting the XML variable ‘DEBUG=TRUE’.

Any advice is highly appreciated. Thanks in advance!

Ziqi

dbailey · Apr 30, 2021

I am moving this to the CICE Consortium. I do not have experience with the skeletal BGC. Hopefully @eclare@lanl_gov or @njeffery can chime in here. Note this is still CICE5 (within CESM).

ZiqiYin · May 1, 2021

Thank you Dave!

Ziqi

eclare@lanl_gov · May 10, 2021

Hi Ziqi,
Since the problems only occur on a different machine/compiler, my guess is that some compiler options or flags aren't behaving the same way on this machine. For instance, many compilers have a 'flush-to-zero' option that sets very small (denormal) values to zero. If your compiler does not, you'll need to add a manual fix to the code (e.g. max(0, small negative value)) if it's causing an issue. But it sounds like the run continues fine in spite of it in your case, so maybe you don't need to do anything.
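The manual fix mentioned above amounts to clamping the tracer after it is updated. In CICE this would be Fortran; the Python sketch below only illustrates the logic. The helper name and the tolerance are hypothetical, and the choice to raise an error for clearly negative values (instead of silently zeroing them) is an assumption, made so that a genuine bug is not hidden:

```python
import math
import sys


def clamp_tracer(x: float, tol: float = 1.0e-12) -> float:
    """Return a non-negative concentration, zeroing round-off-sized values.

    Subnormal values (either sign) are flushed to 0.0, mimicking flush-to-zero;
    negatives within `tol` of zero are clamped to 0.0; NaN or anything more
    negative than -tol is treated as a real error. `tol` is illustrative only.
    """
    if math.isnan(x):
        raise ValueError("concentration is NaN")
    if abs(x) < sys.float_info.min:  # zero or subnormal: software flush-to-zero
        return 0.0
    if x < -tol:
        raise ValueError(f"negative concentration {x:g} exceeds tolerance")
    return max(0.0, x)  # equivalent of max(0, x) for round-off-sized negatives


print(clamp_tracer(-4.17e-321))  # 0.0
print(clamp_tracer(-1.0e-15))    # 0.0
print(clamp_tracer(3.5))         # 3.5
```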

Another possibility is that the compiler optimization level messes something up. You can test for this by changing the optimization level (e.g. from -O2 to -O1 or -O0) and rerunning. Code built with lower optimization runs much slower, so you might have to restart from a point closer to when the error appears. These kinds of problems are difficult to find and fix, and sometimes they don't make much difference to the solution in a "climate" sense. You'll need to evaluate how important it is for your particular scientific problem.
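To judge whether changing the optimization level actually matters, it helps to compare the same output field from the two builds quantitatively instead of by eye. A minimal sketch, assuming both fields have already been read into flat lists of equal length (file reading is omitted, and all names are hypothetical):

```python
import math


def compare_runs(a, b, rel_floor=1.0e-30):
    """Return (max_abs_diff, max_rel_diff, index_of_max_abs) between two equal-length fields.

    If either field has a non-finite value, stop at the first one and
    return (inf, inf, index).
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    max_abs, max_rel, where = 0.0, 0.0, -1
    for k, (x, y) in enumerate(zip(a, b)):
        if not (math.isfinite(x) and math.isfinite(y)):
            return math.inf, math.inf, k
        d = abs(x - y)
        if d > max_abs:
            max_abs, where = d, k
        scale = max(abs(x), abs(y), rel_floor)  # floor avoids dividing by zero
        max_rel = max(max_rel, d / scale)
    return max_abs, max_rel, where


field_o2 = [5.0, 12.3, 1.2e8]  # e.g. a field from the -O2 build
field_o0 = [5.0, 12.3, 20.1]   # the same field from the -O0 build
print(compare_runs(field_o2, field_o0))
```

Differences at round-off level would suggest the optimization level is not the cause, while large, localized differences like the toy example above would point toward it.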

That said, the problems in your item (2) seem to be coming from the other components, not the sea ice model. You could check whether they originate in the ocean data passed to the coupler, in the coupler itself, or only appear after being sent to the sea ice model. Then debug from there.
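The localization described above can be organized as a simple search: sample the same field at each stage of the exchange and report the first stage where abnormal values appear. The sketch assumes the samples have already been extracted (for example, via extra diagnostic output you add yourself). The stage names are placeholders that would need to match where the nutrient field actually enters your configuration, and the bounds are illustrative:

```python
import math


def first_bad_stage(stages, lower=0.0, upper=1.0e3):
    """Given an ordered list of (stage_name, values), return the name of the first
    stage containing a non-finite value or a value outside [lower, upper], or None."""
    for name, values in stages:
        if any(not math.isfinite(v) or v < lower or v > upper for v in values):
            return name
    return None


stages = [
    ("ocean export", [5.0, 12.3, 20.1]),
    ("coupler", [5.0, 12.3, 20.1]),
    ("ice import", [5.0, 1.2e8, 20.1]),
]
print(first_bad_stage(stages))  # ice import
```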
e
