Hi everyone,
I ran some G (e.g. G1850ECO) and B (e.g. B1850) compsets with CESM2.1.3 to simulate sea ice biogeochemistry. To do this, I activated the skeletal layer biogeochemistry in CICE by setting ‘skl_bgc=.true.’ in the namelist. I also made some small modifications to the default settings for reading in the WOA ocean surface N/Si forcing files. The setup has already run successfully on two machines and produced reasonable output. However, I recently moved to a new supercomputer and ran into some problems:
(1) (kind of solved) First, ‘BGC error3’ was reported in the log file because some negative skeletal layer nitrate concentrations were generated, such as -4.17E-321. These values are extremely small. (Normally, according to the related equations in ice_algae.F90, no negative values should be generated.) I guess such small values were treated as zero on the other machines but are not handled that way on the new machine. Since the negative values are so small, I assume they will not influence the result. After forcing the model to continue by disabling the abort triggered by this error, I found that the model output was reasonable and generally the same as in the earlier successful runs on the other two machines.
(2) Another problem, which I am still stuck on, is that abnormal values of some ice bgc tracers appear at irregular locations in my B compset runs. The affected tracers are the mixed layer nitrate/silicate and skeletal layer nitrate/silicate concentrations. For example, Figure 1 shows the normal mixed layer nitrate field for January, which is simply the nitrate concentration in the WOA forcing that the model reads in. Figure 2 shows the same field from the problematic run, plotted with the same colorbar scale as Figure 1. Its maximum value is actually over 1E8, visible as the red spots in the tropical and subtropical oceans. There are also spots of abnormally small values at high latitudes, some of them negative. Since these mixed layer nutrient fields are wrong, the calculated skeletal layer nutrient concentrations are also incorrect.
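To illustrate why a value like -4.17E-321 in problem (1) may be handled differently across machines, here is a minimal Python sketch. The value lies below the smallest normal double-precision number (about 2.2E-308), so it is a denormal (subnormal) number. Some compiler settings flush such values to zero while others keep them; whether that explains the difference between the intel and gcc builds here is an assumption, not something confirmed by these runs. The helper names `is_subnormal` and `flush_to_zero` are hypothetical and are not part of CICE.

```python
import sys

# Smallest positive *normal* IEEE 754 double, about 2.2e-308.
SMALLEST_NORMAL = sys.float_info.min


def is_subnormal(x: float) -> bool:
    """Return True if x is nonzero but smaller in magnitude than any normal double."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x: float) -> float:
    """Mimic a flush-to-zero policy: subnormal values become exactly 0.0."""
    return 0.0 if is_subnormal(x) else x


if __name__ == "__main__":
    value = -4.17e-321  # the value reported with 'BGC error3'
    print(is_subnormal(value))          # subnormal check
    print(flush_to_zero(value) < 0.0)   # after flushing, the value is no longer negative
```

With a flush-to-zero policy the value is no longer negative, so a check for negative concentrations would pass; without it, the same check would trip.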
So I would like to ask what could cause this problem. I believe it is not caused by the setup: (1) The setup runs normally and produces reasonable results on the other machines (one of them also uses CESM2.1.3, the same as the new machine); (2) The G compset runs with the same setup on the new machine do not show the second problem and produce reasonable output; (3) On the new machine, after the abort from the first error was bypassed, the setup built and ran successfully (though the output was wrong). By the way, on the new machine, default G and B compsets without skeletal layer biogeochemistry run normally, so the problem only affects the skeletal layer bgc. Could it be that the coupler or the remapping is not working properly? Could it be related to the compiler? One difference I found is that the new machine uses ‘gcc4.9.4’, while the other two use ‘intel’.
Changing the PE layout (for example, switching from a parallel run to a serial run, or decreasing the number of processors) does not solve the problem. Neither does setting the xml variable ‘DEBUG=TRUE’.
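One way to quantify problem (2) when comparing runs is to count the grid cells that fall outside a plausible range. Below is a minimal Python sketch of such a check on a 2D field given as nested lists. The function name `count_suspect_cells` and the bounds are hypothetical; the upper bound of 100.0 is only an illustrative placeholder and should be replaced with a limit appropriate for the tracer and its units.

```python
import math


def count_suspect_cells(field, lower=0.0, upper=100.0):
    """Classify every cell of a 2D field (list of rows) against [lower, upper].

    The bounds are illustrative assumptions, not values taken from the model.
    """
    counts = {"non_finite": 0, "below_lower": 0, "above_upper": 0, "ok": 0}
    for row in field:
        for value in row:
            if not math.isfinite(value):
                counts["non_finite"] += 1      # NaN or +/-Inf
            elif value < lower:
                counts["below_lower"] += 1     # e.g. negative concentrations
            elif value > upper:
                counts["above_upper"] += 1     # e.g. spikes on the order of 1E8
            else:
                counts["ok"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample values for demonstration only, not model output.
    sample = [
        [5.0, 12.3, 1.2e8],
        [-0.4, float("nan"), 0.0],
    ]
    print(count_suspect_cells(sample))
```

Running the same check on the field at different stages (the forcing file as read, and the tracer in the history output) could help show where the bad values first appear.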
Any advice is highly appreciated. Thanks in advance!
Ziqi