Output difference between two identical runs

Mikasa · Jul 30, 2022

Hello, I have run two identical experiments using CESM 2.1.3 with compset B1PCTcmip6. Both ran for 150 years.

I plotted the time series of their output data. The two runs are exactly the same from year 1 to about year 140, but after about year 140 the two lines differ slightly. Is this normal, or does the run have a bug?
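
To pin down where two runs start to differ, rather than reading it off a plot, the two time series can be compared element by element. This is only a minimal sketch: the function name `first_divergence` and the short lists are hypothetical stand-ins for annual-mean values read from the two cases' output.

```python
def first_divergence(series_a, series_b, tol=0.0):
    """Return the index of the first element where the two series differ
    by more than tol, or None if they agree everywhere they overlap."""
    for i, (a, b) in enumerate(zip(series_a, series_b)):
        if abs(a - b) > tol:
            return i
    return None


# Hypothetical annual means from two cases (not real model output).
case1 = [287.10, 287.15, 287.21, 287.30]
case2 = [287.10, 287.15, 287.21, 287.31]

print(first_divergence(case1, case2))
```

With `tol=0.0` this tests for bit-for-bit agreement of the stored values, which is the relevant check when two runs are expected to be identical.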

jedwards · Jul 31, 2022

We test and expect the results between two identical runs to be identical given the same pelayout. Results on different pelayouts may differ due
to differences in mpi reductions in the pop ocean model. Given both runs used the same pelayout, you may have encountered a system issue -
can you do a third run? Also please identify the compiler and mpi libraries you are using.
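
A rough illustration of why a different pelayout can change answers: floating-point addition is not associative, so splitting a global sum across a different number of tasks can change the rounded result. The sketch below is plain Python, not POP or MPI code, and `partitioned_sum` is a hypothetical helper that mimics each task summing its own chunk before the partial sums are combined.

```python
def sequential_sum(values):
    """Add values one at a time, left to right, in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, ntasks):
    """Mimic a reduction: split values into ntasks contiguous chunks,
    sum each chunk separately, then sum the partial results."""
    chunk = -(-len(values) // ntasks)  # ceiling division
    partials = [
        sequential_sum(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return sequential_sum(partials)


# Values chosen so that rounding depends on the grouping.
values = [1e16, 1.0, -1e16, 1.0]

one_task = partitioned_sum(values, 1)
two_tasks = partitioned_sum(values, 2)
print(one_task, two_tasks, one_task == two_tasks)
```

Repeating the sum with the same number of tasks always gives the same result, which is why identical runs on the same pelayout are expected to match.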

Mikasa · Jul 31, 2022

jedwards said:
We test and expect the results between two identical runs to be identical given the same pelayout. Results on different pelayouts may differ due
to differences in mpi reductions in the pop ocean model. Given both runs used the same pelayout, you may have encountered a system issue -
can you do a third run? Also please identify the compiler and mpi libraries you are using.

I can confirm that both runs use the same pelayout, compiler and MPI libraries.
In addition, I also ran two identical experiments with compset B1850cmip6 at the same time, and their results are exactly identical. This is confusing.

Mikasa · Jul 31, 2022

jedwards said:
We test and expect the results between two identical runs to be identical given the same pelayout. Results on different pelayouts may differ due
to differences in mpi reductions in the pop ocean model. Given both runs used the same pelayout, you may have encountered a system issue -
can you do a third run? Also please identify the compiler and mpi libraries you are using.

Sorry, I have now found that the pelayouts of the two B1PCTcmip6 cases are different. For clarity, I will call them case 1 and case 2.
From year 1 to year 146, both case 1 and case 2 used POP_NTASKS=640.
From year 146 to year 150, case 1 used POP_NTASKS=640 while case 2 used POP_NTASKS=1280.
At year 146, because of the MARBL ERROR, I increased the dt_count of POP to get past it. From then on, the results began to differ. So,

Results on different pelayouts may differ due to differences in mpi reductions in the pop ocean model.

as you said.
All of my runs are planned to start with POP_NTASKS=640, and I plan to treat case 2 as the standard case. If I hit the MARBL ERROR in future runs, I will again have to increase the dt_count of POP and change POP_NTASKS. For a rigorous comparison, should I set POP_NTASKS=1280 in all cases after the MARBL ERROR?
Thank you very much!

jedwards · Jul 31, 2022

Ah - changing the dt_count in pop will also change answers even if the POP_NTASKS remains the same.
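
As a toy illustration of the point, assuming that a larger dt_count means a shorter ocean time step: integrating the same equation over the same period with a different number of steps gives a different numerical answer, regardless of how the work is split across tasks. The sketch uses forward Euler on a simple decay equation; it is not POP's time-stepping scheme, and `integrate_decay` is a hypothetical name.

```python
def integrate_decay(y0, rate, total_time, nsteps):
    """Integrate dy/dt = -rate * y with forward Euler using nsteps
    equal steps over total_time."""
    dt = total_time / nsteps
    y = y0
    for _ in range(nsteps):
        y = y + dt * (-rate * y)
    return y


# Same start state and same period, only the number of steps differs.
coarse = integrate_decay(1.0, 1.0, 1.0, 10)
fine = integrate_decay(1.0, 1.0, 1.0, 20)
print(coarse, fine, coarse == fine)
```

Both results approximate the same exact solution, but they are not bit-for-bit equal, so a run whose time step changed partway through cannot be expected to reproduce one whose time step did not.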

Mikasa · Jul 31, 2022

jedwards said:
Ah - changing the dt_count in pop will also change answers even if the POP_NTASKS remains the same.

Ok, I got it. Many thanks!
