Output difference between two identical runs

Mikasa

sky
Member
Hello, I have run two identical experiments using CESM 2.1.3 with compset B1PCTcmip6. Both ran for 150 years.

I plotted the time series of their output data. The two are exactly the same from year 1 to about year 140, but after about year 140 the two lines differ slightly. Is this normal, or does the run have a bug?
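Comparing the two output series value by value pins down the exact divergence point instead of reading it off a plot. Below is a minimal sketch. It assumes the annual means have already been extracted into plain Python lists (reading them from the history files is not shown). `first_divergence` is a hypothetical helper, not part of CESM's tools. It compares the raw 64-bit patterns, so it enforces true bit-for-bit equality.

```python
import struct
from typing import Optional, Sequence


def first_divergence(a: Sequence[float], b: Sequence[float]) -> Optional[int]:
    """Return the index of the first value that differs bit-for-bit, or None.

    Hypothetical helper for illustration. Index 0 corresponds to year 1
    if the lists hold one annual mean per year.
    """
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        # Compare the IEEE-754 bit patterns rather than using ==, so that
        # 0.0 vs -0.0 counts as a difference and identical NaNs count as equal.
        if struct.pack("<d", x) != struct.pack("<d", y):
            return i
    return None


# Usage with made-up numbers (not real CESM output):
case1 = [288.1, 288.2, 288.3]
case2 = [288.1, 288.2, 288.30000000000001 + 1e-12]
print(first_divergence(case1, case2))
```

A result of `None` means the runs are bit-for-bit identical over the compared period, which is what a reproducibility check with the same pelayout expects.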
 

jedwards

CSEG and Liaisons
Staff member
We test and expect the results of two identical runs to be identical given the same pelayout. Results on different pelayouts may differ due to differences in MPI reductions in the POP ocean model. If both runs used the same pelayout, you may have encountered a system issue. Can you do a third run? Also, please identify the compiler and MPI libraries you are using.
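The reason the pelayout matters is that floating-point addition is not associative. A global sum is typically computed as per-task partial sums followed by a sum of the partials. Changing the number of tasks therefore changes the order of additions and can change the last bits of the result. The sketch below is a toy model, not POP's actual reduction code. `chunked` and `global_sum` are hypothetical helpers, and the contiguous block decomposition is an assumption chosen for simplicity.

```python
def chunked(values, ntasks):
    """Split values into ntasks contiguous blocks (a toy domain decomposition)."""
    base, extra = divmod(len(values), ntasks)
    chunks, start = [], 0
    for t in range(ntasks):
        size = base + (1 if t < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def naive_sum(xs):
    """Plain left-to-right summation.

    An explicit loop is used because the built-in sum() applies compensated
    summation to floats since Python 3.12, which would hide the effect.
    """
    total = 0.0
    for x in xs:
        total += x
    return total


def global_sum(values, ntasks):
    """Sum per-task partials, then reduce the partials (like an MPI reduction)."""
    partials = [naive_sum(c) for c in chunked(values, ntasks)]
    return naive_sum(partials)


values = [1e17, 1.0, -1e17, 1.0]
print(global_sum(values, 1))  # one task: 1e17 absorbs the first 1.0
print(global_sum(values, 2))  # two tasks: both 1.0s are absorbed
```

With the same task count the result is reproducible. A different task count gives a different answer for identical inputs. In a chaotic climate model, such last-bit differences can grow over time until they are visible in the output.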
 

Mikasa

sky
Member
I can confirm that they used the same pelayout, compiler, and MPI libraries.
In addition, I also ran two identical experiments with compset B1850cmip6 at the same time, and their results are exactly identical. This is confusing.
 

Mikasa

sky
Member
Sorry, I now find that the pelayouts of the two B1PCTcmip6 cases are different. For clarity, I will call them case 1 and case 2.
From year 1 to year 146, both case 1 and case 2 used POP_NTASKS=640.
From year 146 to year 150, case 1 used POP_NTASKS=640 while case 2 used POP_NTASKS=1280.
At year 146, I hit a MARBL ERROR and increased the dt_count of POP to get past it. From then on, the results began to differ. So, as you said, results on different pelayouts may differ due to differences in MPI reductions in the POP ocean model.
All of my runs are planned to use POP_NTASKS=640 from the beginning, and I plan to treat case 2 as the standard (reference) case. If I hit the MARBL ERROR in future runs, I will also have to increase the dt_count of POP and change POP_NTASKS. For a rigorous comparison, should I set POP_NTASKS=1280 in all cases after the MARBL ERROR?
Thank you very much!
 

jedwards

CSEG and Liaisons
Staff member
Ah, changing the dt_count in POP will also change answers, even if POP_NTASKS remains the same.
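A changed time step is a different discretization of the same equations, so the model takes a different numerical path even on identical hardware and pelayout. Below is a toy illustration. The number of steps per simulated day stands in for dt_count. That mapping, the equation dy/dt = -y, and the forward-Euler scheme are all assumptions for illustration, not POP's actual time stepping. `integrate_one_day` is a hypothetical helper.

```python
import math


def integrate_one_day(steps_per_day: int, y0: float = 1.0) -> float:
    """Forward-Euler integration of dy/dt = -y over one nondimensional day."""
    dt = 1.0 / steps_per_day
    y = y0
    for _ in range(steps_per_day):
        y += dt * (-y)  # explicit Euler update with step size dt
    return y


exact = math.exp(-1.0)           # analytic solution at t = 1
baseline = integrate_one_day(23)
more_steps = integrate_one_day(24)
print(baseline, more_steps, exact)
```

Both step counts approximate the same solution, but they are not bit-for-bit identical. The smaller step lands closer to the analytic value. That is why a run whose dt_count was changed partway through should not be expected to match one that kept the original setting.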
 