
Output difference between two identical runs

Mikasa

sky
Member
Hello, I ran two identical experiments using CESM 2.1.3 with compset B1PCTcmip6. Both ran for 150 years.

I plotted the time series of their output data. The two runs are exactly the same from year 1 to about year 140, but after about year 140 there are slight differences between the two lines. Is this normal, or is there a bug in the runs?
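When comparing two runs like this, it can help to pin down the exact first point where the series stop being bit-identical, so it can be matched against any change in the case setup. The following is a minimal sketch; the function name `first_divergence` and the sample values are hypothetical and not taken from the runs discussed here.

```python
def first_divergence(series_a, series_b):
    """Return the index of the first element where the two series differ,
    or None if they match over their common length.

    Uses exact (bitwise) float comparison on purpose: the question is
    whether the runs are identical, not whether they are close.
    """
    for index, (a, b) in enumerate(zip(series_a, series_b)):
        if a != b:
            return index
    return None


# Hypothetical annual means from two runs; values are illustrative only.
run_1 = [287.10, 287.25, 287.31, 287.40]
run_2 = [287.10, 287.25, 287.31, 287.41]

print(first_divergence(run_1, run_2))
```

The returned index is zero-based, so add one to get the model year if the series starts at year 1.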
 

jedwards

CSEG and Liaisons
Staff member
We test and expect the results between two identical runs to be identical given the same pelayout. Results on different pelayouts may differ due
to differences in mpi reductions in the pop ocean model. Given both runs used the same pelayout, you may have encountered a system issue -
can you do a third run? Also please identify the compiler and mpi libraries you are using.
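The reason a different pelayout can change answers is that floating-point addition is not associative: splitting a global sum across tasks in a different way changes the order of the additions. The sketch below illustrates this in plain Python, with each inner list standing in for the values held by one task. It is not POP or MPI code, and the names `naive_sum` and `layout_sum` are hypothetical.

```python
def naive_sum(values):
    """Add values left to right, as a simple local reduction would."""
    total = 0.0
    for value in values:
        total += value
    return total


def layout_sum(chunks):
    """Sum each chunk locally (one chunk per task), then add the partial sums."""
    partials = [naive_sum(chunk) for chunk in chunks]
    return naive_sum(partials)


# The same three numbers, distributed over two "tasks" in two different ways.
layout_a = [[0.1, 0.2], [0.3]]
layout_b = [[0.1], [0.2, 0.3]]

sum_a = layout_sum(layout_a)
sum_b = layout_sum(layout_b)

print(sum_a == sum_b)      # False: the last bit differs
print(abs(sum_a - sum_b))  # a difference at round-off level
```

A difference of this size is harmless in a single sum, but in a chaotic coupled model it grows over time until the two runs visibly diverge.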
 

Mikasa

sky
Member
I can confirm that both runs used the same pelayout, compiler, and MPI libraries.
In addition, I ran two identical experiments with compset B1850cmip6 at the same time, and their results are exactly identical. This is confusing.
 

Mikasa

sky
Member
Sorry, I have now found that the pelayouts of the two B1PCTcmip6 cases are different. For clarity, I will call them case 1 and case 2.
From year 1 to year 146, both case 1 and case 2 used POP_NTASKS=640.
From year 146 to year 150, case 1 used POP_NTASKS=640 while case 2 used POP_NTASKS=1280.
At year 146, because of the MARBL ERROR, I increased the dt_count of POP to get past it. From then on, the results begin to differ. So, as you said, "Results on different pelayouts may differ due to differences in mpi reductions in the pop ocean model."
All of my runs are planned to start with POP_NTASKS=640, and I plan to treat case 2 as the standard case. If I hit the MARBL ERROR in future runs, I will again have to increase the dt_count of POP and change POP_NTASKS. For a rigorous comparison, should I set POP_NTASKS=1280 in all cases after the MARBL ERROR?
Thank you very much!
 

jedwards

CSEG and Liaisons
Staff member
Ah - changing the dt_count in pop will also change answers even if the POP_NTASKS remains the same.
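To see why a time-step change alone alters answers, consider a toy problem: integrating dy/dt = -y over one day with forward Euler at two different step counts. This is only an illustrative sketch, assuming dt_count acts as a number of steps per day; the function `integrate_decay` and its parameters are hypothetical and unrelated to POP's actual time stepping.

```python
def integrate_decay(steps_per_day, days=1.0, rate=1.0, y0=1.0):
    """Integrate dy/dt = -rate * y with forward Euler.

    steps_per_day plays the role of a step count setting: a larger value
    means a shorter time step.
    """
    dt = 1.0 / steps_per_day
    n_steps = int(round(days * steps_per_day))
    y = y0
    for _ in range(n_steps):
        y += dt * (-rate * y)
    return y


coarse = integrate_decay(steps_per_day=24)
fine = integrate_decay(steps_per_day=48)

print(coarse, fine)    # two different approximations of exp(-1)
print(coarse == fine)  # False: same equations, same layout, different answer
```

Both results approximate the same exact solution, but they are not bit-identical, so a run restarted with a different step count cannot be expected to reproduce the original trajectory.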
 