klaus_wyser@smhi_se
New Member
Has anybody out there run CCSM on different Linux clusters and compared the results? Has anybody compared the results with and without compiler optimization? And has anybody changed the number of CPUs and compared the results? I would like to hear about your experience.
We recently got a new, very powerful Linux cluster dedicated to climate research. I have installed CCSM on this machine and run a couple of tests. The results are puzzling, and I would like to hear what other users have learned when changing the configuration.
Our cluster consists of dual Intel Xeon machines, each with 2 GB of memory, connected through InfiniBand (with software from Scali). We use the most recent Intel compilers (8.1 or 9). This configuration is not officially supported by NCAR, and we have learned that the Xeon/Intel compiler combination may be a source of trouble. I ran the model for 10 years in each experiment, with identical boundary and initial conditions. I then compared the monthly averages of surface temperature, precipitation, sea ice cover, mean sea level pressure, and precipitable water, looking at the globally averaged difference (BIAS) and the root mean squared difference (RMSD).
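For anyone who wants to repeat the comparison, here is a minimal sketch of how BIAS and RMSD between two monthly mean fields could be computed. It assumes a regular latitude-longitude grid and cosine-of-latitude area weighting. Both are assumptions for illustration, and the function names are hypothetical, not part of CCSM or its post-processing tools.

```python
import math


def area_weights(lats_deg):
    """Cosine-of-latitude weights, normalised to sum to 1 over the latitude rows.

    Assumption: a regular lat-lon grid, so each latitude row gets one weight.
    """
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(w)
    return [x / total for x in w]


def global_bias_rmsd(field_a, field_b, lats_deg):
    """Return (BIAS, RMSD) of field_a minus field_b, area weighted.

    field_a, field_b: lists of latitude rows, each row a list of values
    along longitude (for example one monthly mean surface temperature map).
    """
    weights = area_weights(lats_deg)
    bias = 0.0
    mean_sq = 0.0
    for w, row_a, row_b in zip(weights, field_a, field_b):
        diffs = [a - b for a, b in zip(row_a, row_b)]
        nlon = len(diffs)
        bias += w * sum(diffs) / nlon                    # weighted mean difference
        mean_sq += w * sum(d * d for d in diffs) / nlon  # weighted mean squared difference
    return bias, math.sqrt(mean_sq)


if __name__ == "__main__":
    # Tiny made-up example: 3 latitudes x 2 longitudes, not model output.
    lats = [-45.0, 0.0, 45.0]
    run_a = [[280.0, 281.0], [300.0, 301.0], [279.0, 282.0]]
    run_b = [[279.5, 281.5], [299.0, 301.5], [280.0, 281.0]]
    print(global_bias_rmsd(run_a, run_b, lats))
```

Note that BIAS can be close to zero even when RMSD is large, because positive and negative local differences cancel in the global mean, so it helps to look at both numbers together.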
The results from the new cluster are different from those from our old cluster. For example, I find that the globally averaged monthly mean surface temperature can differ by more than 1 K. I suspected the optimization flag, so I switched off optimization when compiling. The results changed, but the differences were still of the same size as in the optimized case. I then tested different versions of the Intel compiler (several releases of 8.1 and 9) on the same cluster, and again the results differed, with differences of the same order as before. Finally, I ran the model on one cluster but changed the number of CPUs for each module. Again the results differed, with differences in the same ballpark as in all the other experiments.
Summing up: results from two CCSM experiments are different if we
1) run the model on a different computer
2) switch optimization on/off
3) change the number of processors
The difference between any two runs is always of the same order, and in general there is no trend with time.
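The "no trend with time" statement can be checked with a simple least-squares fit to the monthly series of differences (for example the monthly BIAS between two runs). The sketch below is only an illustration: the function name `linear_trend` is hypothetical, and the series in the example is made up, not model output.

```python
def linear_trend(series):
    """Least-squares slope and intercept of a series against its index 0, 1, 2, ...

    With one value per month, the slope is the change per month.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


if __name__ == "__main__":
    # Made-up 10-year (120-month) difference series that alternates in sign.
    monthly_bias = [0.3 if m % 2 == 0 else -0.3 for m in range(120)]
    slope, intercept = linear_trend(monthly_bias)
    print(f"slope per month: {slope:.5f}, intercept: {intercept:.5f}")
```

A slope that is small compared with the month-to-month scatter of the differences is consistent with two runs that differ but do not drift apart systematically.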
I am very curious to hear about others' experience with this issue.
Cheers,
Klaus