
Bit Reproducibility in CAM6

kareed

New Member
Hi Team,
I noticed recently that when running CAM6 with the QPC6 compset at the standard resolution (f09_f09_mg17) on Cheyenne, I do not get bit reproducibility if I change the number of CPUs (pecount). I was under the impression that bit reproducibility should be expected even when the number of processors changes. Is this not the case? Has anyone else experienced this? I am using the released version (CESM2.1).
Thanks,
Kevin
 

eaton

CSEG and Liaisons
I expect answers for this compset to be independent of task count.  What are the task counts where you're seeing differences?  Maybe you could point to your cases on cheyenne. 
 

kareed

New Member
Hi Brian,
First, I referenced the wrong compset; I actually noticed this issue with FHIST. In particular, I see the differences when I use --pecount 360 or 576 in the case below:
./create_newcase --case $CASEROOT --compset FHIST --res f09_f09_mg17 --mach cheyenne --project ######## --pecount 576
Unfortunately, I don't have the cases currently, but I could replicate this issue if need be.
Kevin
 

eaton

CSEG and Liaisons
I'm able to reproduce your result. I thought this was only an issue for B compsets, but apparently not. I see roundoff-level differences in the very first timestep in the fields coming from the coupler. There are flags that force the coupler to give results that are independent of task count, but they are not set by default because they incur a performance penalty. I think issuing the command "./xmlchange BFBFLAG=TRUE" from your run script will address this issue. Let me know if that doesn't work.
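One common source of roundoff-level differences like these is that floating-point addition is not associative, so a global sum built from per-task partial sums can change when the task count changes the grouping of the additions. The Python sketch below only illustrates that effect; it is not the coupler's actual algorithm. The function names, the contiguous chunking, and the use of exact rational arithmetic as the decomposition-independent variant are all assumptions made for this example.

```python
import random
from fractions import Fraction


def chunk_bounds(n, ntasks):
    """Return ntasks + 1 indices that split range(n) into contiguous chunks."""
    return [n * i // ntasks for i in range(ntasks + 1)]


def float_sum(xs):
    """Plain left-to-right floating-point accumulation (rounds at every add)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def naive_global_sum(values, ntasks):
    """Sum each task's chunk in floating point, then add the partial sums."""
    bounds = chunk_bounds(len(values), ntasks)
    partials = [float_sum(values[a:b]) for a, b in zip(bounds, bounds[1:])]
    return float_sum(partials)


def exact_global_sum(values, ntasks):
    """Same decomposition, but partial sums use exact rational arithmetic.

    Rounding happens only once, in the final conversion to float, so the
    result cannot depend on how the values were split across tasks.
    """
    bounds = chunk_bounds(len(values), ntasks)
    partials = [sum((Fraction(v) for v in values[a:b]), Fraction(0))
                for a, b in zip(bounds, bounds[1:])]
    return float(sum(partials, Fraction(0)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical field values spanning many orders of magnitude.
    values = [rng.uniform(0.0, 1.0) * 10.0 ** rng.randint(-8, 8)
              for _ in range(10_000)]
    for ntasks in (360, 576):
        print(ntasks,
              repr(naive_global_sum(values, ntasks)),
              repr(exact_global_sum(values, ntasks)))
```

Running the script prints both sums for 360 and 576 tasks. The naive sums may differ in their last digits, while the exact sums always agree. The exact variant is much slower, which is the same kind of cost trade-off as the performance penalty mentioned above.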
 

nick

Herold
Member
Apologies for the cross-posting (here). BFBFLAG doesn't work for me when going from 256 to 384 cores with an FHIST compset. Silly question, but does this flag have to be set to TRUE on both runs? What exactly does this flag do?
 

nick

Herold
Member
Answered part of the above. Can confirm the BFBFLAG has to be set to TRUE on all runs, which is a shame since I've run a bunch already without this flag.
 

zhangmeixin

mxzhang
Member
Hi, nick! I ran FHIST with different numbers of nodes and got different results. Is this normal?
 

zhangmeixin

mxzhang
Member
All I can say is that I didn't get different results, while setting BFBFLAG=TRUE on both runs.
Thank you for your reply! Indeed, when I enable this option, the results are independent of the node count, but when it is not enabled, different node counts produce different results, which I find very strange.
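For anyone wanting to check this on their own output, a bit-for-bit comparison of two fields can be sketched in Python as follows. This is a hypothetical helper, not part of CESM. It assumes the same field from the two runs has already been read into flat sequences of finite double-precision values, and the names `float_bits` and `compare_fields` are made up for illustration.

```python
import struct


def float_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_fields(a, b):
    """Compare two equal-length sequences of finite floats bit for bit.

    Returns (ndiff, max_abs_diff): the number of positions whose bit
    patterns differ and the largest absolute difference found.
    Note that 0.0 and -0.0 count as different, since their bits differ.
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    ndiff = sum(1 for x, y in zip(a, b) if float_bits(x) != float_bits(y))
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return ndiff, max_abs_diff


if __name__ == "__main__":
    run_a = [1.0, 0.1 + 0.2, 3.0]   # stand-in for a field from one run
    run_b = [1.0, 0.3, 3.0]         # same field from a run with another task count
    ndiff, max_abs_diff = compare_fields(run_a, run_b)
    print("values differing bitwise:", ndiff)
    print("largest absolute difference:", max_abs_diff)
```

Two runs are bit-for-bit identical for a field only when the first returned number is zero. A nonzero count with a tiny maximum difference is the roundoff-level signature described earlier in the thread.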
 