Running the same CAM6 experiment on different numbers of cores gives very different results

ybb

lichun
New Member
Hello! I have a problem with CAM6.
I ran the same CAM6 experiment on different numbers of cores, but the results are quite different. Is that expected?
If not, what should I do?
I would be glad to get an answer.
 

cacraig

Cheryl Craig
CSEG and Liaisons
Staff member
What compset are you using (i.e., is it a CAM-only F compset or a fully coupled compset like a B compset)? CAM F compsets are intended to give identical results independent of the PE layout. If you are experiencing this with an F compset, please give us the details requested in: https://bb.cgd.ucar.edu/cesm/threads/information-to-include-in-help-requests.4974/

If this is a B compset, changing the PE layout is expected to change answers. If you want to control this behavior, there is a BFBFLAG setting documented at: 3.5. Bit-for-bit flag (CIME master documentation).
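As a rough illustration, this is approximately how the flag looks in a case's env_run.xml once it is turned on. The element layout (the type and valid_values children) is an assumption based on CIME 5.x-era case files and may differ in other versions; the usual route is to change the value with xmlchange rather than editing the file by hand.

```xml
<!-- Sketch of the BFBFLAG entry in env_run.xml once enabled
     (layout assumed, CIME 5.x style). Set it from the case directory with
       ./xmlchange BFBFLAG=TRUE
     and check the current value with
       ./xmlquery BFBFLAG
     Use the same setting in every case you plan to compare. -->
<entry id="BFBFLAG" value="TRUE">
  <type>logical</type>
  <valid_values>TRUE,FALSE</valid_values>
</entry>
```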
 

ybb

lichun
New Member
I am very glad and honored to receive your reply. This is the first time I've posted a question, and I am sorry that some of it was not very clear.

I used the F2000climo compset (a CAM-only F compset) to run a case. I changed nothing and ran the same case with different numbers of nodes (each node has 28 cores). The results vary widely; for example, temperatures can differ by five degrees or more in winter. The version is CESM2.1.3.

I don't know what to do about it. Looking forward to your reply. Thank you, sir.
 

Attachments

  • version_info.txt (7.3 KB)

cacraig

Cheryl Craig
CSEG and Liaisons
Staff member
Sorry for not replying sooner. Are you still having this issue?
 

nick

Herold
Member
Hi @cacraig, I'm seeing this issue as well. I ran an FHIST compset (HIST_CAM60_CLM50%SP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV) on 256 cores and an otherwise identical case on 384 cores, and the results differed by the end of the first month. The resolution was f09_f09_mg17. Has this been documented before?
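For a controlled comparison like this, the two cases should differ only in their task counts. A minimal sketch of how the 384-task case might look in env_mach_pes.xml, assuming the CIME 5.x layout with per-component compclass values (the exact element structure is an assumption and may differ by version). Rather than editing the file, the usual route is ./xmlchange NTASKS=384 from the case directory (which, as far as I know, sets the value for every component), followed by ./case.setup --reset and a rebuild.

```xml
<!-- Sketch of the NTASKS entry in env_mach_pes.xml for the 384-task case
     (layout assumed, CIME 5.x style). The 256-task case would be identical
     apart from these values. Components not shown (ICE, OCN, ROF, GLC, WAV)
     follow the same pattern. -->
<entry id="NTASKS">
  <type>integer</type>
  <values>
    <value compclass="ATM">384</value>
    <value compclass="CPL">384</value>
    <value compclass="LND">384</value>
  </values>
</entry>
```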
 

nick

Herold
Member
My compiler settings are below (could aggressive optimisation cause it?):
<compiler COMPILER="intel">
  <NETCDF_PATH>/apps/spack/bell/apps/netcdf-fortran/4.5.3-intel-19.0.5-75zjiqj/</NETCDF_PATH>
  <PNETCDF_PATH>/apps/spack/bell/apps/parallel-netcdf/1.11.2-intel-19.0.5-ujjzfwp/</PNETCDF_PATH>
  <CFLAGS>
    <base> -qno-opt-dynamic-align -fp-model precise -std=gnu99 -I${NETCDF_PATH}/include/</base>
    <append compile_threaded="true"> -qopenmp </append>
    <append DEBUG="FALSE"> -O3 -debug minimal </append>
    <append DEBUG="TRUE"> -O0 -g </append>
  </CFLAGS>
  <CPPDEFS>
    <!-- Technical Library try "-xHost" and set "export MKL_DEBUG_CPU_TYPE=5" as environment variable -->
    <append> -DFORTRANUNDERSCORE -DCPRINTEL</append>
  </CPPDEFS>
  <CXX_LDFLAGS>
    <base> -cxxlib </base>
  </CXX_LDFLAGS>
  <CXX_LINKER>FORTRAN</CXX_LINKER>
  <FC_AUTO_R8>
    <base> -r8 </base>
  </FC_AUTO_R8>
  <FFLAGS>
    <base> -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source -I${NETCDF_PATH}/include/</base>
    <append compile_threaded="true"> -qopenmp </append>
    <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
    <append DEBUG="FALSE"> -O3 -debug minimal </append>
  </FFLAGS>
  <FFLAGS_NOOPT>
    <base> -O0 </base>
    <append compile_threaded="true"> -qopenmp </append>
  </FFLAGS_NOOPT>
  <FIXEDFLAGS>
    <base> -fixed -132 </base>
  </FIXEDFLAGS>
  <FREEFLAGS>
    <base> -free </base>
  </FREEFLAGS>
  <LDFLAGS>
    <append compile_threaded="true"> -qopenmp </append>
  </LDFLAGS>
  <MPICC> mpicc </MPICC>
  <MPICXX> mpicxx </MPICXX>
  <MPIFC> mpif90 </MPIFC>
  <SCC> icc </SCC>
  <SCXX> icpc </SCXX>
  <SFC> ifort </SFC>

  <SLIBS>
    <append MPILIB="mpich"> -mkl=cluster </append>
    <append MPILIB="mpich2"> -mkl=cluster </append>
    <append MPILIB="mvapich"> -mkl=cluster </append>
    <append MPILIB="mvapich2"> -mkl=cluster </append>
    <append MPILIB="mpt"> -mkl=cluster </append>
    <append MPILIB="openmpi"> -mkl=cluster </append>
    <append MPILIB="impi"> -mkl=cluster </append>
    <append MPILIB="mpi-serial"> -mkl </append>
    <append> -L${NETCDF_PATH}/lib -lnetcdff -lnetcdf</append>
  </SLIBS>
  <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
</compiler>
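To test the optimisation question directly, the fragment below is a hypothetical diagnostic variant of the CFLAGS and FFLAGS elements in the compiler entry above, in which only the DEBUG="FALSE" appends change from -O3 to -O2 (to my understanding the level the stock CIME intel entry uses; please confirm against your own CIME's config_compilers.xml). Rebuilding two cases with it (./case.build --clean-all, then ./case.build) and comparing them would show whether the optimisation level matters; it is a sketch, not a confirmed fix, and BFBFLAG should be kept identical in both runs. A DEBUG=TRUE build, which this entry already maps to -O0, is a stronger but much slower check.

```xml
<!-- Hypothetical diagnostic variant: replaces the CFLAGS and FFLAGS
     elements inside <compiler COMPILER="intel">. Only the DEBUG="FALSE"
     appends differ from the original (-O3 changed to -O2). -->
<CFLAGS>
  <base> -qno-opt-dynamic-align -fp-model precise -std=gnu99 -I${NETCDF_PATH}/include/</base>
  <append compile_threaded="true"> -qopenmp </append>
  <append DEBUG="FALSE"> -O2 -debug minimal </append>
  <append DEBUG="TRUE"> -O0 -g </append>
</CFLAGS>
<FFLAGS>
  <base> -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source -I${NETCDF_PATH}/include/</base>
  <append compile_threaded="true"> -qopenmp </append>
  <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
  <append DEBUG="FALSE"> -O2 -debug minimal </append>
</FFLAGS>
```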
 

nick

Herold
Member
I found a thread recommending setting BFBFLAG to TRUE in env_run.xml, but I still get different results. Oddly, the flag does change my results (compared to BFBFLAG=FALSE), but runs with different PE counts still do not match each other.
 

peverley

Courtney Peverley
Moderator
Did setting BFBFLAG=TRUE for both runs solve your problem, or are you still getting different results?
 

Mengyao Xu

Mengyao Xu
New Member
Hi Courtney!

I am using the BSSP126 compset and see the same issue. Even with BFBFLAG=TRUE set in env_run.xml, the results still differ between PE layouts. As Nick said, setting BFBFLAG=TRUE does change the results (compared to BFBFLAG=FALSE), but it does not make runs with different PE layouts reproduce each other. Could you help me with this problem?
 

nick

Herold
Member
Sorry for the delay; I updated my other post but not this one. I can confirm that setting BFBFLAG=TRUE in two runs with differing layouts produces identical results. @Mengyao Xu, are you running both runs with BFBFLAG = TRUE?
 

Mengyao Xu

Mengyao Xu
New Member
Thanks, Nick!

I do run both runs with BFBFLAG = TRUE (set in env_run.xml), and I checked that the forcing data in the two runs are identical. The only difference is the PE layout (one is 8 nodes x 50 cores, the other 12 nodes x 32 cores). Did you test this with B compsets? Waiting for your reply!
 

nick

Herold
Member
No, I'm using F compsets. Previously, CAM/CLM runs were always bit-for-bit identical regardless of PE layout, but something changed (I think in CAM6), so this is no longer the case. Hopefully you haven't stumbled upon a bug where B compsets don't work with BFBFLAG.
 

Mengyao Xu

Mengyao Xu
New Member
Thanks, Nick!
It's very generous of you to share your modeling experience with me. However, the problem remains unsolved in my B compset runs. Hopefully the experts on the CESM team can give me a hint.
 

zhangmeixin

mxzhang
Member
Hi! I am running the FHIST compset, but the results vary greatly with different numbers of tasks, especially the CLM results. Could this be a problem with my port to this machine?
 