Question about change in the number of gridcells?

Status: Not open for further replies.

mengqi

mj
Member
Hi,

I am working on a CLM5 simulation over the US Midwest. Specifically, the numbers of lon and lat points are 50 and 26, respectively, so the number of grid cells is 1300. To speed up the run, I changed NTASKS from 16 to 32. However, I then found that the number of grid cells became 1298. I am not sure why; I do not think that NTASKS should change the number of grid cells.
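For reference, the grid cell count can be read directly from one of the history files; here is a minimal sketch (assuming xarray is available; the file name below is a hypothetical placeholder):

import xarray as xr

# Hypothetical file name; substitute an actual h1/h2 history file from the case.
ds = xr.open_dataset("casename.clm2.h1.2000-01-01-00000.nc")

# The 1D (hist_dov2xy = .false.) history files carry a "gridcell" dimension;
# per the description above, this is 1300 with NTASKS = 16 but 1298 with 32.
print(ds.sizes.get("gridcell"))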

Could anyone please give me some insights on this issue?

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I didn't think so either. Where are you getting that information? If you can, please attach your before and after log files.
 

mengqi

mj
Member
I didn't think so either. Where are you getting that information? If you can, please attach your before and after log files.
Hi @oleson, thanks for your reply! I guess this is a tricky issue, so I'd like to describe it in detail.

In general, I am working on a CLM5 simulation over the US Midwest.

The base case (simulation period: 2000-2003) does work. This is the namelist (i.e., user_nl_clm):

fsurdat = '/glade/u/home/mengqij/B_simulation_region_pv/surface_datasets/surfdata_CO_region_78pfts_simyr2000_c231005_a.nc'
paramfile = '/glade/u/home/mengqij/B_simulation_region_pv/parameters/ctsm51_params.c240105_b.nc'
hist_fincl2 = 'GPP'
hist_fincl3 = 'GPP'
hist_dov2xy = .true., .false., .false.
hist_type1d_pertape = ' ', 'GRID', 'PFTS'
hist_nhtfrq = 0, -24, -24
hist_mfilt = 1, 365, 365


For the patch-level output, the gridcell dimension is 1300 (lon x lat):

[Attached screenshot: Screen Shot 2024-03-20 at 4.09.11 PM.png]


In the first case, I changed NTASKS from 16 to 32 and changed hist_mfilt to 1, 1825, 1825. However, you can see that gridcell is now 1298. In addition, I found that some variables became one-dimensional (1D), such as pfts1d_lon(pft), pfts1d_lat(pft), etc. These variables are 2D in the base case, e.g., pfts1d_lon(time, pft), pfts1d_lat(time, pft):

[Attached screenshot: Screen Shot 2024-03-20 at 4.12.01 PM.png]

As a result, I am not sure why this happens, considering that I only revised hist_mfilt and NTASKS. Could you please offer me some insights?
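For reference, here is a minimal way to compare the two files directly (a rough sketch, assuming xarray; the file names are hypothetical placeholders for the h2 history files of the two cases):

import xarray as xr

# Hypothetical file names; point these at the h2 (PFTS) history files
# from the base case (NTASKS = 16) and the first case (NTASKS = 32).
base = xr.open_dataset("base_case.clm2.h2.2000-01-01-00000.nc")
test = xr.open_dataset("first_case.clm2.h2.2000-01-01-00000.nc")

# Compare the gridcell dimension and the dimensions of the 1d coordinate
# variables between the two cases.
print("gridcell:", base.sizes.get("gridcell"), "vs", test.sizes.get("gridcell"))
for name in ("pfts1d_lon", "pfts1d_lat"):
    print(name, base[name].dims, "vs", test[name].dims)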

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Can you point me to your 16 and 32 NTASK cases?
It also seems like either way you would have more processors than gridcells. 16x128 = 2048 and 32x128 = 4096. Which might be a problem?
 

mengqi

mj
Member
Can you point me to your 16 and 32 NTASK cases?
It also seems like either way you would have more processors than gridcells. 16x128 = 2048 and 32x128 = 4096. Which might be a problem?
Thank you, @oleson. Sure: the "base case" (which does work!) refers to the 16-NTASK case, whereas the "first case" (which does not work!) refers to the 32-NTASK case. Could you please clarify the relationship between the processors and gridcells you mentioned?

In addition, I suspect that I cannot change hist_mfilt from 1, 365, 365 to 1, 1825, 1825. This is likely because my total run length (i.e., 4 years) is less than 1825 daily samples (i.e., 5 years). Thus, it might cause some variables (e.g., pfts1d_lon, pfts1d_lat) to drop one dimension (i.e., time). In any case, it seems a little weird.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I thought you were specifying NTASKS in node notation. E.g., for a global simulation I'm used to seeing something like this:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-1</value>
<value compclass="CPL">-14</value>
<value compclass="OCN">-14</value>
<value compclass="WAV">-14</value>
<value compclass="GLC">-14</value>
<value compclass="ICE">-14</value>
<value compclass="ROF">-14</value>
<value compclass="LND">-14</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

So, this means that LND for example will use 14X128 = 1792 processors and the ATM (DATM) will use 1X128 = 128 processors. You can see this here:

<entry id="NTASKS_PER_INST">
<type>integer</type>
<values>
<value compclass="ATM">128</value>
<value compclass="OCN">1792</value>
<value compclass="WAV">1792</value>
<value compclass="GLC">1792</value>
<value compclass="ICE">1792</value>
<value compclass="ROF">1792</value>
<value compclass="LND">1792</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of tasks per instance for each component. DO NOT EDIT: Set automatically by case.setup based on NTASKS, NINST and MULTI_DRIVER</desc>
</entry>
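In case the node notation is unfamiliar, here is a small sketch (plain Python, not CIME code) of the arithmetic behind the numbers above:

def ntasks_to_mpi_tasks(ntasks, tasks_per_node=128):
    """Convert an NTASKS entry to an MPI task count.

    Negative values are node notation (whole nodes), so -14 on a
    128-task node means 14 * 128 = 1792 tasks; positive values are
    literal task counts.
    """
    return abs(ntasks) * tasks_per_node if ntasks < 0 else ntasks

print(ntasks_to_mpi_tasks(-14))  # 1792, the LND value shown above
print(ntasks_to_mpi_tasks(-1))   # 128, the ATM (DATM) value shown above
print(ntasks_to_mpi_tasks(16))   # 16, a literal task count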

It looks like you are specifying 16 processors for LND and 1 for the other components. Did someone recommend those settings?

I just wonder if this has something to do with the weirdness you are seeing. Although I'm not sure why it "works" with 16 and not 32. Regardless, you'll be charged for use of a full node (128 processors).

I'd try something like this to start:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-1</value>
<value compclass="CPL">-2</value>
<value compclass="OCN">-2</value>
<value compclass="WAV">-2</value>
<value compclass="GLC">-2</value>
<value compclass="ICE">-2</value>
<value compclass="ROF">-2</value>
<value compclass="LND">-2</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

And see if that works and is fast enough.
You'll also need to set ROOTPE differently:

<entry id="ROOTPE">
<type>integer</type>
<values>
<value compclass="ATM">0</value>
<value compclass="CPL">-1</value>
<value compclass="OCN">-1</value>
<value compclass="WAV">-1</value>
<value compclass="GLC">-1</value>
<value compclass="ICE">-1</value>
<value compclass="ROF">-1</value>
<value compclass="LND">-1</value>
<value compclass="ESP">0</value>
</values>
<desc>ROOTPE (mpi task in MPI_COMM_WORLD) for each component</desc>
</entry>

Also, you should be able to set mfilt to something greater than the number of time samples in your run. The history file should just contain the number of time samples corresponding to the length of the run. If you did a restart, that file would be filled until it reached 1825 time samples and then a new file would be initiated.
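A quick way to confirm that on your own output (a minimal sketch, assuming xarray; the file name is a hypothetical placeholder):

import xarray as xr

# Hypothetical file name; use one of the daily (h1/h2) history files.
ds = xr.open_dataset("casename.clm2.h1.2000-01-01-00000.nc")

# The number of samples actually written reflects the run length
# (e.g., daily output over a 4-year run), not the hist_mfilt ceiling of 1825.
print(ds.sizes["time"])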
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Ok, I think another problem is that you are asking for "PFTS" level output for variables that are only available at the column level. That can result in strange behavior and not necessarily throw an error. For example, in one of your cases I'm looking at (I2000_CTSM_singlept_region_pv_test_5), you are requesting QFLX_LIQ_GRND at the pft-level. The lowest subgrid level for that variable is column.

this%qflx_liq_grnd_col(begc:endc) = spval
call hist_addfld1d ( &
     fname=this%info%fname('QFLX_LIQ_GRND'), &
     units='mm H2O/s', &
     avgflag='A', &
     long_name=this%info%lname('liquid (rain+irrigation) on ground after interception'), &
     ptr_col=this%qflx_liq_grnd_col, default='inactive', c2l_scale_type='urbanf')
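One way to check which subgrid level a field is registered at (a rough sketch; the source path is hypothetical and should point at your own CTSM checkout) is to locate the hist_addfld call for the field and see whether it passes ptr_col, ptr_patch, etc.:

import pathlib

# Hypothetical path; replace with the src/ directory of your CTSM clone.
ctsm_src = pathlib.Path("/path/to/ctsm/src")
field = "QFLX_LIQ_GRND"

for f90 in ctsm_src.rglob("*.F90"):
    lines = f90.read_text(errors="ignore").splitlines()
    for i, line in enumerate(lines):
        if f"'{field}'" in line:
            # hist_addfld calls continue over several lines, so scan a short
            # window after the field name for the ptr_* argument.
            ptrs = [w.strip() for w in lines[i:i + 10] if "ptr_" in w]
            if ptrs:
                print(f90.name, ptrs)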
 

mengqi

mj
Member
I thought you were specifying NTASKS in node notation. E.g., for a global simulation I'm used to seeing something like this: [...]
Hi @oleson

I am using CESM2 to run a global simulation in which the atmosphere model is coupled with the land model, so I have a related question about node allocation: could I use the PE layout you recommend above?

Note that I run CESM globally at 0.9° x 1.25° spatial resolution.

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
If you are running active atm and lnd, e.g., an F or B compset, you should start with the PE layout that is generated by create_newcase. The layout you refer to above is for lnd driven by data atm.
 

mengqi

mj
Member
If you are running active atm and lnd, e.g., an F or B compset, you should start with the PE layout that is generated by create_newcase. The layout you refer to above is for lnd driven by data atm.
That makes sense! Thank you, @oleson!

If I want to accelerate my simulations (active atm and lnd; the compset is B1850) by adjusting these parameters, could you offer me a few insights? Based on the previous advice, I may consider revising 'NTASKS', 'NTASKS_PER_INST', and 'ROOTPE'. Is my understanding right?

Here is the relevant information from env_mach_pes.xml:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-4</value>
<value compclass="CPL">-4</value>
<value compclass="OCN">-2</value>
<value compclass="WAV">-1</value>
<value compclass="GLC">-1</value>
<value compclass="ICE">-1</value>
<value compclass="ROF">-2</value>
<value compclass="LND">-2</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

<entry id="NTASKS_PER_INST">
<type>integer</type>
<values>
<value compclass="ATM">512</value>
<value compclass="OCN">256</value>
<value compclass="WAV">128</value>
<value compclass="GLC">128</value>
<value compclass="ICE">128</value>
<value compclass="ROF">256</value>
<value compclass="LND">256</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of tasks per instance for each component. DO NOT EDIT: Set automatically by case.setup based on NTASKS, NINST and MULTI_DRIVER</desc>
</entry>

<entry id="NTHRDS">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="CPL">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>number of threads for each task in each component</desc>
</entry>

<entry id="ROOTPE">
<type>integer</type>
<values>
<value compclass="ATM">0</value>
<value compclass="CPL">0</value>
<value compclass="OCN">-4</value>
<value compclass="WAV">0</value>
<value compclass="GLC">0</value>
<value compclass="ICE">-2</value>
<value compclass="ROF">0</value>
<value compclass="LND">0</value>
<value compclass="ESP">0</value>
</values>
<desc>ROOTPE (mpi task in MPI_COMM_WORLD) for each component</desc>
</entry>

<entry id="MULTI_DRIVER" value="FALSE">
<type>logical</type>
<valid_values>TRUE,FALSE</valid_values>
<desc>MULTI_DRIVER mode provides a separate driver/coupler component for each
ensemble member. All components must have an equal number of members. If
MULTI_DRIVER mode is False prognostic components must have the same number
of members but data or stub components may also have 1 member. </desc>
</entry>

<entry id="NINST">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of instances for each component. If MULTI_DRIVER is True
the NINST_MAX value will be used.
</desc>
</entry>

<entry id="NINST_LAYOUT">
<type>char</type>
<valid_values>sequential,concurrent</valid_values>
<values>
<value compclass="ATM">concurrent</value>
<value compclass="OCN">concurrent</value>
<value compclass="WAV">concurrent</value>
<value compclass="GLC">concurrent</value>
<value compclass="ICE">concurrent</value>
<value compclass="ROF">concurrent</value>
<value compclass="LND">concurrent</value>
<value compclass="ESP">concurrent</value>
</values>
<desc>Layout of component instances for each component</desc>
</entry>

<entry id="PSTRID">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="CPL">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>The mpi global processors stride associated with the mpi tasks for the a component</desc>
</entry>
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Based on a 1-month run I did with release-cesm2.1.5 for an f19 B1850, you should be able to get about 15 years per day, which seems pretty reasonable. If you want to go faster, I suggest posting on the infrastructure forum; I don't have much experience with fully coupled PE layouts.
 