Question about change in the number of gridcells?

Status: Not open for further replies.

mengqi

mj
Member
Hi,

I am working on a CLM5 simulation over the US Midwest. Specifically, the numbers of lon and lat points are 50 and 26, respectively, so the number of grid cells is 1300. To speed up the run, I changed NTASKS from 16 to 32. However, I then found that the number of grid cells became 1298. I am not sure why; I do not think that NTASKS should change the number of grid cells.
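For reference, the grid cell count can be read directly from one of the history files; here is a minimal sketch (assuming xarray is available; the file name below is a hypothetical placeholder):

import xarray as xr

# Hypothetical file name; substitute an actual h1/h2 history file from the case.
ds = xr.open_dataset("casename.clm2.h1.2000-01-01-00000.nc")

# The 1D (hist_dov2xy = .false.) history files carry a "gridcell" dimension;
# per the description above, this is 1300 with NTASKS = 16 but 1298 with 32.
print(ds.sizes.get("gridcell"))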

Could anyone please give me some insights on this issue?

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I didn't think so either. Where are you getting that information? If you can, please attach your before and after log files.
 

mengqi

mj
Member
I didn't think so either. Where are you getting that information? If you can, please attach your before and after log files.
Hi @oleson, thanks for your reply! I guess this is a tricky issue, so I'd like to describe it in detail.

In general, I am working on a CLM5 simulation over the US Midwest.

The base case (simulation period: 2000-2003) does work. This is the namelist (i.e., user_nl_clm):

fsurdat = '/glade/u/home/mengqij/B_simulation_region_pv/surface_datasets/surfdata_CO_region_78pfts_simyr2000_c231005_a.nc'
paramfile = '/glade/u/home/mengqij/B_simulation_region_pv/parameters/ctsm51_params.c240105_b.nc'
hist_fincl2 = 'GPP'
hist_fincl3 = 'GPP'
hist_dov2xy = .true., .false., .false.
hist_type1d_pertape = ' ', 'GRID', 'PFTS'
hist_nhtfrq = 0, -24, -24
hist_mfilt = 1, 365, 365


For the patch-level output, the gridcell dimension is 1300 (lon x lat):

[Attached screenshot: Screen Shot 2024-03-20 at 4.09.11 PM.png]


In the first case, I changed NTASKS from 16 to 32 and changed hist_mfilt to 1, 1825, 1825. However, you can see that gridcell is now 1298. In addition, I found that some variables became one-dimensional (1D), such as pfts1d_lon(pft), pfts1d_lat(pft), etc. These variables are 2D in the base case, e.g., pfts1d_lon(time, pft), pfts1d_lat(time, pft):

[Attached screenshot: Screen Shot 2024-03-20 at 4.12.01 PM.png]

As a result, I am not sure why this happens, considering that I only revised hist_mfilt and NTASKS. Could you please offer me some insights?
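For reference, here is a minimal way to compare the two files directly (a rough sketch, assuming xarray; the file names are hypothetical placeholders for the h2 history files of the two cases):

import xarray as xr

# Hypothetical file names; point these at the h2 (PFTS) history files
# from the base case (NTASKS = 16) and the first case (NTASKS = 32).
base = xr.open_dataset("base_case.clm2.h2.2000-01-01-00000.nc")
test = xr.open_dataset("first_case.clm2.h2.2000-01-01-00000.nc")

# Compare the gridcell dimension and the dimensions of the 1d coordinate
# variables between the two cases.
print("gridcell:", base.sizes.get("gridcell"), "vs", test.sizes.get("gridcell"))
for name in ("pfts1d_lon", "pfts1d_lat"):
    print(name, base[name].dims, "vs", test[name].dims)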

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Can you point me to your 16 and 32 NTASK cases?
It also seems like either way you would have more processors than gridcells. 16x128 = 2048 and 32x128 = 4096. Which might be a problem?
 

mengqi

mj
Member
Can you point me to your 16 and 32 NTASK cases?
It also seems like either way you would have more processors than gridcells. 16x128 = 2048 and 32x128 = 4096. Which might be a problem?
Thank you, @oleson. Sure: the "base case" (which does work!) refers to the 16-NTASK case, whereas the "first case" (which does not work!) refers to the 32-NTASK case. Could you please clarify the relationship between the processors and gridcells you mentioned?

In addition, I suspect that I cannot change hist_mfilt from 1, 365, 365 to 1, 1825, 1825. This is likely because my total run length (i.e., 4 years) is less than 1825 daily samples (i.e., 5 years). Thus, it might cause some variables (e.g., pfts1d_lon, pfts1d_lat) to drop one dimension (i.e., time). In any case, it seems a little weird.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I thought you were specifying NTASKS in node notation. E.g., for a global simulation I'm used to seeing something like this:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-1</value>
<value compclass="CPL">-14</value>
<value compclass="OCN">-14</value>
<value compclass="WAV">-14</value>
<value compclass="GLC">-14</value>
<value compclass="ICE">-14</value>
<value compclass="ROF">-14</value>
<value compclass="LND">-14</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

So, this means that LND for example will use 14X128 = 1792 processors and the ATM (DATM) will use 1X128 = 128 processors. You can see this here:

<entry id="NTASKS_PER_INST">
<type>integer</type>
<values>
<value compclass="ATM">128</value>
<value compclass="OCN">1792</value>
<value compclass="WAV">1792</value>
<value compclass="GLC">1792</value>
<value compclass="ICE">1792</value>
<value compclass="ROF">1792</value>
<value compclass="LND">1792</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of tasks per instance for each component. DO NOT EDIT: Set automatically by case.setup based on NTASKS, NINST and MULTI_DRIVER</desc>
</entry>
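In case the node notation is unfamiliar, here is a small sketch (plain Python, not CIME code) of the arithmetic behind the numbers above:

def ntasks_to_mpi_tasks(ntasks, tasks_per_node=128):
    """Convert an NTASKS entry to an MPI task count.

    Negative values are node notation (whole nodes), so -14 on a
    128-task node means 14 * 128 = 1792 tasks; positive values are
    literal task counts.
    """
    return abs(ntasks) * tasks_per_node if ntasks < 0 else ntasks

print(ntasks_to_mpi_tasks(-14))  # 1792, the LND value shown above
print(ntasks_to_mpi_tasks(-1))   # 128, the ATM (DATM) value shown above
print(ntasks_to_mpi_tasks(16))   # 16, a literal task count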

It looks like you are specifying 16 processors for LND and 1 for the other components. Did someone recommend those settings?

I just wonder if this has something to do with the weirdness you are seeing. Although I'm not sure why it "works" with 16 and not 32. Regardless, you'll be charged for use of a full node (128 processors).

I'd try something like this to start:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-1</value>
<value compclass="CPL">-2</value>
<value compclass="OCN">-2</value>
<value compclass="WAV">-2</value>
<value compclass="GLC">-2</value>
<value compclass="ICE">-2</value>
<value compclass="ROF">-2</value>
<value compclass="LND">-2</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

And see if that works and is fast enough.
You'll also need to set ROOTPE differently:

<entry id="ROOTPE">
<type>integer</type>
<values>
<value compclass="ATM">0</value>
<value compclass="CPL">-1</value>
<value compclass="OCN">-1</value>
<value compclass="WAV">-1</value>
<value compclass="GLC">-1</value>
<value compclass="ICE">-1</value>
<value compclass="ROF">-1</value>
<value compclass="LND">-1</value>
<value compclass="ESP">0</value>
</values>
<desc>ROOTPE (mpi task in MPI_COMM_WORLD) for each component</desc>
</entry>

Also, you should be able to set mfilt to something greater than the number of time samples in your run. The history file should just contain the number of time samples corresponding to the length of the run. If you did a restart, that file would be filled until it reached 1825 time samples and then a new file would be initiated.
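A quick way to confirm that on your own output (a minimal sketch, assuming xarray; the file name is a hypothetical placeholder):

import xarray as xr

# Hypothetical file name; use one of the daily (h1/h2) history files.
ds = xr.open_dataset("casename.clm2.h1.2000-01-01-00000.nc")

# The number of samples actually written reflects the run length
# (e.g., daily output over a 4-year run), not the hist_mfilt ceiling of 1825.
print(ds.sizes["time"])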
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Ok, I think another problem is that you are asking for "PFTS" level output for variables that are only available at the column level. That can result in strange behavior and not necessarily throw an error. For example, in one of your cases I'm looking at (I2000_CTSM_singlept_region_pv_test_5), you are requesting QFLX_LIQ_GRND at the pft-level. The lowest subgrid level for that variable is column.

this%qflx_liq_grnd_col(begc:endc) = spval
call hist_addfld1d ( &
     fname=this%info%fname('QFLX_LIQ_GRND'), &
     units='mm H2O/s', &
     avgflag='A', &
     long_name=this%info%lname('liquid (rain+irrigation) on ground after interception'), &
     ptr_col=this%qflx_liq_grnd_col, default='inactive', c2l_scale_type='urbanf')
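One way to check which subgrid level a field is registered at (a rough sketch; the source path is hypothetical and should point at your own CTSM checkout) is to locate the hist_addfld call for the field and see whether it passes ptr_col, ptr_patch, etc.:

import pathlib

# Hypothetical path; replace with the src/ directory of your CTSM clone.
ctsm_src = pathlib.Path("/path/to/ctsm/src")
field = "QFLX_LIQ_GRND"

for f90 in ctsm_src.rglob("*.F90"):
    lines = f90.read_text(errors="ignore").splitlines()
    for i, line in enumerate(lines):
        if f"'{field}'" in line:
            # hist_addfld calls continue over several lines, so scan a short
            # window after the field name for the ptr_* argument.
            ptrs = [w.strip() for w in lines[i:i + 10] if "ptr_" in w]
            if ptrs:
                print(f90.name, ptrs)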
 

mengqi

mj
Member
I thought you were specifying NTASKS in node notation. E.g., for a global simulation I'm used to seeing something like this: [...]
Hi @oleson

I am using CESM2 to run a global simulation in which the atmosphere model is coupled with the land model, so I have a related question about node allocation: could I use the PE layout you recommend above?

Note that I run CESM globally at 0.9° x 1.25° spatial resolution.

Thanks!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
If you are running active atm and lnd, e.g., an F or B compset, you should start with the PE layout that is generated by create_newcase. The layout you refer to above is for lnd driven by data atm.
 

mengqi

mj
Member
If you are running active atm and lnd, e.g., an F or B compset, you should start with the PE layout that is generated by create_newcase. The layout you refer to above is for lnd driven by data atm.
That makes sense! Thank you, @oleson!

If I want to accelerate my simulations (active atm and lnd; the compset is B1850) by adjusting these parameters, could you offer me a few insights? Based on the previous advice, I may consider revising 'NTASKS', 'NTASKS_PER_INST', and 'ROOTPE'. Is my understanding right?

Here is the relevant information from env_mach_pes.xml:

<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">-4</value>
<value compclass="CPL">-4</value>
<value compclass="OCN">-2</value>
<value compclass="WAV">-1</value>
<value compclass="GLC">-1</value>
<value compclass="ICE">-1</value>
<value compclass="ROF">-2</value>
<value compclass="LND">-2</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>

<entry id="NTASKS_PER_INST">
<type>integer</type>
<values>
<value compclass="ATM">512</value>
<value compclass="OCN">256</value>
<value compclass="WAV">128</value>
<value compclass="GLC">128</value>
<value compclass="ICE">128</value>
<value compclass="ROF">256</value>
<value compclass="LND">256</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of tasks per instance for each component. DO NOT EDIT: Set automatically by case.setup based on NTASKS, NINST and MULTI_DRIVER</desc>
</entry>

<entry id="NTHRDS">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="CPL">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>number of threads for each task in each component</desc>
</entry>

<entry id="ROOTPE">
<type>integer</type>
<values>
<value compclass="ATM">0</value>
<value compclass="CPL">0</value>
<value compclass="OCN">-4</value>
<value compclass="WAV">0</value>
<value compclass="GLC">0</value>
<value compclass="ICE">-2</value>
<value compclass="ROF">0</value>
<value compclass="LND">0</value>
<value compclass="ESP">0</value>
</values>
<desc>ROOTPE (mpi task in MPI_COMM_WORLD) for each component</desc>
</entry>

<entry id="MULTI_DRIVER" value="FALSE">
<type>logical</type>
<valid_values>TRUE,FALSE</valid_values>
<desc>MULTI_DRIVER mode provides a separate driver/coupler component for each
ensemble member. All components must have an equal number of members. If
MULTI_DRIVER mode is False prognostic components must have the same number
of members but data or stub components may also have 1 member. </desc>
</entry>

<entry id="NINST">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>Number of instances for each component. If MULTI_DRIVER is True
the NINST_MAX value will be used.
</desc>
</entry>

<entry id="NINST_LAYOUT">
<type>char</type>
<valid_values>sequential,concurrent</valid_values>
<values>
<value compclass="ATM">concurrent</value>
<value compclass="OCN">concurrent</value>
<value compclass="WAV">concurrent</value>
<value compclass="GLC">concurrent</value>
<value compclass="ICE">concurrent</value>
<value compclass="ROF">concurrent</value>
<value compclass="LND">concurrent</value>
<value compclass="ESP">concurrent</value>
</values>
<desc>Layout of component instances for each component</desc>
</entry>

<entry id="PSTRID">
<type>integer</type>
<values>
<value compclass="ATM">1</value>
<value compclass="CPL">1</value>
<value compclass="OCN">1</value>
<value compclass="WAV">1</value>
<value compclass="GLC">1</value>
<value compclass="ICE">1</value>
<value compclass="ROF">1</value>
<value compclass="LND">1</value>
<value compclass="ESP">1</value>
</values>
<desc>The mpi global processors stride associated with the mpi tasks for the a component</desc>
</entry>
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Based on a 1-month run I did with release-cesm2.1.5 for an f19 B1850, you should be able to get about 15 years per day, which seems pretty reasonable. If you want to go faster, I suggest posting on the infrastructure forum; I don't have much experience with fully coupled PE layouts.
 