finidat interpolation seems to be slower when I add more CPUs?

plichtig

Pablo Lichtig
New Member
Hi all,
I am currently running CESM2.2.0 with a refined grid over South America. I am trying to spin up CLM for the new grid using the FHIST compset, running the year 2018 as described in the MUSICA wiki.
What surprises me is that the finidat interpolation seems to be much slower when I assign more tasks to the simulation. I was wondering whether this points to a mistake in my namelist or a bug in the model. I compiled the model with the Intel oneAPI compilers and OpenMPI.

My user_nl_clm is below. Let me know what other information might help debug this problem.
Thanks
Pablo

! Users should add all user specific namelist changes below in the form of
! namelist_var = new_namelist_value
!
! EXCEPTIONS:
! Set use_cndv by the compset you use and the CLM_BLDNML_OPTS -dynamic_vegetation setting
! Set use_vichydro by the compset you use and the CLM_BLDNML_OPTS -vichydro setting
! Set use_cn by the compset you use and CLM_BLDNML_OPTS -bgc setting
! Set use_crop by the compset you use and CLM_BLDNML_OPTS -crop setting
! Set spinup_state by the CLM_BLDNML_OPTS -bgc_spinup setting
! Set co2_ppmv with CCSM_CO2_PPMV option
! Set dtime with L_NCPL option
! Set fatmlndfrc with LND_DOMAIN_PATH/LND_DOMAIN_FILE options
! Set finidat with RUN_REFCASE/RUN_REFDATE/RUN_REFTOD options for hybrid or branch cases
! (includes $inst_string for multi-ensemble cases)
! or with CLM_FORCE_COLDSTART to do a cold start
! or set it with an explicit filename here.
! Set maxpatch_glcmec with GLC_NEC option
! Set glc_do_dynglacier with GLC_TWO_WAY_COUPLING env variable
!----------------------------------------------------------------------------------

flanduse_timeseries = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4//clm_surfdata_5_0/landuse.timeseries_ne0np4.SAMwrf01.ne30x4_hist_16pfts_Irrig_CMIP6_simyr1850-2015_c200921.nc'

!===================================================
! Uncomment the fsurdat file you need to use:
!===================================================
! fsurdat = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4//clm_surfdata_5_0/surfdata_ne0np4.SAMwrf01.ne30x4_hist_16pfts_Irrig_CMIP6_simyr2000_c200921.nc'
fsurdat = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4//clm_surfdata_5_0/surfdata_ne0np4.SAMwrf01.ne30x4_hist_16pfts_Irrig_CMIP6_simyr1850_c200921.nc'
 

plichtig

Pablo Lichtig
New Member
I was too late to edit my post and add some data: I am doing two supposedly identical runs. One has 1280 tasks and finished interpolating in about 2:30 hours. The other, with 3840 tasks, has been running for 6:30 hours and has not finished yet.

Here is my lnd_in as well:

&clm_inparm
albice = 0.50,0.30
co2_type = 'diagnostic'
collapse_urban = .false.
create_crop_landunit = .true.
crop_fsat_equals_zero = .false.
fatmlndfrc = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4/domains/domain.lnd.ne0np4.SAMwrf01.ne30x4_tx0.1v2.200921.nc'
finidat = '/scratch/m/m300788/cesm/inputdata/lnd/clm2/initdata_map/clmi.F2000.2000-01-01.ne120pg3_mt13_simyr2000_c200728.nc'
fsnowaging = '/scratch/m/m300788/cesm/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc'
fsnowoptics = '/scratch/m/m300788/cesm/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc'
fsurdat = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4//clm_surfdata_5_0/surfdata_ne0np4.SAMwrf01.ne30x4_hist_16pfts_Irrig_CMIP6_simyr1850_c200921.nc'
glc_do_dynglacier = .true.
glc_snow_persistence_max_days = 0
h2osno_max = 10000.0
irrigate = .true.
maxpatch_glcmec = 10
maxpatch_pft = 17
n_dom_landunits = 0
n_dom_pfts = 0
nlevsno = 12
nsegspc = 35
paramfile = '/scratch/m/m300788/cesm/inputdata/lnd/clm2/paramdata/clm5_params.c200624.nc'
run_zero_weight_urban = .false.
snow_cover_fraction_method = 'SwensonLawrence2012'
soil_layerstruct_predefined = '20SL_8.5m'
toosmall_crop = 0.d00
toosmall_glacier = 0.d00
toosmall_lake = 0.d00
toosmall_soil = 0.d00
toosmall_urban = 0.d00
toosmall_wetland = 0.d00
use_bedrock = .true.
use_century_decomp = .false.
use_cn = .false.
use_crop = .false.
use_dynroot = .false.
use_fates = .false.
use_fertilizer = .false.
use_fun = .false.
use_grainproduct = .false.
use_hydrstress = .true.
use_init_interp = .true.
use_lai_streams = .false.
use_lch4 = .false.
use_luna = .true.
use_nitrif_denitrif = .false.
use_soil_moisture_streams = .false.
use_subgrid_fluxes = .true.
use_vertsoilc = .false.
/
&ndepdyn_nml
/
&popd_streams
/
&urbantv_streams
model_year_align_urbantv = 1850
stream_fldfilename_urbantv = '/scratch/m/m300788/cesm/inputdata/lnd/clm2/urbandata/CLM50_tbuildmax_Oleson_2016_0.9x1.25_simyr1849-2106_c160923.nc'
stream_year_first_urbantv = 1850
stream_year_last_urbantv = 2106
urbantvmapalgo = 'nn'
/
&light_streams
/
&soil_moisture_streams
/
&lai_streams
/
&atm2lnd_inparm
glcmec_downscale_longwave = .true.
lapse_rate = 0.006
lapse_rate_longwave = 0.032
longwave_downscaling_limit = 0.5
precip_repartition_glc_all_rain_t = 0.
precip_repartition_glc_all_snow_t = -2.
precip_repartition_nonglc_all_rain_t = 2.
precip_repartition_nonglc_all_snow_t = 0.
repartition_rain_snow = .true.
/
&lnd2atm_inparm
melt_non_icesheet_ice_runoff = .true.
/
&clm_canopyhydrology_inparm
interception_fraction = 1.0
maximum_leaf_wetted_fraction = 0.05
use_clm5_fpi = .true.
/
&cnphenology
/
&clm_soilhydrology_inparm
/
&dynamic_subgrid
do_transient_crops = .true.
do_transient_pfts = .true.
flanduse_timeseries = '/scratch/m/m300788/cesm/inputdata/grids/ne0np4.SAMwrf01.ne30x4//clm_surfdata_5_0/landuse.timeseries_ne0np4.SAMwrf01.ne30x4_hist_16pfts_Irrig_CMIP6_simyr1850-2015_c200921.nc'
reset_dynbal_baselines = .false.
/
&cnvegcarbonstate
/
&finidat_consistency_checks
/
&dynpft_consistency_checks
/
&clm_initinterp_inparm
init_interp_method = 'general'
/
&century_soilbgcdecompcascade
/
&soilhydrology_inparm
baseflow_scalar = 0.001d00
/
&luna
jmaxb1 = 0.17
/
&friction_velocity
zetamaxstable = 0.5d00
/
&mineral_nitrogen_dynamics
/
&soilwater_movement_inparm
dtmin = 60.
expensive = 42
flux_calculation = 1
inexpensive = 1
lower_boundary_condition = 2
soilwater_movement_method = 1
upper_boundary_condition = 1
verysmall = 1.e-8
xtolerlower = 1.e-2
xtolerupper = 1.e-1
/
&rooting_profile_inparm
rooting_profile_method_carbon = 1
rooting_profile_method_water = 1
/
&soil_resis_inparm
soil_resis_method = 1
/
&bgc_shared
/
&canopyfluxes_inparm
itmax_canopy_fluxes = 40
use_undercanopy_stability = .false.
/
&aerosol
fresh_snw_rds_max = 204.526d00
/
&clmu_inparm
building_temp_method = 1
urban_hac = 'ON_WASTEHEAT'
urban_traffic = .false.
/
&clm_soilstate_inparm
organic_frac_squared = .false.
/
&clm_nitrogen
lnc_opt = .false.
/
&clm_snowhydrology_inparm
lotmp_snowdensity_method = 'Slater2017'
reset_snow = .false.
reset_snow_glc = .false.
reset_snow_glc_ela = 1.e9
snow_dzmax_l_1 = 0.03d00
snow_dzmax_l_2 = 0.07d00
snow_dzmax_u_1 = 0.02d00
snow_dzmax_u_2 = 0.05d00
snow_dzmin_1 = 0.010d00
snow_dzmin_2 = 0.015d00
snow_overburden_compaction_method = 'Vionnet2012'
upplim_destruct_metamorph = 175.d00
wind_dependent_snow_density = .true.
/
&cnprecision_inparm
/
&clm_glacier_behavior
glacier_region_behavior = 'single_at_atm_topo','virtual','virtual','multiple'
glacier_region_ice_runoff_behavior = 'melted','melted','remains_ice','remains_ice'
glacier_region_melt_behavior = 'remains_in_place','replaced_by_ice','replaced_by_ice','replaced_by_ice'
/
&crop
/
&irrigation_inparm
irrig_depth = 0.6
irrig_length = 14400
irrig_method_default = 'drip'
irrig_min_lai = 0.0
irrig_start_time = 21600
irrig_target_smp = -3400.
irrig_threshold_fraction = 1.0
limit_irrigation_if_rof_enabled = .false.
use_groundwater_irrigation = .false.
/
&surfacealbedo_inparm
snowveg_affects_radiation = .true.
/
&water_tracers_inparm
enable_water_isotopes = .false.
enable_water_tracer_consistency_checks = .false.
/
&clm_humanindex_inparm
calc_human_stress_indices = 'FAST'
/
&cnmresp_inparm
/
&photosyns_inparm
leafresp_method = 0
light_inhibit = .true.
modifyphoto_and_lmr_forcrop = .true.
rootstem_acc = .false.
stomatalcond_method = 'Medlyn2011'
/
&cnfire_inparm
/
&cn_general
/
&nitrif_inparm
/
&lifire_inparm
/
&ch4finundated
/
&clm_canopy_inparm
leaf_mr_vcm = 0.015d00
/
&scf_swenson_lawrence_2012_inparm
int_snow_max = 2000.
n_melt_glcmec = 10.0d00
/
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
Hmmm. I agree this is not expected. In general, MPI programs do have a limit where adding processors doesn't help (and actually hurts), because more time is spent on communication than on the actual computation. So in itself, finding this behavior in an MPI program is not necessarily unexpected.

In this case the initial interpolation doesn't do communication between processors for the algorithm itself. But there is parallel I/O and communication associated with it, which I'm thinking is the problem here. It reads in the finidat file and then writes it out again after the interpolation is done. If you aren't building with Parallel-NetCDF (pnetcdf), that could be part of the slowdown. But you also might be beyond the number of I/O processors that is efficient for this case. By default it doesn't use all tasks for I/O (typically only a quarter of them), but the optimal number of I/O tasks depends on the specific machine and how its hardware is configured.
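(As an illustration, not something from this thread: in a CIME case the I/O layout Erik describes is controlled by the PIO_* XML variables and can be inspected and adjusted from the case directory. The variable names are standard CIME ones; the values below are made-up examples, and good settings are machine-specific.)

# Inspect the current parallel-I/O layout of the case
./xmlquery PIO_TYPENAME,PIO_NUMTASKS,PIO_STRIDE,PIO_ROOT

# Illustrative change: cap the number of I/O tasks rather than letting it
# scale with NTASKS (64 is a made-up value, not a recommendation)
./xmlchange PIO_NUMTASKS=64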

Also, only part of the interpolation is threaded, so if you are using a different number of threads between the two cases, that is one thing to wonder about.

To really evaluate this I'd want to see a curve over several different processor counts, starting at a much lower number. There may be limitations that keep you from going down to a single processor, but starting at the smallest count you can would be a good start. What you might see in a curve with many points is that the interpolation gets faster as tasks increase, but then starts to worsen at some point. Figuring out where it worsens would be helpful. That point is going to be specific to your grid, your compute hardware, and whether you built with pnetcdf or not.
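(A hypothetical way to build such a curve with CIME's create_clone, run from the directory containing the CIME scripts; the case names, paths, and task counts are made up for illustration.)

# Clone the case at several task counts and time the initialization of each
for n in 16 32 64 128 256; do
    ./create_clone --case ../interp_test_n$n --clone ../spinup_base
    cd ../interp_test_n$n
    ./xmlchange NTASKS=$n
    ./xmlchange STOP_OPTION=ndays,STOP_N=1   # one model day is enough to time the interpolation
    ./case.setup && ./case.build && ./case.submit
    cd -
done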
 

plichtig

Pablo Lichtig
New Member
I will try it and let you know as soon as I have time (probably increasing by 16 CPUs at a time). I just checked, and I think I am running with Parallel-NetCDF (at least, I am loading the pnetcdf module in my config_machines), but it might not actually be in use.
I guess a workaround would be to just run it for one day with fewer CPUs and then point to that finidat_dest file.
Another (cleaner) solution would be to find a way to run with one NTASKS value until the interpolation is done, since I would think there is probably no way to optimize it for every architecture on every machine.
Thanks
Pablo
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
It's good to hear that you are loading pnetcdf. But yes, you should check that it's actually being used.
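(One hedged way to check, from the case directory: PIO_TYPENAME is the standard CIME variable for this, though whether pnetcdf can actually be used still depends on how the PIO library itself was built.)

# If this does not report 'pnetcdf', PIO is doing serial NetCDF I/O
./xmlquery PIO_TYPENAME

# Illustrative fix if it reports 'netcdf' and the pnetcdf library is available
./xmlchange PIO_TYPENAME=pnetcdf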

You can't modify NTASKS after the simulation has started. But yes, you are right: you could run the interpolation with a task count that's optimal for it and save the resulting file for later runs that use a different NTASKS. We are working on making the saving of interpolated files easier in the latest development version of the model. The optimal number for each operation is going to be different on different machines.
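(A sketch of that workflow, assuming CTSM writes the interpolated initial conditions to finidat_interp_dest.nc in the run directory; the task count, paths, and $RUNDIR/$MYDATA variables are illustrative.)

# 1) Interpolate once at a task count that was fast, stopping after one day
./xmlchange NTASKS=1280
./xmlchange STOP_OPTION=ndays,STOP_N=1
./case.setup --reset && ./case.build && ./case.submit

# 2) Keep the interpolated file for reuse
cp $RUNDIR/finidat_interp_dest.nc $MYDATA/clmi.SAMwrf01.interp.nc

# 3) In user_nl_clm of the production case, point at it and skip re-interpolation:
#      finidat = '$MYDATA/clmi.SAMwrf01.interp.nc'
#      use_init_interp = .false.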

We used to have the interpolation in an offline tool separate from CTSM. In a way, creating the interpolated finidat files using CTSM is like running that offline tool -- it's something you do before your regular simulations.

Yes, a graph that increases by 16 processors at a time should give you a good idea of the scaling curve and of a good number for the interpolation. You can do the same for running the model and see what its scaling is like, which will be totally different from the interpolation-at-initialization curve.
 