
Errors when CLM is outputting variables

Hi All,


I am running CESM1.0.4 (I1850CN). After a one-month run, I got the following error messages in ccsm.log:

pio_support::pio_die:: myrank= -1 : ERROR: box_rearrange.F90:
692 : box_rearrange_comp2io: size(compbuf)= 478
not equal to size(compdof)= 488
pio_support::pio_die:: myrank= -1 : ERROR: box_rearrange.F90:
692 : box_rearrange_comp2io: size(compbuf)= 476
not equal to size(compdof)= 488
pio_support::pio_die:: myrank= -1 : ERROR: box_rearrange.F90:
692 : box_rearrange_comp2io: size(compbuf)= 475
not equal to size(compdof)= 488
pio_support::pio_die:: myrank= -1 : ERROR: box_rearrange.F90:


-----------------
I suspect the error occurs when the model writes its output to the nc file, because:
1. The nc file is created before the model aborts, but only some variables are written to it (such as longxy, area, etc.), with no model output.
2. " hist_htapes_wrapup : Writing current time sample to local history file" is printed in the lnd.log file, but "hist_htapes_wrapup : Closing local history file" is not.
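The second check can be scripted. The following is a minimal sketch, assuming lnd.log contains the two messages quoted in point 2 and that each history file holds a single time sample; `last_history_event` is a hypothetical helper name, not part of CESM.

```python
WRITE_MSG = "Writing current time sample to local history file"
CLOSE_MSG = "Closing local history file"


def last_history_event(log_lines):
    """Return 'writing', 'closing', or None for the last history message seen."""
    last = None
    for line in log_lines:
        if WRITE_MSG in line:
            last = "writing"
        elif CLOSE_MSG in line:
            last = "closing"
    return last


# Example using the message reported in this post.
sample = [" hist_htapes_wrapup : Writing current time sample to local history file"]
print(last_history_event(sample))  # a write was started and no close followed
```

With real output, an open file handle for lnd.log could be passed as `log_lines`. A result of 'writing' after an abort is consistent with a failure during the history write, under the one-sample-per-file assumption stated above.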


I can't find the reason. Thanks for your help.

The namelist in clm.buildnml.csh is:

&clm_inparm
co2_type = 'diagnostic'
create_crop_landunit = .false.
dtime = 1800
fatmgrid = '$DIN_LOC_ROOT/lnd/clm2/griddata/griddata_0360x0720_c120203.nc'
fatmlndfrc = '$DIN_LOC_ROOT/lnd/clm2/griddata/fracdata_0360x0720_c120204.nc'
finidat = ''
fpftcon = '$DIN_LOC_ROOT/lnd/clm2/pftdata/pft-physiology.c110425.nc'
frivinp_rtm = '$DIN_LOC_ROOT/lnd/clm2/rtmdata/rdirc_0.5x0.5_simyr2000_c101124.nc'
fsnowaging = '$DIN_LOC_ROOT/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc'
fsnowoptics = '$DIN_LOC_ROOT/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc'
fsurdat = '$DIN_LOC_ROOT/lnd/MsTMIP/surfdata_0.5x0.5_simyr1801_c05212012.nc'
ice_runoff = .true.
rtm_nsteps = 12
urban_hac = 'ON_WASTEHEAT'
urban_traffic = .false.
/
&ndepdyn_nml
ndepmapalgo = 'bilinear'
stream_fldfilename_ndep = '/fndep_clm_hist_simyr1849-2006_1.9x2.5_c100428.nc'
stream_year_first_ndep = 1901
stream_year_last_ndep = 1901
/
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
Hi ThunderHM

Yes, you've correctly traced the issue to CLM writing the NetCDF file. The error occurs in PIO (Parallel Input/Output), which CLM uses to write the file. It usually indicates a problem with the decomposition, that is, with passing data on the CLM decomposition to the PIO decomposition. It might be caused by the specific number of processors used, or the specific number of PIO processors used. You might be able to get around it simply by changing either the number of processors used to run the model or the number of processors that PIO uses. It could also be an issue with using PNETCDF.
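To see how large the mismatch is on each task, the sizes can be pulled out of the ccsm.log messages quoted at the start of the thread. This is a minimal sketch that only parses the log text; it is not PIO code, and `pio_size_mismatches` is a hypothetical name introduced here for illustration.

```python
import re

# Matches one pio_die report even when it is wrapped across two log lines.
MISMATCH = re.compile(
    r"size\(compbuf\)=\s*(\d+)\s+not equal to size\(compdof\)=\s*(\d+)"
)


def pio_size_mismatches(log_text):
    """Return a list of (compbuf_size, compdof_size) pairs found in the log."""
    return [(int(buf), int(dof)) for buf, dof in MISMATCH.findall(log_text)]


# Two of the reports from the original post.
log_text = """\
692 : box_rearrange_comp2io: size(compbuf)= 478
not equal to size(compdof)= 488
692 : box_rearrange_comp2io: size(compbuf)= 476
not equal to size(compdof)= 488
"""
for buf, dof in pio_size_mismatches(log_text):
    print(f"buffer holds {buf} values, decomposition map expects {dof} (short by {dof - buf})")
```

In each report the data buffer is shorter than the decomposition map, which fits the description of data on the CLM side not lining up with what PIO expects.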

We need more details on your setup. Give us the machine/compiler, number of processors for land, and information from the env*.xml files for use of PIO. I'll have the expert in PIO (Jim Edwards) take a look at this once you give us more information.
 
Hi Erik,
Thank you so much.

The machine I'm using is the PNNL Olympus cluster (Linux), and the compiler is PGI 11.8.
The number of processors for land is:

The information from the env_run.xml file for use of PIO is:

The PIO information from the env_build.xml is:

Thanks for your help again.

Huimin Lei
 
Hi Jim

I followed your instructions, but the repository UUID doesn't match the expected UUID. So I removed the pio directory and used the "svn co" command to download the new pio you suggested.
Unfortunately, it still does not work and prints the same error messages.
Then I did some more tests.
(1) I reduced the number of outputs by editing clm.buildnml.csh (as a test, only TG is output).
Then the monthly history file was generated successfully. However, the model aborted again with the same error messages after the following lines in the lnd.log file:
" htape_create : Opening netcdf rhtape
./clmrun.clm2.rh0.1902-01-01-00000.nc
htape_create : Successfully defined netcdf restart history file 1"

(2) I changed the number of nodes to 1; it still does not work.

Could there be a problem in the machine files, such as Macro.mach?

Thanks again!
 

jedwards

CSEG and Liaisons
Staff member
I think the issue is that this forum is reformatting the tag name. Here it is again:

http://parallelio.googlecode.com/svn/branches/cesm1_0_5_rel/pio

Is this the one that you tried?
 
Hi Jim,

Yes, it is. I'm sure that PIO was updated from your URL.
I attached the ccsm.log for you.

I tried different NTASKS in env_mach_pes.xml as Erik suggested.
I found that it worked when NTASKS
 

jedwards

CSEG and Liaisons
Staff member
I haven't been able to reproduce your issue here. What version of netcdf are you using? Can you try netcdf 4.2 if you are not already using it?
 
Hi Jim,
I fixed the problem. The error occurred because the resolution of the surface data I used did not match the model resolution I set.
It was really hard to trace the error back to this. I still don't know why the model doesn't report an error when it reads the surface data.
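A pre-run check along these lines might catch such a mismatch earlier. This is a minimal sketch, assuming the latitude and longitude dimension sizes of the grid file (fatmgrid) and the surface dataset (fsurdat) have already been read by hand, for example from `ncdump -h` output; `grids_match` and `check_surface_data` are hypothetical helpers, not CESM utilities.

```python
def grids_match(grid_shape, surf_shape):
    """Return True when both (nlat, nlon) shapes are identical."""
    return tuple(grid_shape) == tuple(surf_shape)


def check_surface_data(grid_shape, surf_shape):
    """Raise a clear error before the run if the two grids disagree."""
    if not grids_match(grid_shape, surf_shape):
        raise ValueError(
            f"surface dataset is {surf_shape[0]} x {surf_shape[1]} "
            f"but the model grid is {grid_shape[0]} x {grid_shape[1]}"
        )


# 360 x 720 is taken from the grid file name in the namelist earlier in the
# thread; the second shape below is a made-up value used only to trigger the error.
check_surface_data((360, 720), (360, 720))  # passes silently
try:
    check_surface_data((360, 720), (180, 360))
except ValueError as err:
    print(err)
```

The point of the design is simply to fail with a readable message before the run starts, instead of much later inside the I/O layer.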

Thanks again.
Best Wishes.
 