
CLM memory issue with high resolution surface data

Dear all,

I am trying to set up a CLM4.5 offline simulation on a high-resolution (3x3min) regional grid. The surface data file surfdata_3x3min_Indonesia_simyr2010_c170928.nc is about 3.8GB, and the model stops right after lnd.log reports "Successfully read surface boundary data" (see below and the attached cesm and lnd log files).

To check that the surface data is valid, I used the same surface file and exactly the same case settings with a recent CLM version (near 5.0), and it ran without problems. After reading the surface boundary data, lnd.log prints additional lines about Surface Grid Characteristics and Decomposition Characteristics, and the model finishes successfully. So the problem seems to be related to CLM4.5 itself: its parallelization, its processor layout, or something else?

Last lines of cesm.log:

Opened existing file /usr/users/yfan1/cesm_input/lnd/clm2/surfdata_map/surfdata_3x3min_Indonesia_simyr2010_c170928.nc       65536
Sep 30 06:02:28 2017 51442 4 10.1 handleTSRegisterTerm(): TS reports task pid on host killed or core dumped
forrtl: severe (41): insufficient virtual memory

Last lines of lnd.log:

Attempting to read surface boundary data .....
(GETFIL): attempting to find local file surfdata_3x3min_Indonesia_simyr2010_c170928.nc
(GETFIL): using /usr/users/yfan1/cesm_input/lnd/clm2/surfdata_map/surfdata_3x3min_Indonesia_simyr2010_c170928.nc
check_var: variable xc is not on dataset
surfrd_get_data lon_var = LONGXY lat_var =LATIXY
domain_clean: cleaning          480         245
Successfully read surface boundary data

I also tried the 0.9x1.25 resolution global surface data surfdata_0.9x1.25_simyr2000_c130418.nc (about 500MB) with CLM4.5, and it ran through without any memory issue. But when I used another file, surfdata_360x720cru_simyr2000_c130927.nc (2.3GB), the same "forrtl: severe (41): insufficient virtual memory" error was thrown after the surface data was read. This seems to confirm that CLM4.5 cannot handle large surface datasets.
To check whether the memory issue is caused by model parallelization or processor layout, I ran a series of tests with CLM4.5. First, I increased the number of requested nodes and raised the memory per core to 16GB, but that did not help. The near-CLM5 version, however, used the default setting of only 4GB memory per core and worked well with the 3.8GB 3x3min regional surface data.

I noticed a difference in the memory alloc/dealloc information that the two model versions write to cesm.log. CLM4.5 does not deallocate the memory (dealloc = 0), while CLM5 deallocates most of it (see below). Could this be the reason why CLM4.5 gives the error "insufficient virtual memory"? Our IT department told me that the CLM4.5 run occupied almost the whole 256GB of memory on the node.

In CLM4.5, it always prints:

8 MB memory   alloc in MB is             8.00
8 MB memory dealloc in MB is             0.00
Memory block size conversion in bytes is          1020.51

In CLM5, it always prints:

8 MB memory   alloc in MB is             8.00
8 MB memory dealloc in MB is             7.77
Memory block size conversion in bytes is          3983.19

Have other users also observed this memory issue with CLM4.5? Or is it caused by some wrong model setting in my case? It would be great if someone with experience in using high-resolution grids and large surface data could help point out the problem.

You may wonder why I do not just use the near-CLM5 version, since it can handle the large surface data. The reason is that I have implemented a lot of model developments in the CLM4.5 version, and these cannot be moved to CLM5 in a short period. I did not touch any model files related to MPI, PIO or memory parallelization.

I would appreciate any help with this.

Best regards,
Yuanchao
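For anyone who wants to repeat the alloc/dealloc comparison on their own cesm.log files, here is a minimal Python sketch. It assumes each memory message sits on its own line in exactly the format quoted above. The function names (summarize_memory_lines, dealloc_fraction) and the sample strings are made up for illustration and are not part of CESM or CLM.

```python
import re

# Patterns for the three memory messages quoted in the post (format assumed).
PATTERNS = {
    "alloc_mb": re.compile(r"memory\s+alloc in MB is\s+([0-9.]+)"),
    "dealloc_mb": re.compile(r"memory\s+dealloc in MB is\s+([0-9.]+)"),
    "block_bytes": re.compile(r"Memory block size conversion in bytes is\s+([0-9.]+)"),
}


def summarize_memory_lines(log_text):
    """Collect the alloc, dealloc and block-size values found in log_text."""
    result = {key: [] for key in PATTERNS}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                result[key].append(float(match.group(1)))
    return result


def dealloc_fraction(summary):
    """Fraction of the test allocation that was released again (0.0 to 1.0)."""
    total_alloc = sum(summary["alloc_mb"])
    if total_alloc == 0:
        return 0.0
    return sum(summary["dealloc_mb"]) / total_alloc


# Sample text copied from the two log excerpts quoted in the post.
CLM45_SAMPLE = """\
8 MB memory   alloc in MB is             8.00
8 MB memory dealloc in MB is             0.00
Memory block size conversion in bytes is          1020.51
"""

CLM5_SAMPLE = """\
8 MB memory   alloc in MB is             8.00
8 MB memory dealloc in MB is             7.77
Memory block size conversion in bytes is          3983.19
"""

if __name__ == "__main__":
    for label, text in (("CLM4.5", CLM45_SAMPLE), ("CLM5", CLM5_SAMPLE)):
        summary = summarize_memory_lines(text)
        print(label, "dealloc fraction:", f"{dealloc_fraction(summary):.2f}")
```

To use it on a real run, read the file with open("cesm.log").read() and pass the text to summarize_memory_lines. The script only makes the difference between the two versions easy to compare; it does not by itself show that the missing dealloc is the cause of the crash.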
 