
Transient run with error message in NH4 uptake

jiamengl

Jiameng Lai
Member
What version of the code are you using?
ctsm5.1.dev159

Have you made any changes to files in the source tree?
Yes, I made changes in the code, and I am running with C13 on and with transient land use.

Describe every step you took leading up to the problem:
I have made changes to my code and am trying to run with C13 on and with transient land use. My case starts in 1950 and runs for ~60 years, but stops in 2010-05 with the error message below:
car.edu 652: problem with limitations on nh4 uptake 0.000000000000000E+000
dec0268.hsn.de.hpc.ucar.edu 652: -Infinity
dec0268.hsn.de.hpc.ucar.edu 652: ENDRUN:
dec0268.hsn.de.hpc.ucar.edu 652: ERROR: too much NH4 uptake predicted by FUN

I have also run a C13-on, constant land cover case (with the year-2000 fsurdat) with exactly the same code changes as this transient land use case, and it runs smoothly from 1950 to 2014. Here is what I added to user_nl_clm for the transient run compared with the constant land cover run:
use_lch4=.false.
fsurdat = '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_1.9x2.5_hist_78pfts_CMIP6_simyr1850_c190304.nc'
use_init_interp = .true.
flanduse_timeseries= '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/landuse.timeseries_1.9x2.5_hist_78pfts_CMIP6_simyr1850-2015_c170824.nc'

I turned CH4 off in the transient run because I ran into errors with CH4 on, and CH4 is not what I intend to study.
My code changes have nothing to do with NH4 or FUN. Since the constant land cover case runs smoothly, I suspect the issue is related to land cover change, but I have no idea how to fix it.

The cesm.log file is too large to upload; its location is: /glade/derecho/scratch/jiamengl/C13.gm.global.lu/run/cesm.log.1087851.desched1.250705-040251
 

slevis

Moderator
Staff member
Here are some troubleshooting ideas:
1) I'm assuming that the same simulation without your code modifications works. If you have not confirmed that, you may wish to do so to avoid looking for a problem in the wrong place.
2) So, again, assuming that the problem originates in your code modifications, I see a few lines below your ERROR message the following information:
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 00000000010382BD shr_abort_mod_mp_ 114 shr_abort_mod.F90
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 00000000005D892F abortutils_mp_end 55 abortutils.F90
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 0000000000B67E8B soilbiogeochemcom 807 SoilBiogeochemCompetitionMod.F90
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 0000000000E397FD cndrivermod_mp_cn 476 CNDriverMod.F90
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 000000000086453B cnvegetationfacad 1007 CNVegetationFacade.F90
dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 00000000005E86BF clm_driver_mp_clm 1028 clm_driver.F90

dec0268.hsn.de.hpc.ucar.edu 652: cesm.exe 000000000058F33E lnd_comp_nuopc_mp 904 lnd_comp_nuopc.F90
...which tells us exactly the line of code where the model stops. I opened the file SoilBiogeochemCompetitionMod.F90 and near line 807 I see that the variable that reports "-Infinity" is smin_nh4_to_plant_vr. This may seem unhelpful at first, but you should be able to follow through the code how this variable ended up with "-Infinity" (such as a division by zero somewhere?) as a result of your code modifications.
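For reference, IEEE floating-point arithmetic turns a nonzero value divided by zero into a signed infinity (a negative numerator gives -Infinity) and 0/0 into NaN, which is consistent with the "-Infinity" printed in the log. The module below is a minimal sketch, assuming a compiler that supports the Fortran 2003 intrinsic ieee_arithmetic module; nonfinite_demo and classify are hypothetical names, not CTSM routines.

```fortran
module nonfinite_demo
  ! Hypothetical demo module (not part of CTSM): labels a real(real64)
  ! value as finite, +Infinity, -Infinity or NaN using the intrinsic
  ! IEEE module from Fortran 2003.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite, ieee_is_nan
  use, intrinsic :: iso_fortran_env, only: r8 => real64
  implicit none
contains
  function classify(x) result(label)
    real(r8), intent(in) :: x
    character(len=9) :: label
    if (ieee_is_nan(x)) then
      ! 0/0 and Inf-Inf produce NaN
      label = 'NaN'
    else if (.not. ieee_is_finite(x)) then
      ! Infinite: the sign distinguishes +Infinity from -Infinity
      if (x > 0.0_r8) then
        label = '+Infinity'
      else
        label = '-Infinity'
      end if
    else
      label = 'finite'
    end if
  end function classify
end module nonfinite_demo
```

Writing classify(...) for each intermediate quantity while walking back from smin_nh4_to_plant_vr can show where a value first stops being finite, and whether it is an infinity (typically x/0) or a NaN (typically 0/0).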
 

jiamengl

Jiameng Lai
Member
Thanks for the suggestion. I followed the code and traced the Infinity value back to the variable decomp_cpools_vr_col. However, this variable first becomes Infinity right after a call to restartvar in SoilBiogeochemCarbonStateType.F90, and I don't understand why. Below is the code that makes this call:

ptr2d => this%decomp_cpools_vr_col(:,:,k)
call restartvar(ncid=ncid, flag=flag, varname=trim(varname)//"_vr", xtype=ncd_double, &
     dim1name='column', dim2name='levgrnd', switchdim=.true., &
     long_name='', units='g/m3', fill_value=spval, &
     scale_by_thickness=.false., &
     interpinic_flag='interp', readvar=readvar, data=ptr2d)

To debug, I stored the values of decomp_cpools_vr_col in a temporary variable before calling restartvar, and after the call I wrote out both the temporary variable and decomp_cpools_vr_col. The temporary variable has reasonable values (either 200.00 or 0.00), but decomp_cpools_vr_col has become -Infinity. I have no idea why calling this function would produce an infinite value. Could you please help with this?
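The before/after comparison described above can be made systematic. The sketch below, assuming double-precision (real64) arrays like CTSM's r8 kind, reports the first (column, level) element that was finite in the saved copy but is non-finite after the call; restart_debug and first_new_nonfinite are hypothetical names. The returned indices are 1-based offsets into the array section passed in, not CTSM column indices (add begc - 1 to map them back, assuming the section starts at begc).

```fortran
module restart_debug
  ! Hypothetical debugging helper (not part of CTSM).
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  use, intrinsic :: iso_fortran_env, only: r8 => real64
  implicit none
contains
  ! Scans column-by-level arrays and returns the first (icol, ilev)
  ! whose value was finite in "before" but is non-finite in "after".
  ! Returns icol = ilev = 0 if no such element exists.
  subroutine first_new_nonfinite(before, after, icol, ilev)
    real(r8), intent(in)  :: before(:,:), after(:,:)
    integer,  intent(out) :: icol, ilev
    integer :: c, j
    icol = 0
    ilev = 0
    ! Levels in the outer loop so the inner loop walks contiguous memory
    do j = 1, size(after, 2)
      do c = 1, size(after, 1)
        if (ieee_is_finite(before(c, j)) .and. .not. ieee_is_finite(after(c, j))) then
          icol = c
          ilev = j
          return
        end if
      end do
    end do
  end subroutine first_new_nonfinite
end module restart_debug
```

One possible use around the call shown above: copy this%decomp_cpools_vr_col(:,:,k) into a real(r8) temporary before restartvar, call first_new_nonfinite(tmp, this%decomp_cpools_vr_col(:,:,k), icol, ilev) afterwards, and write icol, ilev and flag together, so the output also records whether the call was reading or writing.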
 

slevis

Moderator
Staff member
It would be interesting to know whether the "call restartvar" is reading or writing (determined by the argument "flag") when this happens.

But also you did not address whether this is only a problem with the version that includes your changes or whether you see the same behavior with the unchanged version.
 

jiamengl

Jiameng Lai
Member
The 'call restartvar' here is reading. Sorry, I forgot to mention that this problem only appears with the version that includes my changes and with transient land cover; there is no error with the unchanged version.
 

slevis

Moderator
Staff member
Thank you, it is very helpful to know that the problem appears only in the version that includes your change. This way you can focus your troubleshooting on just your changes:
- If there's a way to introduce your changes one at a time instead of all at once, you may gain insight into the source of the problem more easily.
- In a post above I mentioned a variable that was reported as "-Infinity" and often this can originate in a division by zero somewhere.
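If a zero denominator turns out to be the cause, one common pattern is to guard the division explicitly. This is a minimal sketch with hypothetical names (safe_math, safe_divide); the threshold tiny_den and the fallback value are assumptions to be chosen per quantity, and picking the fallback is a science decision rather than a purely numerical one.

```fortran
module safe_math
  ! Hypothetical helper (not part of CTSM).
  use, intrinsic :: iso_fortran_env, only: r8 => real64
  implicit none
contains
  ! Returns num/den, or "fallback" when |den| is not larger than tiny_den,
  ! so a zero (or vanishingly small) denominator cannot create +/-Infinity.
  elemental function safe_divide(num, den, fallback, tiny_den) result(q)
    real(r8), intent(in) :: num, den, fallback, tiny_den
    real(r8) :: q
    if (abs(den) > tiny_den) then
      q = num / den
    else
      q = fallback
    end if
  end function safe_divide
end module safe_math
```

Because safe_divide is elemental, it also applies element by element to whole column-by-level arrays of matching shape.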
 

jiamengl

Jiameng Lai
Member
Thanks for the suggestion. Do you have any idea why the call to restartvar (reading) would lead to '-Infinity'? The variable value is normal before this call, and my code does not change restartvar or SoilBiogeochemCarbonStateType.F90, where this function is called.
 

slevis

Moderator
Staff member
But I do have a thought now:
If (1) the variable is not infinity before the call and it's infinity after the call and (2) the call is in the "read" phase, then the infinity likely comes from the restart file. This file is generated by the model to allow bit-for-bit restarts. Is this a restart file generated by your modified code or a default restart file provided with the model?
 

jiamengl

Jiameng Lai
Member
Does the model write/read the restart file every time step, or only when a .r file is generated? I ask because the error did not occur in the first time step after a RESUBMIT. The case had been resubmitted several times before the error appeared, so I think it is reading a restart file generated by my run, which means my code could have changed the values written to the restart file.
 

slevis

Moderator
Staff member
The user controls how frequently the model writes restart (...clm2.r...) files by changing REST_N in env_run.xml. The user also controls how often the model reads restart files by changing STOP_N and RESUBMIT. I agree with you that your code likely introduced one or more infinity value(s) in the last restart file before the error appeared.
 

jiamengl

Jiameng Lai
Member
Thanks. This run segment starts on 2010-01-01 and the error occurs in 2010-05 (STOP_N is 13 and REST_N is 8). I checked the rpointer file and found that it was using the 2010-01-01 restart file, and that this call to restartvar was reading the variable 'soil2_vr'. However, when I downloaded the restart file, I found no 'Infinity' or other abnormal values in 'soil2_vr'; instead, all elements of 'soil2_vr' are zero in the restart file. So I am confused about how an 'Infinity' value appeared after reading this variable from the restart file...
 

slevis

Moderator
Staff member
To make sure that I understood correctly:
The run started on 2010-01-01 and crashed in 2010-05, but soil2_vr already showed infinity value(s) at 2010-01-01. I do not know why this would happen after reading the restart file if the same file does not show infinity values for this variable. I would search all the variables in the file for infinity, just in case. I might also go back to the simulation that generated the restart file and look for infinity values in the variables of that simulation.
 

slevis

Moderator
Staff member
I might also add "write" statements in the vicinity of your code changes, since we know that the model works correctly without your changes.
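A sketch of such a checkpoint is below: it writes one line per non-finite element of a 1-D array, tagged with a label, so the first checkpoint that reports a problem localizes it. debug_writes and report_nonfinite are hypothetical names; the sketch writes to error_unit so it runs standalone, whereas inside CTSM one would more likely write to iulog.

```fortran
module debug_writes
  ! Hypothetical debugging helper (not part of CTSM).
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  use, intrinsic :: iso_fortran_env, only: r8 => real64, error_unit
  implicit none
contains
  ! Writes one line per non-finite element of x, prefixed by "tag"
  ! (e.g. the name of the code section just executed), and returns
  ! the number of non-finite elements found.
  function report_nonfinite(tag, x) result(nbad)
    character(len=*), intent(in) :: tag
    real(r8), intent(in) :: x(:)
    integer :: nbad, i
    nbad = 0
    do i = 1, size(x)
      if (.not. ieee_is_finite(x(i))) then
        nbad = nbad + 1
        write(error_unit, '(a, a, i0, a, es12.4)') trim(tag), ': index ', i, ' value ', x(i)
      end if
    end do
  end function report_nonfinite
end module debug_writes
```

Placing a call before and after each modified block, with a distinct tag each time, narrows the search to the first block whose "after" checkpoint reports a non-zero count.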
 