Serious CAM4 run error (*** ZM_CONV: IENTROPY: Failed and about to exit, info follows)

Dear All,

I am running CAM4 for a 30-year period (1997-2000) with boundary conditions from the CAM repository file "sst_HadOIBl_bc_0.47x0.63_1850_2008_c100128.nc", using the following build-namelist command:
*********************************************
/shome/2009ast3222/CAM4/ccsm4_0/models/atm/cam/bld/build-namelist -test -config /shome/2009ast3222/CAM4/Phd_Work/CAM4_runs_0.47x0.63_Resolution_30years/bld_0.47x0.63_32P/config_cache.xml -ignore_ic_date -namelist "&camexp start_ymd=19900102 stop_ymd=20001231 start_type='continue' stop_option='ndays' stop_n=4014 nhtfrq=0,-24,-24,-24,-3,-3 ndens=2 mfilt=1,30,30,30,240,240 empty_htapes=.true. sstcyc=.false. bndtvs='/shome/2009ast3222/CAM4/inputdata/atm/cam/sst/sst_HadOIBl_bc_0.47x0.63_1850_2008_c100128.nc' stream_year_first=1990 stream_year_last=2000 fincl1='TREFHT:A','PRECT:A','PS:A','U:A','V:A','OMEGA:A','T:A','Q:A','RELHUM:A','CLOUD:A','TAUX:A','TAUY:A','SNOWHICE:A','SNOWHLND:A','Z3:A','OMEGA500:A','OMEGA850:A','Q200:A','Q850:A','T300:A','T850:A','U200:A','U850:A','V200:A','V850:A','Z300:A','Z500:A','Z700:A' fincl2='TREFHT:A','PRECT:A','PS:A','RELHUM:A','SOLIN:A' fincl3='TREFHT:M' fincl4='TREFHT:X' fincl5='U:A' fincl6='V:A' scenario_ghg='RAMPED' bndtvghg='/shome/2009ast3222/CAM4/inputdata/atm/cam/ggas/ghg_hist_1850-2005_c090419.nc'/"
*********************************************

The model simulated successfully up to 1997-01, but after writing the history file for 1997-02-04 the simulation aborted with the following error message in the log file:

********************************************
QNEG4 WARNING from TPHYSAC Max possible LH flx exceeded at 1 points. , Worst excess = -3.6819E-05, lchnk = ***, i = 1, same as indices lat = 37, lon = 49
WSHIST: writing time sample 75 to h-file 5 DATE=1997/02/04 NCSEC= 32400

WSHIST: writing time sample 75 to h-file 6 DATE=1997/02/04 NCSEC= 32400

nstep, te 124291 0.33270310539466891E+10 0.33270308977347960E+10 -0.86574476012283581E-05 0.98518767488017183E+05
BalanceCheck: soil balance error nstep = 124291 point =119246 imbalance = 0.000000 W/m2
clm2: completed timestep 124291
nstep, te 124292 0.33270466828474002E+10 0.33270461802941132E+10 -0.27852088965497001E-04 0.98518794728744615E+05
*** ZM_CONV: IENTROPY: Failed and about to exit, info follows ****
ZM_CONV: IENTROPY. Details: call#,lchnk,icol= 2**** 16 lat: 89.53 lon: 284.38 P(mb)= 138.26 Tfg(K)= 169.56 qt(g/kg) = 0.12 qsat(g/kg) = NaN, s(J/kg) = NaN
ENDRUN:**** ZM_CONV IENTROPY: Tmix did not converge ****
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31[cli_31]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
rank 31 in job 1 compute-0-12.local_40267 caused collective abort of all ranks
exit status of rank 31: return code 1

***********************************************

I checked the boundary condition file at the particular lat/lon reported in the error but didn't find anything unusual. Please look into the issue and suggest what is needed so that I can complete the run.

Thanking you in anticipation,

Ram
Indian Institute of Technology
Delhi-INDIA
 

eaton

CSEG and Liaisons
This problem has been difficult to track down because the IENTROPY error is just a symptom of the problem: the IENTROPY routine is detecting an unrealistic atmospheric state that was given to it as input.

We have seen the IENTROPY problem mainly in runs at 1 deg and higher resolutions. The solution that has been implemented is to add additional substepping of the vertical remapping operation in the FV dycore. This is controlled by the namelist variable nspltvrm, which was added to the code first released with CESM1_0. The default for nspltvrm is 1 in CESM1_0, CESM1_0_1, and CESM1_0_2, so to try this solution to the IENTROPY failure one must manually set nspltvrm=2 in the namelist. In CESM1_0_3 and later the default is nspltvrm=2 for 1/2 deg and finer grids, so the manual setting is needed only for a 1 deg grid. We have recently decided that it is more robust to always set nspltvrm=2 for the 1 deg grid as well, so that will be the default starting in CESM1_1, which we expect to release in November.
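
For reference, a minimal sketch of how the setting could be added in a build-namelist invocation like the one at the top of this thread (this assumes the variable can simply be appended to the &camexp input group as in the original post, with build-namelist assigning it to its proper group):

*********************************************
&camexp
 ! substep the FV vertical remapping (2 substeps instead of the default 1)
 ! to work around the IENTROPY failure
 nspltvrm = 2
/
*********************************************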
 
Hi eaton,

Is there any way that works for CCSM4?
I'm running CCSM4 at 2 deg resolution with compset F and hit the same problem, and it still occurs when I set nspltvrm=2.
Thank you.

Jie
 

eaton

CSEG and Liaisons
The nspltvrm code was introduced in CESM-1.0, so it is not available in CCSM-4.0. Note that there are other possible causes for the ientropy failure, and substepping the vertical remapping is not a guaranteed solution, but you'll need to use a later release to find out whether it helps or not. Note also that all CESM releases have backwards support for the version of the physics package used in CAM-4.0.
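
As a rough sketch of how that could be done (the xmlchange/configure usage below assumes the CESM1.0-era case scripts and is not taken from this thread; check the documentation for your release):

*********************************************
# in the case directory of the newer CESM release, select the CAM4
# physics package before building (quoting of a value that starts
# with "-" may need adjustment, or env_build.xml can be edited by hand)
./xmlchange -file env_build.xml -id CAM_CONFIG_OPTS -val "-phys cam4"
./configure -case
*********************************************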
 
Hi, I'm running CESM-1.0.4 at 0.5 deg resolution with compset F and I am running into the exact same problem. Obviously nspltvrm=2 is already set. What else can I try to solve this problem?
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
Please check the thread https://bb.cgd.ucar.edu/convection-fails-very-hot-climates for failure in zm_conv.F90.
 
Thanks for the reply. I was playing around with the new zm_conv.F90_cesm1_0_beta20 file. If I use this file instead of the original zm_conv.F90 I get the following error message:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
ccsm.exe           0000000000E46CAD  tp_core_mp_xtpv_.  469      tp_core.F90
ccsm.exe           0000000000E3A353  tp_core_mp_tp2c_.  119      tp_core.F90
ccsm.exe           0000000000DFECE6  sw_core_mp_c_sw_.  309      sw_core.F90
ccsm.exe           0000000000C99784  cd_core_.R         732      cd_core.F90
ccsm.exe           000000000094B887  dyn_comp_mp_dyn_r  1820     dyn_comp.F90
ccsm.exe           0000000000684C48  stepon_mp_stepon_  420      stepon.F90
ccsm.exe           000000000049E1D4  cam_comp_mp_cam_r  225      cam_comp.F90
ccsm.exe           000000000048A096  atm_comp_mct_mp_a  549      atm_comp_mct.F90
ccsm.exe           000000000041056E  ccsm_comp_mod_mp_  2166     ccsm_comp_mod.F90
ccsm.exe           00000000004285DD  MAIN__             91       ccsm_driver.F90
ccsm.exe           000000000040E85C  Unknown            Unknown  Unknown
libc.so.6          00007F81A9408CDD  Unknown            Unknown  Unknown
ccsm.exe           000000000040E759  Unknown            Unknown  Unknown

If I just comment out line 3714 (b=b+merge...) I again get the error message saying

ENDRUN:**** ZM_CONV IENTROPY: Tmix did not converge ****

What is the best way to proceed? Is there any way to start with a neutral cami file for the 0.5 deg model resolution?
 
Hi, I also have the same problem (*** ZM_CONV: IENTROPY: Failed), but when I build the namelist with state_debug_checks = .true. (a sketch of that setting follows after the log) I find that the model aborts with:

 WSHIST: nhfil(            1 )=camrun.cam.h0.0000-01-01-00000.nc
 Opening netcdf history file camrun.cam.h0.0000-01-01-00000.nc
 Opened file camrun.cam.h0.0000-01-01-00000.nc to write       524288
 H_DEFINE: Successfully opened netcdf file
 nstep, te        1                      -Inf   0.33204137999815559E+10                       NaN                      -Inf
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
 Expected value to be a number.
(shr_sys_abort) ERROR: NaN produced in physics_state by package before tphysbc (dycore?
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 16 in communicator MPI_COMM_WORLD
with errorcode 1001.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
 Expected value to be a number.
(shr_sys_abort) ERROR: NaN produced in physics_state by package before tphysbc (dycore?
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
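
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the namelist setting, assuming state_debug_checks is available in your CAM version and can be passed through the same &camexp input group used earlier in this thread:

*********************************************
&camexp
 ! abort with a message like the one above as soon as a NaN/Inf
 ! appears in the physics state passed between packages
 state_debug_checks = .true.
/
*********************************************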

 