Serious CAM4 run error (*** ZM_CONV: IENTROPY: Failed and about to exit, info follows)

Dear All,

I am running CAM4 for a 30-year period (1997-2000) with boundary conditions from the CAM repository file "sst_HadOIBl_bc_0.47x0.63_1850_2008_c100128.nc", using the following build-namelist command:
*********************************************
/shome/2009ast3222/CAM4/ccsm4_0/models/atm/cam/bld/build-namelist -test -config /shome/2009ast3222/CAM4/Phd_Work/CAM4_runs_0.47x0.63_Resolution_30years/bld_0.47x0.63_32P/config_cache.xml -ignore_ic_date -namelist "&camexp start_ymd=19900102 stop_ymd=20001231 start_type='continue' stop_option='ndays' stop_n=4014 nhtfrq=0,-24,-24,-24,-3,-3 ndens=2 mfilt=1,30,30,30,240,240 empty_htapes=.true. sstcyc=.false. bndtvs='/shome/2009ast3222/CAM4/inputdata/atm/cam/sst/sst_HadOIBl_bc_0.47x0.63_1850_2008_c100128.nc' stream_year_first=1990 stream_year_last=2000 fincl1='TREFHT:A','PRECT:A','PS:A','U:A','V:A','OMEGA:A','T:A','Q:A','RELHUM:A','CLOUD:A','TAUX:A','TAUY:A','SNOWHICE:A','SNOWHLND:A','Z3:A','OMEGA500:A','OMEGA850:A','Q200:A','Q850:A','T300:A','T850:A','U200:A','U850:A','V200:A','V850:A','Z300:A','Z500:A','Z700:A' fincl2='TREFHT:A','PRECT:A','PS:A','RELHUM:A','SOLIN:A' fincl3='TREFHT:M' fincl4='TREFHT:X' fincl5='U:A' fincl6='V:A' scenario_ghg='RAMPED' bndtvghg='/shome/2009ast3222/CAM4/inputdata/atm/cam/ggas/ghg_hist_1850-2005_c090419.nc'/"
*********************************************

The model simulated successfully up to 1997-01, but after writing the history file for 1997-02-04 the simulation aborted with the following error message in the log file:

********************************************
QNEG4 WARNING from TPHYSAC Max possible LH flx exceeded at 1 points. , Worst excess = -3.6819E-05, lchnk = ***, i = 1, same as indices lat = 37, lon = 49
WSHIST: writing time sample 75 to h-file 5 DATE=1997/02/04 NCSEC= 32400

WSHIST: writing time sample 75 to h-file 6 DATE=1997/02/04 NCSEC= 32400

nstep, te 124291 0.33270310539466891E+10 0.33270308977347960E+10 -0.86574476012283581E-05 0.98518767488017183E+05
BalanceCheck: soil balance error nstep = 124291 point =119246 imbalance = 0.000000 W/m2
clm2: completed timestep 124291
nstep, te 124292 0.33270466828474002E+10 0.33270461802941132E+10 -0.27852088965497001E-04 0.98518794728744615E+05
*** ZM_CONV: IENTROPY: Failed and about to exit, info follows ****
ZM_CONV: IENTROPY. Details: call#,lchnk,icol= 2**** 16 lat: 89.53 lon: 284.38 P(mb)= 138.26 Tfg(K)= 169.56 qt(g/kg) = 0.12 qsat(g/kg) = NaN, s(J/kg) = NaN
ENDRUN:**** ZM_CONV IENTROPY: Tmix did not converge ****
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31[cli_31]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
rank 31 in job 1 compute-0-12.local_40267 caused collective abort of all ranks
exit status of rank 31: return code 1

***********************************************

I checked the boundary condition file at the particular lat/lon reported in the error but didn't find anything unusual. Please look into the issue and suggest what is needed so that I can complete the run.

Thanking you in anticipation,

Ram
Indian Institute of Technology
Delhi-INDIA
 

eaton

CSEG and Liaisons
This problem has been difficult to track down because the IENTROPY error is just a symptom of the problem: the IENTROPY routine is detecting an unrealistic atmospheric state that was given to it as input.

We have seen the IENTROPY problem mainly in runs at 1 deg and higher resolutions. The solution that has been implemented is to add additional substepping of the vertical remapping operation in the FV dycore. This is controlled by the namelist variable nspltvrm, which was added to the code first released with CESM1_0. The default for nspltvrm is 1 in CESM1_0, CESM1_0_1, and CESM1_0_2, so to try this solution to the IENTROPY failure one must manually set nspltvrm=2 in the namelist. In CESM1_0_3 and later the default is nspltvrm=2 for 1/2 deg and finer grids, so the manual setting is needed only for a 1 deg grid. We have recently decided that it is more robust to always set nspltvrm=2 for the 1 deg grid as well, so that will be the default starting in CESM1_1, which we expect to release in November.
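
For reference, a minimal sketch of how the setting could be added in a build-namelist invocation like the one at the top of this thread (this assumes the variable can simply be appended to the &camexp input group as in the original post, with build-namelist assigning it to its proper group):

*********************************************
&camexp
 ! substep the FV vertical remapping (2 substeps instead of the default 1)
 ! to work around the IENTROPY failure
 nspltvrm = 2
/
*********************************************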
 
Hi eaton,

Is there any way that works for CCSM4?
I'm running CCSM4 at 2 deg resolution with compset F and hit the same problem, and it still occurs when I set nspltvrm=2.
Thank you.

Jie
 

eaton

CSEG and Liaisons
The nspltvrm code was introduced in CESM-1.0, so it is not available in CCSM-4.0. Note that there are other possible causes for the ientropy failure, and substepping the vertical remapping is not a guaranteed solution, but you'll need to use a later release to find out whether it helps or not. Note also that all CESM releases have backwards support for the version of the physics package used in CAM-4.0.
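
As a rough sketch of how that could be done (the xmlchange/configure usage below assumes the CESM1.0-era case scripts and is not taken from this thread; check the documentation for your release):

*********************************************
# in the case directory of the newer CESM release, select the CAM4
# physics package before building (quoting of a value that starts
# with "-" may need adjustment, or env_build.xml can be edited by hand)
./xmlchange -file env_build.xml -id CAM_CONFIG_OPTS -val "-phys cam4"
./configure -case
*********************************************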
 
Hi, I'm running CESM-1.0.4 at 0.5 deg resolution with compset F and I am running into the exact same problem. Obviously nspltvrm=2 is already set. What else can I try to solve this problem?
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
Please check the thread https://bb.cgd.ucar.edu/convection-fails-very-hot-climates for failure in zm_conv.F90.
 
Thanks for the reply. I was playing around with the new zm_conv.F90_cesm1_0_beta20 file. If I use this file instead of the original zm_conv.F90 I get the following error message:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
ccsm.exe           0000000000E46CAD  tp_core_mp_xtpv_.  469      tp_core.F90
ccsm.exe           0000000000E3A353  tp_core_mp_tp2c_.  119      tp_core.F90
ccsm.exe           0000000000DFECE6  sw_core_mp_c_sw_.  309      sw_core.F90
ccsm.exe           0000000000C99784  cd_core_.R         732      cd_core.F90
ccsm.exe           000000000094B887  dyn_comp_mp_dyn_r  1820     dyn_comp.F90
ccsm.exe           0000000000684C48  stepon_mp_stepon_  420      stepon.F90
ccsm.exe           000000000049E1D4  cam_comp_mp_cam_r  225      cam_comp.F90
ccsm.exe           000000000048A096  atm_comp_mct_mp_a  549      atm_comp_mct.F90
ccsm.exe           000000000041056E  ccsm_comp_mod_mp_  2166     ccsm_comp_mod.F90
ccsm.exe           00000000004285DD  MAIN__             91       ccsm_driver.F90
ccsm.exe           000000000040E85C  Unknown            Unknown  Unknown
libc.so.6          00007F81A9408CDD  Unknown            Unknown  Unknown
ccsm.exe           000000000040E759  Unknown            Unknown  Unknown

If I just comment out line 3714 (b=b+merge...) I again get the error message saying

ENDRUN:**** ZM_CONV IENTROPY: Tmix did not converge ****

What is the best way to proceed? Is there any way to start with a neutral cami file for the 0.5 deg model resolution?
 
Hi, I also have the same problem (*** ZM_CONV: IENTROPY: Failed), but when I build the namelist with state_debug_checks = .true. (a sketch of that setting follows after the log) I find that the model aborts with:

 WSHIST: nhfil(            1 )=camrun.cam.h0.0000-01-01-00000.nc
 Opening netcdf history file camrun.cam.h0.0000-01-01-00000.nc
 Opened file camrun.cam.h0.0000-01-01-00000.nc to write       524288
 H_DEFINE: Successfully opened netcdf file
 nstep, te        1                      -Inf   0.33204137999815559E+10                       NaN                      -Inf
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
 Expected value to be a number.
(shr_sys_abort) ERROR: NaN produced in physics_state by package before tphysbc (dycore?
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 16 in communicator MPI_COMM_WORLD
with errorcode 1001.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
 Expected value to be a number.
(shr_sys_abort) ERROR: NaN produced in physics_state by package before tphysbc (dycore?
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
 ERROR: shr_assert_in_domain: state%ps has invalid value 
                       NaN  at location:             1
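
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the namelist setting, assuming state_debug_checks is available in your CAM version and can be passed through the same &camexp input group used earlier in this thread:

*********************************************
&camexp
 ! abort with a message like the one above as soon as a NaN/Inf
 ! appears in the physics state passed between packages
 state_debug_checks = .true.
/
*********************************************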

 