
failed CESM run without error message

shandong

Xiao
New Member
My CESM run failed without an obvious error message. Any suggestions on the cause?

Any advice is appreciated!

/gpfs/fs1/scratch/shandong/cnh_neon_16_to_14/run/cesm.log.190731-172826

24: QNEG3 from chemistry/H2SO4:m= 7 lat/lchnk= 412 Min. mixing ratio violated at 2 points. Reset to 1.0E-36 Worst =-1.0E-12 at i,k= 8 20
7: QNEG4 WARNING from TPHYSAC Max possible LH flx exceeded at 1 points. , Worst excess = -8.1622E-03, lchnk = 175, i = 10, same as indices lat = 68, lon = 37
9: QNEG3 from chemistry/SO2:m= 8 lat/lchnk= 204 Min. mixing ratio violated at 1 points. Reset to 1.0E-36 Worst =-1.6E-12 at i,k= 6 23
7: pLCL iteration is negative and set to psmin in uwshcu.F90
7: 1.462250724802063E-003 -206.603388093524 60980.0873614369
23: index: 24635
23: n,kl,ku,m 20 2 2 7
23: dgbsv info: 24635 1
23:
23: ab matrix
23: 1 0.0000000 0.0000000 NaN NaN 0.0000000
23: 2 0.0000000 NaN NaN -0.0728857 0.0000000
23: 3 0.0000000 -0.1418063 1.1296584 -0.0604913 0.0000000
23: 4 0.0000000 -0.0567727 1.1810325 0.0000000 -0.1063048
23: 5 0.0000000 0.0000000 NaN 0.0000000 0.0000000
23: 6 -0.1205412 -73923.0519993 NaN -1.4114534 0.0000000
23: 7 0.0000000 -2.2241969 3.2688183 -0.5197458 0.0000000
23: 8 0.0000000 -0.8573648 1.8354877 -0.1907077 0.0000000
23: 9 0.0000000 -0.3157419 1.3063771 -0.0689999 0.0000000
23:10 0.0000000 -0.1156695 1.1115380 -0.0243720 0.0000000
23:11 0.0000000 -0.0425381 1.0408461 -0.0133050 0.0000000
23:12 0.0000000 -0.0164741 1.0210527 -0.0052288 0.0000000
23:13 0.0000000 -0.0077477 1.0080223 -0.0016714 0.0000000
23:14 0.0000000 -0.0027935 1.0025323 -0.0004641 0.0000000
23:15 0.0000000 -0.0008609 1.0007747 -0.0003648 0.0000000
23:16 0.0000000 -0.0003106 1.0006667 -0.0001831 0.0000000
23:17 0.0000000 -0.0003018 1.0002941 -0.0000674 0.0000000
23:18 0.0000000 -0.0001111 1.0001082 -0.0000248 0.0000000
23:19 0.0000000 -0.0000409 1.0000398 -0.0000121 0.0000000
23:20 0.0000000 -0.0000150 1.0000121 0.0000000 0.0000000
23:
-1:MPT ERROR: MPI_COMM_WORLD rank 23 has terminated without calling MPI_Finalize()
-1: aborting job
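
Because the log interleaves output from many MPI ranks, it can help to summarize which ranks printed NaN values and which rank the MPT error names. The following is only a minimal sketch, not a CESM tool: it assumes every line has the `<rank>: <message>` shape seen in the excerpt above, and the function name `summarize_log` is hypothetical.

```python
import re
from collections import Counter

# Assumption: each log line starts with "<rank>:" as in the excerpt above
# (the launcher itself reports as rank -1).
RANK_LINE = re.compile(r"^\s*(-?\d+):(.*)$")
# Matches the MPT message "... rank 23 has terminated ..."
TERMINATED = re.compile(r"rank (\d+) has terminated")


def summarize_log(lines):
    """Return (NaN-line count per rank, list of ranks reported as terminated)."""
    nan_lines = Counter()
    terminated = []
    for line in lines:
        match = RANK_LINE.match(line)
        if not match:
            continue  # skip lines that do not carry a rank prefix
        rank, message = int(match.group(1)), match.group(2)
        # Count the line once if any whitespace-separated token is exactly "NaN".
        if "NaN" in message.split():
            nan_lines[rank] += 1
        hit = TERMINATED.search(message)
        if hit:
            terminated.append(int(hit.group(1)))
    return nan_lines, terminated


if __name__ == "__main__":
    # A few lines copied from the excerpt above.
    sample = [
        "7: pLCL iteration is negative and set to psmin in uwshcu.F90",
        "23: 1 0.0000000 0.0000000 NaN NaN 0.0000000",
        "23: 2 0.0000000 NaN NaN -0.0728857 0.0000000",
        "23:10 0.0000000 -0.1156695 1.1115380 -0.0243720 0.0000000",
        "-1:MPT ERROR: MPI_COMM_WORLD rank 23 has terminated without calling MPI_Finalize()",
    ]
    nan_lines, terminated = summarize_log(sample)
    print("NaN lines per rank:", dict(nan_lines))
    print("Terminated ranks:", terminated)
```

To scan a real log, pass an open file object in place of the sample list, for example `summarize_log(open(path))` with `path` pointing at the cesm.log file. If the rank that printed NaN values is also the rank that terminated, that points at where to start looking.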
 

fischer

CSEG and Liaisons
Staff member
Hi, as with your other issue, I tried looking at your run, but it looks like the scrubber removed several of the files.
From what I can see, it looks like a system issue. Try doing a rerun.

Chris
 

shandong

Xiao
New Member
Sorry, some of the data have been removed.

I just did a re-run with debug ON, and the job failed again. Would you mind taking a look? Many thanks in advance!


The configuration is as follows:
./create_newcase -case /glade/scratch/shandong/cesm_cases/*** -res f19_g16 -user_compset 2000_CAM5_CLM45%BGC_CICE_DOCN%SOM_RTM_SGLC_SWAV -mach cheyenne


The log is here:
/gpfs/fs1/scratch/shandong/cnh_neon_16_to_14/run/cesm.log.200205-205900


case folder:
/gpfs/fs1/scratch/shandong/cesm_cases/cnh_neon_16_to_14
 

fischer

CSEG and Liaisons
Staff member
You exceeded the wallclock limit on the run. It looks like the model was running just fine, but more slowly because debugging was on. Try running again as you normally would. It looks like your problems were caused by system issues on cheyenne.
 