Can a COURLIM error cause a segmentation fault?

CAM seems to be crashing when trying to start the second year. I'm getting this at the end of my output file:
......
0: NSTEP = 17517 8.780811763075404E-05 5.443801409129482E-06 258.073 9.85749E+04 3.599147229307466E+01 1.19 0.20
0: nstep, te 17518 3432624348.59493065 -4.95464360210630606 0.493983685243713799E-03 98574.9381812880456
0:COURLIM: *** Courant limit exceeded at k,lat= 1 36 (estimate = 1.172), solution has been truncated to wavenumber 26 ***
0: *** Original Courant limit exceeded at k,lat= 1 36 (estimate = 1.172) ***
0: NSTEP = 17518 8.780721330020390E-05 5.442824417154486E-06 258.067 9.85749E+04 3.598899561996890E+01 1.17 0.20
0: nstep, te 17519 3432560115.87041569 -4.15802108261320313 0.414559596648222473E-03 98574.9154112783726
0:
0: INICFILE: Writing clm initial conditions dataset at ./petm2240modern.clm2.i.0402-01-01-00000.nc at nstep = 17519
0:
0: (PUTFIL): Issuing shell cmd:(mswrite -t 1825 ./petm2240modern.clm2.i.0402-01-01-00000.nc /THRASHER/csm/petm2240modern/lnd/init/petm2240modern.clm2.i.0402-01-01-00000.nc && /bin/rm ./petm2240modern.clm2.i.0402-01-01-00000.nc )&
0:COURLIM: *** Courant limit exceeded at k,lat= 1 36 (estimate = 1.157), solution has been truncated to wavenumber 26 ***
0: *** Original Courant limit exceeded at k,lat= 1 36 (estimate = 1.157) ***
0: NSTEP = 17519 8.780601420604118E-05 5.433226433585909E-06 258.065 9.85749E+04 3.598916909886925E+01 1.16 0.19
0: nstep, te 17520 3432533197.25453186 -5.67663983450995513 0.565967665459802616E-03 98574.9168762819318
Job /usr/local/lsf/7.0/aix5-64/bin/poejob /ptmp/thrasher/cam_eoc/petm2240modern/bld/cam

TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
00046 bv1304en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
00044 bv1304en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
00045 bv1304en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
00047 bv1304en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
00040 bv1403en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
......
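When the output file is long, a few lines of Python can pull out the COURLIM truncation messages and the task exit statuses. This is a minimal sketch. The regular expressions are written against the exact lines quoted above, and the helper names (`courlim_events`, `describe_exit`) are hypothetical. It also assumes that `Exit (139)` follows the usual shell convention, where a status above 128 means the task was killed by signal (status - 128). Under that convention, 139 - 128 = 11, which is SIGSEGV on the machine running the script. That would match the segmentation faults reported in the error file.

```python
import re
import signal

# A few lines copied from the output file above (the "......" parts omitted).
LOG = """\
0:COURLIM: *** Courant limit exceeded at k,lat= 1 36 (estimate = 1.172), solution has been truncated to wavenumber 26 ***
0: *** Original Courant limit exceeded at k,lat= 1 36 (estimate = 1.172) ***
0:COURLIM: *** Courant limit exceeded at k,lat= 1 36 (estimate = 1.157), solution has been truncated to wavenumber 26 ***
0: *** Original Courant limit exceeded at k,lat= 1 36 (estimate = 1.157) ***
00046 bv1304en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
00040 bv1403en.u /ptmp/thrasher/c Exit (139) 02/29/2008 16:17:17
"""

# Matches only the "COURLIM:" truncation lines, not the "Original ..." echoes.
COURLIM_RE = re.compile(
    r"COURLIM: \*\*\* Courant limit exceeded at k,lat=\s*(\d+)\s+(\d+)"
    r"\s+\(estimate = ([\d.]+)\), solution has been truncated to wavenumber (\d+)"
)
# Matches rows of the LSF task table: TID, host, command, "Exit (N)".
EXIT_RE = re.compile(r"^(\d+)\s+(\S+)\s+\S+\s+Exit \((\d+)\)", re.MULTILINE)


def courlim_events(text):
    """Return (k, lat, estimate, truncation wavenumber) for each COURLIM line."""
    return [(int(k), int(lat), float(est), int(wn))
            for k, lat, est, wn in COURLIM_RE.findall(text)]


def describe_exit(status):
    """Assumes the shell convention: status > 128 means killed by signal status-128."""
    if status > 128:
        return signal.Signals(status - 128).name
    return f"exit code {status}"


if __name__ == "__main__":
    for k, lat, est, wn in courlim_events(LOG):
        print(f"k={k} lat={lat} estimate={est} -> truncated to wavenumber {wn}")
    for tid, host, status in EXIT_RE.findall(LOG):
        print(tid, host, describe_exit(int(status)))
```

On the sample lines, this reports two truncations at k=1, lat=36 and `SIGSEGV` for both listed tasks.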


And this in my error file:
......
43:print_memusage iam 43 stepon after dynpkg. -1 in the next line means unavailable
43:print_memusage: size, rss, share, text, datastack= -1 60236 -1 -1 -1
14:print_memusage iam 14 stepon after dynpkg. -1 in the next line means unavailable
14:print_memusage: size, rss, share, text, datastack= -1 70252 -1 -1 -1
ERROR: 0031-250 task 28: Segmentation fault
ERROR: 0031-250 task 24: Segmentation fault
ERROR: 0031-250 task 26: Segmentation fault
ERROR: 0031-250 task 18: Segmentation fault
ERROR: 0031-250 task 25: Segmentation fault
......
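A short Python sketch can summarize the error file. It lists which POE tasks reported `Segmentation fault` and the last `rss` value each task printed through `print_memusage` (the log does not state the units). The helper names are hypothetical, and the patterns are keyed to the exact message formats quoted above.

```python
import re

# Lines copied from the error file above (the "......" parts omitted).
ERR_LOG = """\
43:print_memusage iam 43 stepon after dynpkg. -1 in the next line means unavailable
43:print_memusage: size, rss, share, text, datastack= -1 60236 -1 -1 -1
14:print_memusage iam 14 stepon after dynpkg. -1 in the next line means unavailable
14:print_memusage: size, rss, share, text, datastack= -1 70252 -1 -1 -1
ERROR: 0031-250 task 28: Segmentation fault
ERROR: 0031-250 task 24: Segmentation fault
ERROR: 0031-250 task 26: Segmentation fault
ERROR: 0031-250 task 18: Segmentation fault
ERROR: 0031-250 task 25: Segmentation fault
"""

SEGV_RE = re.compile(r"ERROR: 0031-250 task (\d+): Segmentation fault")
# Captures the task prefix and the first two numbers (size, rss); -1 means unavailable.
MEM_RE = re.compile(
    r"^(\d+):print_memusage: size, rss, share, text, datastack=\s*(-?\d+)\s+(-?\d+)",
    re.MULTILINE,
)


def segfaulted_tasks(text):
    """Sorted, de-duplicated task numbers that reported a segmentation fault."""
    return sorted({int(t) for t in SEGV_RE.findall(text)})


def last_rss_by_task(text):
    """Most recent rss value printed by each task (later lines overwrite earlier)."""
    rss = {}
    for task, _size, value in MEM_RE.findall(text):
        rss[int(task)] = int(value)
    return rss


if __name__ == "__main__":
    print("segfaulted tasks:", segfaulted_tasks(ERR_LOG))
    print("last rss by task:", last_rss_by_task(ERR_LOG))
```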


Any ideas? I'm running on bluevista with dtime = 1800 for both CAM and CLM and the stack size set to unlimited. As the CISL support guys recommended, I tried a completely fresh copy of the code (CAM 3.1.p2), but it fails the same way.
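You can confirm that the unlimited stack size actually reaches the processes by querying the limit from inside a process started the same way as the model. Below is a minimal Python sketch using the standard `resource` module (Unix only). It reports only the limit of the process it runs in, so it has to be launched through the same job script to tell you anything about the CAM tasks. The helper name `describe_limit` is hypothetical.

```python
import resource


def describe_limit(value):
    """Render an rlimit value; RLIM_INFINITY is what 'unlimited' maps to."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("stack soft limit:", describe_limit(soft))
    print("stack hard limit:", describe_limit(hard))
```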
 
Update: with dtime = 1200 I still got the COURLIM errors and segmentation faults. With dtime = 900 the COURLIM errors went away, but the run still segfaults.
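Shrinking dtime is expected to help because of the Courant number, C = |u| Δt / Δx. With the winds and grid spacing held fixed, C scales linearly with the time step. The minimal sketch below rescales the 1.172 reported at dtime = 1800 to the other time steps tried. It assumes the COURLIM `estimate` behaves like such a C; the log does not say exactly how CAM computes it.

```python
def scaled_courant(estimate, dt_old, dt_new):
    """Rescale a Courant number C = |u| * dt / dx to a new time step.

    Assumes |u| and dx are unchanged, so C is proportional to dt.
    """
    return estimate * dt_new / dt_old


if __name__ == "__main__":
    # 1.172 is the largest estimate in the log above, reported with dtime = 1800 s.
    for dt in (1800, 1200, 900):
        print(f"dtime = {dt:4d} s -> estimated Courant number {scaled_courant(1.172, 1800, dt):.3f}")
```

By this first-order estimate, dtime = 1200 would bring 1.172 down to about 0.78. COURLIM still fired at 1200, so the simple scaling does not hold here: the flow in the 1200 s run need not match the 1800 s run at the same step. Treat it as a rough check rather than a prediction.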

Any help would be appreciated!
 
Here's what I'm getting in my core files:

__diagnostics_NMOD_diag_physvar_ic : 758 # in file
physpkg@OL@1 : 139 # in file
# At location 0x0900000001523164 but procedure information unavailable.
# At location 0x090000000151f958 but procedure information unavailable.
physpkg : 135 # in file
stepon : 228 # in file
cam : 241 # in file


It seems there's something going on with QCWAT: the innermost frame in the traceback is diag_physvar_ic, line 758...
 