
WACCM crashing after 95 years

lantao@ucar_edu

New Member
Hello, I am running CESM1.1.1 (WACCM4) with prescribed sea ice and SST forcing (F_2000_WACCM). The run crashes after 95 years. Right before the crash, the error message is as follows:

 78:  findsp not converging at point i, k            2          54
 78:  t, q, p, enin    202.013868948721 5.886565323329356E-002
 78:   23194.1639514279        369817.680521060
 78:  tsp, qsp, enout    652.677208061405       -3.77411252009257
 78:  -5985690.77341165
 78: ENDRUN:cldwat::FINDSP -- not converging
 78:Abort(1) on node 78 (rank 78 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 78
 78:INFO: 0031-306  pm_atexit: pm_exit_value is 1.
INFO: 0031-251  task 26 exited: rc=1
ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 26

Could anyone give some suggestions on how to deal with this? Thanks, Lantao
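The findsp diagnostic already shows the divergence: starting from t ≈ 202 K, the iteration ended at tsp ≈ 653 K with a negative qsp, which is unphysical. A minimal Python sketch, assuming the log layout shown above (a rank prefix on each line, with the labelled values possibly wrapped onto the next line), that extracts these values from a log excerpt and flags a negative qsp or a large temperature jump. The names `parse_findsp` and `looks_unphysical` and the 50 K threshold are hypothetical, for illustration only; they are not part of CESM.

```python
import re

# A leading MPI rank prefix such as " 78:" on each log line.
RANK_PREFIX = re.compile(r"^\s*\d+:")
# Fortran-style reals with a decimal point, e.g. 202.01, -3.77, 5.88E-002.
NUMBER = re.compile(r"[-+]?\d+\.\d*(?:[eE][-+]?\d+)?")


def parse_findsp(log_text):
    """Extract findsp input and output values from one diagnostic block.

    Hypothetical helper: assumes the layout in the post, where the labels
    't, q, p, enin' and 'tsp, qsp, enout' are followed by their values,
    which may wrap onto continuation lines.
    """
    # Strip the rank prefix from every line and join into one string.
    body = " ".join(RANK_PREFIX.sub("", line) for line in log_text.splitlines())
    # Inputs sit between the two labels; outputs follow the second label.
    before, _, after = body.partition("tsp, qsp, enout")
    inputs = [float(x) for x in NUMBER.findall(before.partition("t, q, p, enin")[2])]
    outputs = [float(x) for x in NUMBER.findall(after)]
    t, q, p, enin = inputs[:4]
    tsp, qsp, enout = outputs[:3]
    return {"t": t, "q": q, "p": p, "enin": enin,
            "tsp": tsp, "qsp": qsp, "enout": enout}


def looks_unphysical(values, max_jump=50.0):
    """Flag a negative qsp or a jump of more than max_jump between t and tsp.

    The 50 K default is an arbitrary illustrative threshold.
    """
    return values["qsp"] < 0 or abs(values["tsp"] - values["t"]) > max_jump
```

Only numbers containing a decimal point are picked up, so integer fields such as the point indices and MPI rank numbers are skipped.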
 

santos

Member
Sorry for the slow response, Lantao. I've just gotten back from vacation.

Until recently, the water vapor saturation routines, such as findsp, could behave badly when given cold inputs; they were rewritten for CESM 1.2. Additionally, WACCM operates on the edge of numerical stability for the FV dycore, which means that in rare cases it can encounter specific combinations of temperature, pressure, and humidity that cause a crash in these routines.

The simplest thing you could try is the namelist option "nspltvrm = 2". This should improve model stability without changing the climate, and will hopefully prevent this error from recurring.
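In a CESM case, CAM namelist changes like this one are normally added to the user_nl_cam file in the case directory. A minimal Python sketch of setting `nspltvrm = 2` in that file's text, assuming simple one-per-line `name = value` entries (no multi-line arrays) and `!` comment lines. The helper `set_namelist_value` is hypothetical, not a CESM tool.

```python
import re


def set_namelist_value(text, name, value):
    """Return user_nl_cam text with `name = value` set.

    Hypothetical helper: replaces the first uncommented assignment to
    `name` (matched case-insensitively, as Fortran namelists are) or
    appends a new line if none exists. Lines starting with '!' are
    comments and are left untouched.
    """
    pattern = re.compile(rf"^(\s*){re.escape(name)}\s*=.*$",
                         re.IGNORECASE | re.MULTILINE)
    line = f"{name} = {value}"
    if pattern.search(text):
        # Keep the original indentation and replace the whole assignment.
        return pattern.sub(lambda m: m.group(1) + line, text, count=1)
    # No existing entry: append, making sure the text ends with a newline.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

With `pathlib`, the file can be updated in place via `path.write_text(set_namelist_value(path.read_text(), "nspltvrm", 2))`. Calling it again with a different value rewrites the existing entry instead of adding a duplicate line.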
 

lantao@ucar_edu

New Member
Hi Sean,

My WACCM experiment still crashes intermittently, but for a different reason this time. The error message is as follows:

 39: imp_sol: Time step   1.1250000000000E+01 failed to converge @ (lchnk,lev,col,nstep) =    420    53     2******
 39: imp_sol: Failed to converge @ (lchnk,lev,col,nstep,dt,time) =    420    53     2******  1.1250000000000E+01  1.8000000000000E+03
 39: CL        1.000E+00
 39: CLO       1.000E+00
 39: OCLO      1.000E+00
 39: HOCL      1.000E+00
 39: CLONO2    1.000E+00
 39: imp_sol : @ (lchnk,lev,col) =          420          53           2  failed
 39:         165  times
INFO: 0031-251  task 149 exited: rc=-11
INFO: 0031-251  task 59 exited: rc=-11
 34:forrtl: error (78): process killed (SIGTERM)

Following your advice, I have been getting past each crash by switching the parameter "nspltvrm" between 2 and 1 (1 is the default in CESM1.1.1). This works temporarily, but the crashes still recur roughly every 20 years with either value of nspltvrm. Is there any other solution to this problem? Thanks a lot for the help.
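The imp_sol messages report where the implicit chemistry solver failed, as (lchnk, lev, col); the `******` in place of nstep is how Fortran prints a number too large for its output field, not a value of its own. When crashes recur, it can help to check whether the failures cluster at the same location or level. A minimal Python sketch, assuming the `imp_sol : @ (lchnk,lev,col) = ... failed` line format shown above, that counts failures per location; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the summary line, e.g.
# " 39: imp_sol : @ (lchnk,lev,col) =          420          53           2  failed"
FAILED = re.compile(
    r"imp_sol\s*:\s*@\s*\(lchnk,lev,col\)\s*=\s*(\d+)\s+(\d+)\s+(\d+)\s+failed")


def count_imp_sol_failures(log_lines):
    """Count imp_sol failure lines per (lchnk, lev, col) location.

    Hypothetical helper: only the summary line above is counted; the
    'Time step ... failed to converge @ (lchnk,lev,col,nstep)' lines
    do not match the pattern.
    """
    counts = Counter()
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts
```

Summing the resulting counts by the `lev` element shows whether failures concentrate at one model level, such as level 53 in the log above.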
 

lantao@ucar_edu

New Member
One more question, about the size of the WACCM log files. I noticed that each cesm.log.??????-??????.gz is very large (300 MB), so I have to move these log files out of my home directory frequently; otherwise my home directory fills up within a few weeks. After checking the log file, I found that most of it is filled with warning messages like these:

   4:  filew failed, worst i, j, qtmp, q =            1          13
  10:  filew failed, worst i, j, qtmp, q =            1          33
  10: -1.384606648208407E-195 -1.310203105267379E-195

These warning messages occupy over 90% of the log file. Is this common in WACCM experiments? Thanks,
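To measure how much of a compressed log these warnings account for, or to keep a slimmer copy, a minimal Python sketch using the standard `gzip` module: it streams the log line by line, counts lines containing "filew failed", and optionally writes all other lines to a new gzip file. The function name `filter_log` and any output path are hypothetical. Note that this simple filter keeps the numeric continuation lines that follow some warnings (such as the `10: -1.38...E-195` line above), since they do not contain the marker text.

```python
import gzip


def filter_log(src, dst=None, marker="filew failed"):
    """Count lines in a gzipped log that contain `marker`.

    If `dst` is given, every line without the marker is written to a new
    gzipped file. Returns (total_lines, matching_lines).
    Hypothetical helper for illustration.
    """
    total = matched = 0
    out = gzip.open(dst, "wt") if dst else None
    try:
        # Stream the file so a 300 MB log is never loaded into memory at once.
        with gzip.open(src, "rt", errors="replace") as log:
            for line in log:
                total += 1
                if marker in line:
                    matched += 1
                elif out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return total, matched
```

For example, `total, matched = filter_log(path)` followed by `matched / total` gives the fraction of lines that are filew warnings.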
 