
SSP585 simulation breaks in year 2095

demetray

Demetra Yancopoulos
New Member
I am using CESM2.1.3 to run a CMIP6 SSP5-8.5 simulation (2015-2100) with the component set SSP585_CAM60_CLM50%BGC-CROP-CMIP6DECK_CICE%CMIP6_POP2%ECO%ABIO-DIC_MOSART_CISM2%NOEVOLVE_WW3_BGC%BDRD.

I have been submitting the simulation in 3-year chunks and have run the model successfully through the year 2092, with restart files for 2093-01-01. From there, the simulation restarts and runs for a while, but then stops partway through, so the run never finishes. In the run directory, files seem to be populated up to 2095-06-01, but the output files contain nothing after 2092-12-01. I can't find any sign of an error in the log files, and I am totally stumped.

I have resubmitted the simulation a couple of times, and the same thing happens each time.

Any suggestions for where I can look to diagnose the error? Has anybody encountered this error before with an SSP5-8.5 compset? Are there files I should clear out of the case or run directory before resubmitting the simulation?
 

slevis

Moderator
Staff member
Unless somebody has a different suggestion, I would start by checking whether you have exceeded your disk quota. If not, I would want to know whether the model stops at the same timestep and in exactly the same way every time. I might write restart files more frequently, so that the simulation can be restarted closer to the point of failure, and eventually add write statements to the code that may help reveal where the model crashes.
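
As a rough first check of the disk-space suggestion, here is a minimal Python sketch that reports how much of the filesystem holding a directory is still free. The function names and the 5% warning threshold are assumptions for illustration only. Note that `shutil.disk_usage` reports on the whole filesystem, so it will not show a per-user quota; for that, the quota tools at your computing center are the authoritative source.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0 to 1) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def report(path, warn_below=0.05):
    """Return a one-line summary; `warn_below` is an arbitrary illustrative threshold."""
    frac = free_fraction(path)
    status = "LOW" if frac < warn_below else "ok"
    return f"{path}: {frac:.1%} free ({status})"
```

Calling `print(report("/path/to/your/run/directory"))` with the actual run directory (the path here is a placeholder) prints a single summary line.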
 

demetray

Demetra Yancopoulos
New Member
Thank you for the advice! I have plenty of space left in scratch (and have some other experiments running and storing output there with no problems). I will start by writing the restart files more frequently. What kind of write statements and where should I put them?
 

demetray

Demetra Yancopoulos
New Member
I think I was looking in the wrong place for a logged error. In the log files in the run directory (as opposed to the log directory), I was able to find more information:


First:
MARBL ERROR (marbl_co2calc_mod:drtsafe): bounding bracket for pH solution not found



Then a bit later:

(Task 123, block 1) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) it = 3

MPICH Notice [Rank 1147] [job id 5c0aa1d0-31f8-4d17-93bb-47b401825b4a] [Sat Dec 27 06:21:34 2025] [dec1332] - Abort(0) (rank 1147 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1147

aborting job:

application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1147


And finally:

forrtl: error (78): process killed (SIGTERM)


One source claims that this means an outside process is killing the job, whereas other sources suggest model instability (though when I look at the outputs, the values seem reasonable). Do you know whether this comes from an outside process or from within CESM2? Do you have any suggestions for how I can fix it?

I am not interested in the marine biogeochemistry, so perhaps I should just turn MARBL off... How can I do that? Is it a problem to do that partway through the experiment? I already have many decades of data that I don't want to give up.
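
For anyone retracing the log search described above, here is a minimal Python sketch that scans the log files in a run directory for the messages quoted in this post and lists them in the order they appear in each file. The search patterns are copied from those messages; the function name and the `*.log*` file pattern are assumptions and may need adjusting for your case.

```python
import re
from pathlib import Path

# Patterns copied from the messages quoted in this post.
PATTERNS = re.compile(r"MARBL ERROR|MPI_Abort|forrtl: error")


def find_errors(run_dir, glob="*.log*"):
    """Yield (file name, line number, text) for each matching line.

    The default glob is an assumption about how the log files are named.
    Compressed logs (.gz) are skipped rather than decoded.
    """
    for log in sorted(Path(run_dir).glob(glob)):
        if not log.is_file() or log.suffix == ".gz":
            continue
        with open(log, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if PATTERNS.search(line):
                    yield log.name, lineno, line.rstrip()
```

Looping over `find_errors(run_dir)` and printing each tuple shows which message comes first in each log, which helps separate the original failure from messages that only follow from it.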
 