Problem running CESM: failed to converge

koichi omi · Feb 18, 2020

Hi all,

I am running CESM2.1.2 on my laboratory computer.

I set STOP_OPTION=nsteps, STOP_N=1, but the calculation never stops.

The cesm.log file says:
"
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 2616 52 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 1288 52 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 1952 53 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 2948 81 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 3031 70 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 3114 70 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 3114 86 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 2450 70 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 2450 86 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 3363 68 0
imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 3363 84 0
...
"
Messages like these are repeated throughout the log file.
Please see the attached log file.
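To see how widespread the failures are, it can help to tally the imp_sol messages. The sketch below is a hypothetical helper (not part of CESM) that assumes each message has the format shown above; it counts failures per nstep and the number of distinct chunks (lchnk) involved.

```python
import re
from collections import Counter

# Matches lines like:
# imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep) = 2616 52 0
IMP_SOL_RE = re.compile(
    r"imp_sol: time step\s+([\d.]+)\s+failed to converge\s*@\s*"
    r"\(lchnk,vctrpos,nstep\)\s*=\s*(\d+)\s+(\d+)\s+(\d+)"
)


def summarize_imp_sol_failures(log_text: str) -> dict:
    """Count imp_sol convergence failures in cesm.log text (hypothetical helper)."""
    per_nstep = Counter()
    chunks = set()
    total = 0
    for line in log_text.splitlines():
        m = IMP_SOL_RE.search(line)
        if m is None:
            continue  # not an imp_sol failure line
        _dt, lchnk, _vctrpos, nstep = m.groups()
        total += 1
        chunks.add(int(lchnk))
        per_nstep[int(nstep)] += 1
    return {
        "total": total,
        "distinct_chunks": len(chunks),
        "per_nstep": dict(per_nstep),
    }
```

If every failure is reported at nstep 0, as in the excerpt above, the solver is failing on the very first step rather than drifting unstable later in the run.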

Is this normal behavior?

Here is the case configuration:

resolution: f09_f09_mg17
compset: FW1850 (1850_CAM60%WCTS_CLM50%SP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV )

Do you have any suggestions for fixing this error?

Sincerely,

jedwards · Feb 19, 2020

You need to set STOP_N=3 or greater.
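As a minimal sketch, assuming the standard CIME `./xmlchange` script in the case directory (which accepts comma-separated NAME=VALUE pairs), the stop settings can be built and sanity-checked in Python before running. The minimum of 3 steps comes from the advice above; the helper name is hypothetical.

```python
from typing import List

# Per the advice in this thread: with STOP_OPTION=nsteps, use STOP_N >= 3.
MIN_NSTEPS = 3


def build_stop_xmlchange(stop_option: str, stop_n: int) -> List[str]:
    """Return an argv list for CIME's ./xmlchange (hypothetical helper).

    Raises ValueError for nsteps runs shorter than MIN_NSTEPS,
    following the advice above.
    """
    if stop_n < 1:
        raise ValueError("STOP_N must be positive")
    if stop_option == "nsteps" and stop_n < MIN_NSTEPS:
        raise ValueError(
            f"STOP_N={stop_n} is too short for nsteps; use >= {MIN_NSTEPS}"
        )
    return ["./xmlchange", f"STOP_OPTION={stop_option},STOP_N={stop_n}"]
```

For example, `shlex.join(build_stop_xmlchange("nsteps", 10))` gives the shell command `./xmlchange STOP_OPTION=nsteps,STOP_N=10`, which would be run from the case directory.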

koichi omi · Feb 19, 2020

Thank you so much for your reply.

I tried STOP_OPTION=nsteps, STOP_N=10 with the same case configuration, and I also tried STOP_OPTION=nyears.
However, the same problem occurred.

I also noticed the following warning in the log file when the program reads the input data:
"WARNING: Rearr optional argument is a pio2 feature, ignored in pio1"
Please see the attached log file.
Could this warning be related to the problem?

Do you have any suggestions for fixing this problem?

jedwards · Feb 19, 2020

Right at the top of the log you sent me I see:
(seq_timemgr_clockInit) WARNING: Stop time too short, not all components will be advanced and restarts won't be written

Could you send your drv_in file and cpl.log for the case with STOP_N=10, STOP_OPTION='nsteps'?

koichi omi · Feb 19, 2020

Thank you for your quick reply.

Attached are the drv_in and cpl.log files.

Sincerely,

jedwards · Feb 20, 2020

I don't see a problem here: the cpl.log ends before the first time step starts, so how does this show that the run doesn't stop after 10 steps?

koichi omi · Feb 20, 2020

Thank you for your reply.
Sorry, I should have explained the problem more clearly.
The calculation never advances past the initial step, so it never ends.
I would like to know why this happens.

jedwards · Feb 21, 2020

Oh, okay. Have you tried F1850? I think this is an issue for the CAM-chem forum.

koichi omi · Feb 24, 2020

Thank you for your reply.

I tried F1850, and the calculation completed successfully.
What should I do to get FW1850 working?

mmills · Feb 24, 2020

The failing implicit solver message indicates that the chemistry is likely unstable with the initial condition provided. This can happen if you make changes in chemistry, or perhaps just because you are running on a different computer than the one the initial condition file was generated on. Have you made any changes to the chemistry?

In cases such as this, you can usually get to a stable state by running the model forward a few days with higher time splitting (shorter time steps) for dynamics, tracer advection, and vertical remapping. The default time splits for WACCM are:

fv_nsplit = 16
fv_nspltrac = 4
fv_nspltvrm = 4

Try running for 2 days with this in your user_nl_cam file in your case directory:

fv_nsplit = 128
fv_nspltrac = 128
fv_nspltvrm = 128
inithist = 'DAILY'

If the run completes 2 days, you can continue the run after removing these lines from the user_nl_cam. The purpose of the inithist line is to produce a new initial condition file (*.cam.i.*) that you can use for future stable startups. Let me know if this works.
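The add-then-remove workflow described above can be scripted. This is a minimal sketch that treats user_nl_cam as plain text and assumes one `name = value` setting per line; the helper names are hypothetical, and multi-variable lines or continuation lines are not handled.

```python
# Settings suggested above for a short stabilization run (about 2 model days).
STABILIZATION_SETTINGS = {
    "fv_nsplit": "128",
    "fv_nspltrac": "128",
    "fv_nspltvrm": "128",
    "inithist": "'DAILY'",
}


def remove_stabilization(user_nl_text: str) -> str:
    """Drop the stabilization lines so the run can continue with default splits."""
    kept = []
    for line in user_nl_text.splitlines():
        # Variable name is whatever precedes the first '='; comments pass through.
        name = line.split("=", 1)[0].strip().lower()
        if name not in STABILIZATION_SETTINGS:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


def add_stabilization(user_nl_text: str) -> str:
    """Append the stabilization settings, replacing any existing lines for them."""
    kept = remove_stabilization(user_nl_text).rstrip("\n")
    new_lines = [f"{key} = {value}" for key, value in STABILIZATION_SETTINGS.items()]
    return "\n".join(([kept] if kept else []) + new_lines) + "\n"
```

Because `add_stabilization` first strips any existing copies, applying it twice gives the same file, and `remove_stabilization` restores the original lines once the 2-day run has produced a new *.cam.i.* file.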

koichi omi · Feb 27, 2020

Thank you so much for your reply.

The only things I changed were MPI_RUN_COMMAND and run_exe; I didn't change the chemistry.
Also, I am running CESM on the same computer.

I tried the settings you suggested, but the same convergence problem occurred.

koichi omi · Apr 17, 2020

This problem was solved by deleting all the input files and downloading them again.
The cause may have been a corrupted input data file.
Thank you!

jedwards · Apr 17, 2020

For cesm2.1.x there is a new option to the check_input_data command: ./check_input_data --chksum compares checksums for all of the input data used in a case and reports any corrupted input files.
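The idea behind a checksum check can be illustrated with a small sketch. This is not how check_input_data is implemented; it assumes MD5 digests and a hypothetical manifest dictionary mapping paths (relative to an input data root) to expected hex digests, and reports files that are missing or do not match.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_inputs(root: Path, expected: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose MD5 does not match.

    `expected` is a hypothetical manifest mapping paths (relative to `root`)
    to MD5 hex digests.
    """
    bad = []
    for rel_path, want in sorted(expected.items()):
        path = root / rel_path
        if not path.is_file() or md5_of_file(path) != want.lower():
            bad.append(rel_path)
    return bad
```

Reading in chunks keeps memory use flat even for multi-gigabyte NetCDF input files; any file reported as bad would be a candidate for deleting and downloading again.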

koichi omi · Apr 17, 2020

Thank you so much for your advice!

Hemraj · Jun 2, 2021

I am encountering the same problem (imp_sol: time step 1800.000 failed to converge @ (lchnk,vctrpos,nstep)). These are my main model settings:
codebase=cesm2_1_3
compset=FC2010climo
resolution=f09_f09_mg17

As suggested above, I checked whether my input files are corrupted (using ./check_input_data --chksum), and they seem okay.
Could you please suggest why this problem is occurring and how I can resolve it?
Thank you so much.

Hemraj · Jun 2, 2021

For a better understanding of the problem, I am also attaching the log file.

jedwards · Jun 3, 2021

This looks like a possible system issue rather than a model issue: a model issue would normally produce a traceback and an abort message at the end of the log. Check the lnd.log for further information, and also check the Slurm logs in your case directory for errors.
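A quick way to follow this advice across several logs is a keyword scan. The sketch below is a heuristic, not a CESM tool; the keyword list is an assumption and will produce false positives (for example, harmless lines that mention "error" in a variable name). It also reads gzip-compressed logs in case they have been compressed.

```python
import gzip
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical keyword list; adjust to what your system and scheduler print.
ERROR_PATTERNS = re.compile(
    r"\b(error|abort|traceback|segmentation fault|killed|oom|cancelled)\b",
    re.IGNORECASE,
)


def read_log(path: Path) -> str:
    """Read a plain or .gz log file as text, replacing undecodable bytes."""
    if path.suffix == ".gz":
        with gzip.open(path, "rt", errors="replace") as fh:
            return fh.read()
    return path.read_text(errors="replace")


def scan_logs(paths: List[Path]) -> Dict[str, List[str]]:
    """Map each log file name to the lines that match ERROR_PATTERNS."""
    hits = {}
    for path in paths:
        matches = [
            line.rstrip()
            for line in read_log(path).splitlines()
            if ERROR_PATTERNS.search(line)
        ]
        if matches:
            hits[path.name] = matches
    return hits
```

An empty result does not prove the run is healthy; as noted above, a run that hangs without aborting may leave no error message at all.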

Hemraj · Jun 3, 2021

Thank you so much for the reply.
I checked both lnd.log and the Slurm logs ($case>logs>run_environment.txt.297530.210604-044944) but didn't find any errors. For your reference, the files are attached.
Could it be caused by the input file I modified?
Thank you.

jeffhu · Feb 13, 2022

Hi. I have run into the same problem you described, and your lnd.log file looks similar to mine. May I ask how you solved this problem? Thanks!

marcinkupilas · Apr 25, 2023

I am having a similar problem and I'd be very grateful if someone could help.

I am running compset FWmaHIST at resolution ne30_ne30_mg17 with the checked-out tag cam6_2_011, in order to run WACCM-RR.

I have attached the cesm.log and the resolved atm_in file.
