
An issue that can cause POP2 to crash on restart

mlevy

Michael Levy
CSEG and Liaisons
Staff member
Can you point me to a case directory? I want to check a couple of things:
1) Make sure your case is using a netcdf restart file with init_ts_file_fmt='bin'
2) See what version of CESM you are running, because I think different versions had different fixes (though I might be thinking of a different issue...)
Thanks!
~Mike
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
I am running cesm1.2.0. Indeed, I used a workaround: I ran the first segment, then I did a branch of the first segment.
First segment is:
/glade/p/cesmdata/cseg/runs/cesm1_2/b.e12.B1850C5CN.ne30_g16.init.ch.013_yr1_3

Then, the branch:
/glade/p/cesmdata/cseg/runs/cesm1_2/b.e12.B1850C5CN.ne30_g16.init.ch.013
When I do a branch, it works fine. But it would be nicer not to have to go through a branch.
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
I just built a similar case (cesm1_2_0, B1850C5CN, and ne30_g16) - I was able to run for 5 days, set CONTINUE_RUN = TRUE, and then run for 5 more days. Either I'm not quite mimicking your setup exactly right, or the issue you're running into is different from the one addressed in this thread. I'll send you an email, but maybe we can line up a time to chat next week and make sure I'm on the same page as you regarding set up and the errors you are seeing.
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
I ran the compset B_1850_CAM5_CN at ne30_g16 (cesm1_2_0).
I used the options:
init_ts_suboption = 'spunup'
init_ts_file = 'g40.000.pop.r.0301-01-01-00000'
init_ts_file_fmt = 'bin'
I did a successful 3-year run. On restart, I get the error:
 Global Time Averages: 01-02-0001 01:00:00
 VDC_BCK:   0.160397863902526    
 VVC_BCK:    1.60397863717241    
 (io_pio_init)  create file ./b.e12.B1850C5CN.ne30_g16.init.ch.013.pop.h.once.nc
 
 tavg file written: ./b.e12.B1850C5CN.ne30_g16.init.ch.013.pop.h.once.nc

          VOLUME AND TRACER BUDGET INITIALIZATION:
          ========================================
      volume_t (cm^3)           = 0.132513845249E+25
      SUM [volume*T] (C   cm^3) = 0.233261398038E+25
      SUM [volume*S] (ppt cm^3) = 0.233259223421E+28
 ovf_loc_prd: nsteps_total=           2  ovf=           4  swap ovf UV old/new
 prd set old/new=           1           9
 
------------------------------------------------------------------------
 
POP Exiting...
POP_SolversChronGear: solver not converged
POP_SolverRun: error in ChronGear
POP_BarotropicDriver: error in solver
Step: error in barotropic

 Let's talk next week
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
Hi Cecile,
As we talked about, the issue was that "init_ts_file_fmt" was being read from user_nl_pop2 even though CONTINUE_RUN=TRUE; on a continued run this value should instead be taken from an environment variable set by pop2.buildnml.csh. This is very similar to bug 1861 and will be fixed in future CESM versions. For 1.2.0, the workaround is to run with RESUBMIT=0 and manually remove the "init_ts_*" variables from user_nl_pop2 before continuing the run.
~Mike
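The manual cleanup step in the workaround could be scripted along these lines. This is only a minimal sketch, not part of CESM: it assumes user_nl_pop2 holds one `name = value` setting per line with `!` marking comments, and the helper names `strip_init_ts` and `clean_user_nl` are hypothetical.

```python
import re
import shutil

# Matches an uncommented assignment whose name starts with "init_ts_",
# e.g. "init_ts_file_fmt = 'bin'". Lines beginning with "!" do not match.
INIT_TS_LINE = re.compile(r"^\s*init_ts_\w*\s*=", re.IGNORECASE)


def strip_init_ts(text: str) -> str:
    """Return the namelist text with every init_ts_* assignment line removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not INIT_TS_LINE.match(line)
    ]
    return "".join(kept)


def clean_user_nl(path: str = "user_nl_pop2") -> None:
    """Rewrite the file in place, keeping a .bak copy of the original.

    Hypothetical helper: the default path assumes it is run from the case
    directory, after the initial segment has finished with RESUBMIT=0.
    """
    shutil.copy(path, path + ".bak")
    with open(path) as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(strip_init_ts(original))
```

Calling `clean_user_nl()` from the case directory before setting CONTINUE_RUN=TRUE would leave all other user_nl_pop2 settings untouched; it is worth comparing the result against the .bak copy before resubmitting.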
 

 
POP Exiting in control simulations
Using the fully coupled CESM1.0.4 (B_1850, f19_gx1v6), the control simulation stops after a 2000-year run. The error message follows:
POP Exiting...
POP_SolversChronGear: solver not converged
POP_SolverRun: error in ChronGear
POP_BarotropicDriver: error in solver
Step: error in barotropic
As it is the control run, all settings are default. Also, init_ts_file_fmt = 'nc' is correct. Is the error a bug in the model? Has anyone used this resolution for a run longer than 2000 years?
I look forward to any suggestions. Thanks,
Kun Wang
 