4xCO2 experiment crashed during running...

11438023@...

Hi there,

I have run into a strange problem in my 4xCO2 sensitivity experiment.

My CESM version is 1.2.2, and the compset is E1850C5CN with CN (CLM) mode turned off. 

First, I ran 100 years from a cold start to spin up; the resulting restart files are dated 0101-01-01.

Then I set up a hybrid run, changing the CO2 concentration to 4 times its preindustrial value while leaving all other parameters unchanged, and started the experiment from the restart files described above (RUN_REFDATE=0101-01-01, STOP_OPTION=nyears, STOP_N=6).

However, the experiment stops every time it reaches month 7 of year 143, and I see no obvious error in the log file (see below).

I have ruled out the compute nodes as the cause. Does anyone know why this happens?

Thanks very much!


cesm.log

########################################  ##########################

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    901 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-5.7E-12 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    570 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-6.1E-12 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    415 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.9E-12 at i,k=   1  1

  filew failed, worst i, j, qtmp, q =            1          73

 -6.265316431815470E-013  0.000000000000000E+000

  filew failed, worst i, j, qtmp, q =            1          73

 -7.505473598505249E-015  0.000000000000000E+000

 dpcoup cant adjust           3         561           8 -5.080309997666980E-018

  0.000000000000000E+000  4.311432476213216E-018

 QNEG3 from convect_deep/CLDLIQ:m=  2 lat/lchnk=    302 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.2E-12 at i,k=   4 27

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    264 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.0E-12 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    336 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-7.8E-10 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    334 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.3E-10 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=   1034 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-5.1E-12 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    886 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-6.2E-09 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    564 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.1E-11 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    833 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.4E-10 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    563 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.1E-10 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    563 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-3.8E-11 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    564 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.4E-10 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    836 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-3.8E-11 at i,k=   1  1

 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    227 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-8.0E-11 at i,k=   1  1

forrtl: error (65): floating invalid

Image              PC                Routine            Line        Source

cesm.exe           00000000006AB42F  mo_usrrxt_mp_usrr         581  mo_usrrxt.F90

cesm.exe           000000000061D8E6  mo_gas_phase_chem         724  mo_gas_phase_chemdr.F90

cesm.exe           00000000005B1DFE  chemistry_mp_chem        1473  chemistry.F90

cesm.exe           0000000000743FEE  physpkg_mp_tphysa        1396  physpkg.F90

cesm.exe           0000000000741B45  physpkg_mp_phys_r        1131  physpkg.F90

cesm.exe           0000000000550F7D  cam_comp_mp_cam_r         300  cam_comp.F90

cesm.exe           000000000054515C  atm_comp_mct_mp_a         539  atm_comp_mct.F90

cesm.exe           00000000004BDD2D  ccsm_comp_mod_mp_        4079  ccsm_comp_mod.F90

cesm.exe           00000000004E6B35  MAIN__                     91  ccsm_driver.F90

cesm.exe           00000000004B733C  Unknown               Unknown  Unknown

libc.so.6          0000003BD661ECDD  Unknown               Unknown  Unknown

cesm.exe           00000000004B7239  Unknown               Unknown  Unknown

yhrun: error: cn2980: task 54: Aborted

yhrun: First task exited 60s ago

yhrun: tasks 0-53,55-119: running

yhrun: task 54: exited abnormally

yhrun: Terminating job step 5127560.0

yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.

slurmd[cn2655]: *** STEP 5127560.0 KILLED AT 2016-06-17T17:44:02 WITH SIGNAL 9 ***

slurmd[cn2655]: *** STEP 5127560.0 KILLED AT 2016-06-17T17:44:02 WITH SIGNAL 9 ***

yhrun: error: cn4155: tasks 108-119: Killed

###################################################################



Best regards,

Duan
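
For anyone reading a log like the one above: the QNEG3 lines are warnings about small negative mixing ratios being reset, while the `forrtl:` line and the first traceback frame below it show where the run actually aborted. A minimal Python sketch for separating the two follows. The function name `summarize_log` and the embedded sample text are made up for illustration, and the parsing assumes the Intel-style traceback layout shown in this thread.

```python
import re
from collections import Counter

# Matches the routine name in lines such as
# "QNEG3 from TPHYSBCb:m=  5 lat/lchnk= ..."
QNEG3 = re.compile(r"QNEG3 from (.+?):m=")


def summarize_log(text):
    """Count QNEG3 warnings per routine and locate the first forrtl error.

    Returns (warnings, fatal, frame): a Counter of warnings keyed by routine,
    the first 'forrtl:' line (or None), and the whitespace-split fields of
    the innermost traceback frame (or None).
    """
    warnings = Counter()
    fatal = None
    frame = None
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        match = QNEG3.search(ln)
        if match:
            warnings[match.group(1)] += 1
        elif fatal is None and ln.startswith("forrtl:"):
            fatal = ln
            rest = lines[i + 1:]
            # Skip the "Image PC Routine Line Source" header if present;
            # the next line is the innermost frame.
            if rest and rest[0].startswith("Image"):
                rest = rest[1:]
            if rest:
                frame = rest[0].split()
    return warnings, fatal, frame


# A few lines copied from the log in this thread.
sample = """\
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    227 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-8.0E-11 at i,k=   1  1
forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source
cesm.exe           00000000006AB42F  mo_usrrxt_mp_usrr         581  mo_usrrxt.F90
"""

warnings, fatal, frame = summarize_log(sample)
print(dict(warnings))
print(fatal)
print("innermost frame:", frame[-1], "line", frame[-2])
```

On the sample lines this reports one TPHYSBCb warning and points at line 581 of mo_usrrxt.F90 as the innermost frame, which matches the traceback posted above.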

 

 

 

aliceb

Hi Duan,

We are currently in the process of isolating this problem and working on the fix. We will post an update once we know more.

Alice Bertini
Software Engineer
NCAR / CESM

11438023@...

Dear Alice,

Thanks for your reply! 

BTW, I also posted another problem earlier, in which I used 'RUN_TYPE=startup' (instead of hybrid) from a cold start for the 4xCO2 run, and it crashed with 'urban longwave radiation balance error'. Oleson said it comes from a bad longwave radiation value in the atmosphere, so do you have any suggestions?

Best,

Duan

mvr

Hello Duan...

try the following fix in rrtmg_lw_rtrnmc.f90 and rrtmg_lw_rtrnmr.f90:

currently using this:
 
      do ibnd = 1,nbndlw
        if (ibnd.eq.1 .or. ibnd.eq.4 .or. ibnd.ge.10) then
          secdiff(ibnd) = 1.66_r8
        else
          secdiff(ibnd) = a0(ibnd) + a1(ibnd)*exp(a2(ibnd)*pwvcm)
        endif
      enddo
 
      if (pwvcm.lt.1.0) secdiff(6) = 1.80_r8
      if (pwvcm.gt.7.1) secdiff(7) = 1.50_r8
 
change to:
 
      do ibnd = 1,nbndlw
        if (ibnd.eq.1 .or. ibnd.eq.4 .or. ibnd.ge.10) then
          secdiff(ibnd) = 1.66_r8
        else
          secdiff(ibnd) = a0(ibnd) + a1(ibnd)*exp(a2(ibnd)*pwvcm)
          if (secdiff(ibnd) .gt. 1.80_r8) secdiff(ibnd) = 1.80_r8
          if (secdiff(ibnd) .lt. 1.50_r8) secdiff(ibnd) = 1.50_r8
        endif
      enddo
 
We're in the process of formally getting this fix into CAM. Hope this helps.

-mathew

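
For readers wondering what the two added lines do, here is a minimal Python sketch of the same clamping logic. The coefficients A0, A1 and A2 below are illustrative placeholders, not the values from the RRTMG source, and the function names are made up for this example. The point is only that an exponential fit in pwvcm can drift outside the 1.50 to 1.80 range when the column is much moister than usual, and that the fix bounds the result for every fitted band rather than special-casing bands 6 and 7.

```python
import math

# Illustrative placeholder coefficients for one band (hypothetical,
# NOT taken from rrtmg_lw_rtrnmc.f90).
A0, A1, A2 = 1.89, -0.10, 0.19


def secdiff_unclamped(pwvcm):
    """Exponential fit for the diffusivity factor, with no bounds applied."""
    return A0 + A1 * math.exp(A2 * pwvcm)


def secdiff_clamped(pwvcm, lo=1.50, hi=1.80):
    """Same fit, limited to [lo, hi] as in the proposed fix."""
    return min(hi, max(lo, secdiff_unclamped(pwvcm)))


# Compare the two over a range of precipitable water values.
for pwvcm in (0.5, 5.0, 10.0, 20.0):
    print(pwvcm, secdiff_unclamped(pwvcm), secdiff_clamped(pwvcm))
```

With these placeholder coefficients the unclamped fit goes negative at large pwvcm, while the clamped version stays at 1.50. Wherever the fit already lies between 1.50 and 1.80, the two agree.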
11438023@...

Hi,

Thank you very much! I will try this and let you know the results.

Best,

Duan

hch081@...

Hello, 

May I know if there is any update about this bug-fix?

I am running the AGCM version of CESM1.0 at 50-km horizontal resolution (f05_f05), and my future-climate runs crash in a similar way.

 QNEG3 from vertical diffusion/SO2:m=  8 lat/lchnk= 8950 Min. mixing ratio violated at    4 points.  Reset to  1.0E-36 Worst =-1.2E-10 at i,k=   4 30

 dpcoup dqreq           3        9147          15 -6.768938331155651E-035

  3.207403706088607E-034  5.779655791372648E-035

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Image              PC                Routine            Line        Source             

ccsm.exe           000000000069A497  rrtmg_lw_rtrnmc_m         376  rrtmg_lw_rtrnmc.f90

ccsm.exe           000000000069528A  rrtmg_lw_rad_mp_r         494  rrtmg_lw_rad.f90

ccsm.exe           0000000000583AC0  radlw_mp_rad_rrtm         272  radlw.F90

ccsm.exe           0000000000572004  radiation_mp_radi        1099  radiation.F90

ccsm.exe           000000000085AD4B  tphysbc_                  475  tphysbc.F90

ccsm.exe           00000000005405BF  physpkg_mp_phys_r         673  physpkg.F90

ccsm.exe           0000000000485D82  cam_comp_mp_cam_r         236  cam_comp.F90

ccsm.exe           000000000047A61E  atm_comp_mct_mp_a         549  atm_comp_mct.F90

ccsm.exe           0000000000410AD3  ccsm_comp_mod_mp_        2168  ccsm_comp_mod.F90

ccsm.exe           0000000000421E55  MAIN__                     91  ccsm_driver.F90

ccsm.exe           000000000040EE8C  Unknown               Unknown  Unknown

libc.so.6          00002AAAAC2E4C36  Unknown               Unknown  Unknown

ccsm.exe           000000000040ED89  Unknown               Unknown  Unknown

 

Best Regards,

Hoffman

 

mvr

Hi...

The bug fix has been applied to newer versions of the code and tagged, but that is not the case for older release versions. It is recommended that you apply the fix by making the code modifications mentioned earlier in this thread.

Hope that helps,

-mathew

 

 

hch081@...

Hi Mathew, 

 

Thank you very much!

After applying the code changes mentioned earlier in this thread, the restart run does not crash.

Is it correct to say that these code changes do not affect any physics or parameterization in the model?

 

Best Regards,

Hoffman

 

 

 

mvr

Hi...

This does not affect any parameterization, but the physics does change in the sense that it corrects a problem, and it will be answer-changing. Hope that clarifies, and glad to hear it solves your problem.

-mathew

 

 
