
4xCO2 experiment crashed during running...

Hi there,

I have run into a strange problem during my 4xCO2 sensitivity experiment. My CESM version is 1.2.2, and the compset is E1850C5CN with CN (CLM) mode turned off.

First, I ran 100 years from a cold start to spin up; the resulting restart files are dated 0101-01-01. Then I set up a hybrid run, changing the CO2 concentration to 4 times its preindustrial value while leaving all other parameters unchanged, and started the experiment from the restart files described above (RUN_REFDATE=0101-01-01, STOP_OPTION=nyears, STOP_N=6).

However, the experiment stops every time it reaches month 7 of year 143, and the log file shows no obvious error (see below). I have ruled out the compute nodes as the cause. Does anyone know why this happens? Thanks very much!
cesm.log
########################################  ##########################
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    901 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-5.7E-12 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    570 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-6.1E-12 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    415 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.9E-12 at i,k=   1  1
 filew failed, worst i, j, qtmp, q =            1          73 -6.265316431815470E-013  0.000000000000000E+000
 filew failed, worst i, j, qtmp, q =            1          73 -7.505473598505249E-015  0.000000000000000E+000
 dpcoup cant adjust           3         561           8 -5.080309997666980E-018  0.000000000000000E+000  4.311432476213216E-018
 QNEG3 from convect_deep/CLDLIQ:m=  2 lat/lchnk=    302 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.2E-12 at i,k=   4 27
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    264 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.0E-12 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    336 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-7.8E-10 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    334 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.3E-10 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=   1034 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-5.1E-12 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    886 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-6.2E-09 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    564 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.1E-11 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    833 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.4E-10 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    563 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-1.1E-10 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    563 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-3.8E-11 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    564 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-2.4E-10 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    836 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-3.8E-11 at i,k=   1  1
 QNEG3 from TPHYSBCb:m=  5 lat/lchnk=    227 Min. mixing ratio violated at    1 points.  Reset to  0.0E+00 Worst =-8.0E-11 at i,k=   1  1
forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source
cesm.exe           00000000006AB42F  mo_usrrxt_mp_usrr         581  mo_usrrxt.F90
cesm.exe           000000000061D8E6  mo_gas_phase_chem         724  mo_gas_phase_chemdr.F90
cesm.exe           00000000005B1DFE  chemistry_mp_chem        1473  chemistry.F90
cesm.exe           0000000000743FEE  physpkg_mp_tphysa        1396  physpkg.F90
cesm.exe           0000000000741B45  physpkg_mp_phys_r        1131  physpkg.F90
cesm.exe           0000000000550F7D  cam_comp_mp_cam_r         300  cam_comp.F90
cesm.exe           000000000054515C  atm_comp_mct_mp_a         539  atm_comp_mct.F90
cesm.exe           00000000004BDD2D  ccsm_comp_mod_mp_        4079  ccsm_comp_mod.F90
cesm.exe           00000000004E6B35  MAIN__                     91  ccsm_driver.F90
cesm.exe           00000000004B733C  Unknown               Unknown  Unknown
libc.so.6          0000003BD661ECDD  Unknown               Unknown  Unknown
cesm.exe           00000000004B7239  Unknown               Unknown  Unknown
yhrun: error: cn2980: task 54: Aborted
yhrun: First task exited 60s ago
yhrun: tasks 0-53,55-119: running
yhrun: task 54: exited abnormally
yhrun: Terminating job step 5127560.0
yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[cn2655]: *** STEP 5127560.0 KILLED AT 2016-06-17T17:44:02 WITH SIGNAL 9 ***
slurmd[cn2655]: *** STEP 5127560.0 KILLED AT 2016-06-17T17:44:02 WITH SIGNAL 9 ***
yhrun: error: cn4155: tasks 108-119: Killed
###################################################################

Best regards,
Duan
 

aliceb

Member
Hi Duan,

We are currently isolating this problem and working on a fix. We will post an update once we know more.
 

 
Dear Alice,

Thanks for your reply! By the way, I also posted another problem earlier, in which I used 'RUN_TYPE=startup' instead of hybrid, starting from a cold start for the 4xCO2 run, and it crashed with 'urban longwave radiation balance error'. Oleson said it comes from a bad longwave radiation value in the atmosphere, so do you have any suggestions?

Best,
Duan
 
 

mvr

Member
Hello Duan, try the following fix in rrtmg_lw_rtrnmc.f90 and rrtmg_lw_rtrnmr.f90:
Code:
currently using this:
 
      do ibnd = 1,nbndlw
        if (ibnd.eq.1 .or. ibnd.eq.4 .or. ibnd.ge.10) then
          secdiff(ibnd) = 1.66_r8
        else
          secdiff(ibnd) = a0(ibnd) + a1(ibnd)*exp(a2(ibnd)*pwvcm)
        endif
      enddo
 
      if (pwvcm.lt.1.0) secdiff(6) = 1.80_r8
      if (pwvcm.gt.7.1) secdiff(7) = 1.50_r8
 
change to:
 
      do ibnd = 1,nbndlw
        if (ibnd.eq.1 .or. ibnd.eq.4 .or. ibnd.ge.10) then
          secdiff(ibnd) = 1.66_r8
        else
          secdiff(ibnd) = a0(ibnd) + a1(ibnd)*exp(a2(ibnd)*pwvcm)
          ! keep the fitted value within [1.50, 1.80]
          if (secdiff(ibnd) .gt. 1.80_r8) secdiff(ibnd) = 1.80_r8
          if (secdiff(ibnd) .lt. 1.50_r8) secdiff(ibnd) = 1.50_r8
        endif
      enddo
We're in the process of formally getting this fix into CAM. Hope this helps,
-mathew
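For anyone who wants to see what the two added lines do before rebuilding the model, here is a minimal standalone sketch of the clamp. Everything in it other than the 1.50 and 1.80 bounds is an assumption made for illustration: the program name, the single-band coefficients a0, a1, a2 and the range of pwvcm values are hypothetical and are not the values used in RRTMG. It only shows that a fit of this form can drift outside [1.50, 1.80] as pwvcm grows and that the clamp holds it inside that range; it does not by itself establish the cause of the crash.

```fortran
program secdiff_clamp_demo
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)

  ! HYPOTHETICAL single-band coefficients, chosen only so that the
  ! fit leaves the allowed range; these are NOT the RRTMG values.
  real(r8), parameter :: a0 = 1.60_r8
  real(r8), parameter :: a1 = -0.01_r8
  real(r8), parameter :: a2 = 0.40_r8

  ! Bounds taken from the fix posted in this thread.
  real(r8), parameter :: secdiff_min = 1.50_r8
  real(r8), parameter :: secdiff_max = 1.80_r8

  real(r8) :: pwvcm, raw, clamped
  integer :: i

  do i = 0, 6
    ! Illustrative sweep of the pwvcm input: 0, 2, ..., 12
    pwvcm = 2.0_r8 * real(i, r8)

    ! Same functional form as in the thread's code
    raw = a0 + a1*exp(a2*pwvcm)

    ! Equivalent to the two "if" statements in the fix
    clamped = min(max(raw, secdiff_min), secdiff_max)

    print '(a,f5.1,2(a,f10.4))', 'pwvcm=', pwvcm, '  raw=', raw, &
          '  clamped=', clamped

    ! Self-check: the clamped value must lie inside the bounds
    if (clamped < secdiff_min .or. clamped > secdiff_max) then
      error stop 'clamp failed'
    end if
  end do
end program secdiff_clamp_demo
```

It should build with any Fortran 2008 compiler, for example `gfortran secdiff_clamp_demo.f90`, and needs nothing from the CESM source tree.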
 

 

hch081@uib_no

New Member
Hello,

Is there any update on this bug fix? I am running the AGCM version of CESM1.0 at a 50-km horizontal resolution (f05_f05), and the future climate runs crash in a similar way.

 QNEG3 from vertical diffusion/SO2:m=  8 lat/lchnk= 8950 Min. mixing ratio violated at    4 points.  Reset to  1.0E-36 Worst =-1.2E-10 at i,k=   4 30
 dpcoup dqreq           3        9147          15 -6.768938331155651E-035  3.207403706088607E-034  5.779655791372648E-035
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
ccsm.exe           000000000069A497  rrtmg_lw_rtrnmc_m         376  rrtmg_lw_rtrnmc.f90
ccsm.exe           000000000069528A  rrtmg_lw_rad_mp_r         494  rrtmg_lw_rad.f90
ccsm.exe           0000000000583AC0  radlw_mp_rad_rrtm         272  radlw.F90
ccsm.exe           0000000000572004  radiation_mp_radi        1099  radiation.F90
ccsm.exe           000000000085AD4B  tphysbc_                  475  tphysbc.F90
ccsm.exe           00000000005405BF  physpkg_mp_phys_r         673  physpkg.F90
ccsm.exe           0000000000485D82  cam_comp_mp_cam_r         236  cam_comp.F90
ccsm.exe           000000000047A61E  atm_comp_mct_mp_a         549  atm_comp_mct.F90
ccsm.exe           0000000000410AD3  ccsm_comp_mod_mp_        2168  ccsm_comp_mod.F90
ccsm.exe           0000000000421E55  MAIN__                     91  ccsm_driver.F90
ccsm.exe           000000000040EE8C  Unknown               Unknown  Unknown
libc.so.6          00002AAAAC2E4C36  Unknown               Unknown  Unknown
ccsm.exe           000000000040ED89  Unknown               Unknown  Unknown

Best Regards,
Hoffman
 

 

mvr

Member
Hi,

The bug fix has been applied to newer versions of the code and tagged, but not to the older release versions. For those, we recommend applying the fix yourself by making the code modifications posted earlier in this thread.

Hope that helps,
-mathew
 

 

hch081@uib_no

New Member
Hi Mathew,

Thank you very much! After applying the code changes mentioned earlier in this thread, the restart run no longer crashes. Is it correct to say that these code changes do not affect any physics or parameterization in the model?

Best Regards,
Hoffman
 

 

mvr

Member
Hi,

The fix does not alter any parameterization, but it does change the physics in the sense that it corrects a problem, so it is answer-changing: results will differ from runs made without the fix. Hope that clarifies things, and glad to hear it solved your problem.

-mathew
 

 