
branch run of B1850C5CN case

lyu4@gmu_edu

New Member
Hi all,

There is a control run (referred to below as the original control run) on the yellowstone server. The case name is "b.e11.B1850C5CN.f09_g16.005" and its length is 2200 years. I am trying to do a branch run from this control run, starting at year 1201. I have downloaded the restart files from HPSS and put them in the run directory of my case.

The variables I changed in env_run are:

id="RUN_TYPE"   value="branch"
id="RUN_REFCASE"   value="b.e11.B1850C5CN.f09_g16.005"
id="RUN_REFDATE"   value="1201-01-01"
id="GET_REFCASE"   value="FALSE"

The model stopped in the initialization section. I then changed "branch" to "hybrid". This time the model ran successfully for 5 months (it is only a test run, so I set the length to 5 months), but the hybrid output differs from the original control run on the yellowstone server.

My questions are:
Why did the branch run fail?
Why is the hybrid run different from the original control run?

Thank you.
Liang
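For readers following along, the four env_run changes listed in the question can be sketched in Python. This is only an illustration under stated assumptions: `ENV_RUN_SAMPLE` is a hypothetical, trimmed-down stand-in for a real env_run.xml (its starting values are made up), and `apply_settings` is a helper written for this sketch, not part of CESM. In a real case directory you would normally use the scripts that ship with CESM rather than editing the file this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for env_run.xml; a real file has many
# more entries and these starting values are invented for illustration.
ENV_RUN_SAMPLE = """<config_definition>
  <entry id="RUN_TYPE" value="startup" />
  <entry id="RUN_REFCASE" value="case.std" />
  <entry id="RUN_REFDATE" value="0001-01-01" />
  <entry id="GET_REFCASE" value="FALSE" />
</config_definition>"""

# The settings quoted in the question.
BRANCH_SETTINGS = {
    "RUN_TYPE": "branch",
    "RUN_REFCASE": "b.e11.B1850C5CN.f09_g16.005",
    "RUN_REFDATE": "1201-01-01",
    "GET_REFCASE": "FALSE",
}


def apply_settings(xml_text, settings):
    """Set value= on each matching <entry id=...>; return (new_xml, missing_ids)."""
    root = ET.fromstring(xml_text)
    entries = {e.get("id"): e for e in root.iter("entry")}
    missing = []
    for key, value in settings.items():
        if key in entries:
            entries[key].set("value", value)
        else:
            missing.append(key)  # id not present in this file
    return ET.tostring(root, encoding="unicode"), missing


new_xml, missing = apply_settings(ENV_RUN_SAMPLE, BRANCH_SETTINGS)
print(new_xml)
print("ids not found:", missing)
```

Switching to a hybrid run, as tried later in the question, would only change the `RUN_TYPE` entry in `BRANCH_SETTINGS`.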
 

jedwards

CSEG and Liaisons
Staff member
The reason the branch run failed can almost certainly be found by examining the resulting log files. The answers are not the same because the compilers and other supporting software on the system have changed since the time of that control run. This introduces small differences in each calculation, and even if these differences start out very small, they grow in time.
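A toy illustration of that last point, not CESM code: the sketch below first shows that summing the same three numbers in a different order (something a different compiler or optimization level may do) changes the last bit of the result, then uses the logistic map as a stand-in for a chaotic model to show a rounding-sized difference in the initial state growing over time. The map, its parameter `r=4.0`, and the starting value `0.3` are assumptions chosen only for the demonstration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x); a toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# The same three numbers summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print("sums identical:", a == b, "difference:", abs(a - b))

# Two runs whose initial states differ by a rounding-sized amount.
run1 = logistic_trajectory(0.3, 100)
run2 = logistic_trajectory(0.3 + 1e-15, 100)
diffs = [abs(p - q) for p, q in zip(run1, run2)]

for step in (0, 10, 25, 50, 100):
    print("step", step, "difference", diffs[step])
```

The printed differences start near machine precision and end up comparable to the size of the values themselves, which is why a rerun on changed system software cannot be expected to match the original output exactly.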
 

lyu4@gmu_edu

New Member
Hi Jedwards,

Thank you for your response. I could only find three log files (attached below). I cannot find any errors in the atm and cpl log files. In the cesm log file, the error starts at line 27480. Here it is with a few surrounding lines:

27479    1: NetCDF: Variable not found
27480    1: pio_support::pio_die:: myrank=          -1 : ERROR: nf_mod.F90:         679 :
27481    1: NetCDF: Variable not found
27482    1:Abort(1) on node 1 (rank 1 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
27483    1:INFO: 0031-306  pm_atexit: pm_exit_value is 1.
27484 INFO: 0031-251  task 1 exited: rc=1
27485 ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 1
27486  628:forrtl: error (78): process killed (SIGTERM)

It says "NetCDF: Variable not found". I think this refers to variables in the atm restart file mentioned at line 26875:

26875    1: Opened existing file b.e11.B1850C5CN.f09_g16.005.cam.r.1201-01-01-00000.nc

I have checked this atm restart file. It has the same name and size as the restart file on HPSS, so I think the file is correct. What do you think the error might be? I am using CESM1.2.2. The version that created those restart files should be older than the version I am using.
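The manual search described above (find the first error, then look back for the last file that was opened) can be sketched as a small script. The patterns are assumptions taken only from the messages quoted in this thread, not a complete list of CESM error strings, and `first_error_with_context` is a hypothetical helper name.

```python
import re

# Patterns are assumptions based on the log lines quoted in this thread.
ERROR_PATTERN = re.compile(r"ERROR|pio_die|NetCDF: .*not found|Abort\(", re.IGNORECASE)
OPEN_PATTERN = re.compile(r"Opened existing file\s+(\S+)")


def first_error_with_context(lines):
    """Return (index of first error line, last file opened before it).

    The index is None if no line matches ERROR_PATTERN.
    """
    last_file = None
    for i, line in enumerate(lines):
        opened = OPEN_PATTERN.search(line)
        if opened:
            last_file = opened.group(1)
        if ERROR_PATTERN.search(line):
            return i, last_file
    return None, last_file


# Sample input copied from the log excerpt in the post (line numbers dropped).
sample = [
    " 1: Opened existing file b.e11.B1850C5CN.f09_g16.005.cam.r.1201-01-01-00000.nc",
    " 1: NetCDF: Variable not found",
    " 1: pio_support::pio_die:: myrank=          -1 : ERROR: nf_mod.F90:         679 :",
]
idx, fname = first_error_with_context(sample)
print("first error at index", idx, "after opening", fname)
```

To run it on a real log, read the file with `open(path).read().splitlines()` and pass the result in place of `sample`.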
 

jedwards

CSEG and Liaisons
Staff member
Restart files are not compatible between the CAM version you are using and the one used for the control run. Starting from hybrid is the correct solution.
 

lyu4@gmu_edu

New Member
I really appreciate your help. I can now do the branch run successfully using CESM 1.1.1. Unfortunately, my result is still different from the control run. I want a bit-for-bit identical result so that I don't need to run a control myself. To get that, I need more information about how the b.e11.B1850C5CN.f09_g16.005 control run was done. Can you give me that information?
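To make "bit-for-bit" concrete: it means every output value has exactly the same binary representation, which is stricter than agreeing to many decimal places. A minimal sketch, assuming the values have already been read into plain Python lists (the numbers below are made up for illustration and `bit_for_bit` is a hypothetical helper):

```python
import struct


def bit_for_bit(a, b):
    """True only if both sequences hold identical 64-bit float bit patterns."""
    if len(a) != len(b):
        return False
    return all(struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b))


# Made-up values standing in for fields from two runs.
control = [288.15, 101325.0, 0.6]
rerun_same = [288.15, 101325.0, 0.6]
# Same sum computed in a different order; it is not exactly 0.6.
rerun_diff = [288.15, 101325.0, (0.1 + 0.2) + 0.3]

print(bit_for_bit(control, rerun_same))
print(bit_for_bit(control, rerun_diff))
```

The second comparison fails even though the values agree to about 15 significant digits, which is the kind of mismatch a change in compiler or supporting libraries can produce.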
 
Hi Jedwards,

I encountered a problem similar to Liang's. I also tried to make a branch run from year 1301 of b.e11.B1850C5CN.f09_g16.005. However, the ccsm.log file reports the following error after initialization:

" rtm ymd= 13010101 rtm tod= 1800 sync ymd= 13010101 sync tod= 10800 (shr_sys_abort) ERROR: rof_run_mct :: RTM clock is not in sync with Master Sync clock"

Would you have any suggestions for solving this problem?

Thank you,
Fukai
 
Dear Fukai,

I have run into the same problem you did:

"(shr_sys_abort) ERROR: rof_run_mct                     :: RTM clock is not in sync with Master Sync clock"

Have you figured it out? Thank you!

Best,
Dachao
 