
Segmentation fault after resubmission. F1850.f19_f19_mg17 with additional output variables.

dliu

Dunyu Liu
New Member
Dear all,

I am running CESM2.1.3 with compset F1850 and resolution f19_f19_mg17 on Lonestar6 at TACC, UT Austin. In addition to the default setup, I added a few hundred output variables in user_nl_cam (attached). The model ran successfully for 20 simulation years, but on the first resubmission it failed with a segmentation fault while opening the existing file b.e21.F1850.f19.f19.mg17.001.cam.rs.0021-01-01-00000.nc. The error log is also attached; the error appears at line 1574.

Could anyone give me any clues on how to fix this issue?

Many thanks. I very much appreciate your time and help!

Best regards,
Dunyu Liu

 

Attachments

  • user_nl_cam.txt
    4.7 KB · Views: 6
  • cesm.log.241483.220615-141911.txt
    605 KB · Views: 5

jedwards

CSEG and Liaisons
Staff member
I'm not sure that this indicates a problem in the cam.rs file. If you look a little further down in the cesm.log, the traceback points to the rrtmg code:

==== backtrace (tid:3755838) ====
0 0x0000000000012b30 .annobin_sigaction.c() sigaction.c:0
1 0x0000000000e9a12b rrtmg_sw_taumol_mp_taumol_sw_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_taumol.f90:288
2 0x0000000000e93acc rrtmg_sw_spcvmc_mp_spcvmc_sw_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_spcvmc.f90:300
3 0x0000000000e8ae5f rrtmg_sw_rad_mp_rrtmg_sw_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_rad.f90:497
4 0x00000000007291fa radsw_mp_rad_rrtmg_sw_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/rrtmg/radsw.F90:515
5 0x000000000070b411 radiation_mp_radiation_tend_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/rrtmg/radiation.F90:1105
6 0x00000000006c5189 physpkg_mp_tphysbc_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/cam/physpkg.F90:2261
7 0x00000000006be3d8 physpkg_mp_phys_run1_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/physics/cam/physpkg.F90:1054
8 0x0000000000500c7c cam_comp_mp_cam_run1_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/control/cam_comp.F90:258
9 0x00000000004f8452 atm_comp_mct_mp_atm_init_mct_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/components/cam/src/cpl/atm_comp_mct.F90:292
10 0x000000000043c304 component_mod_mp_component_init_cc_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/cime/src/drivers/mct/main/component_mod.F90:267
11 0x00000000004308a7 cime_comp_mod_mp_cime_init_() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/cime/src/drivers/mct/main/cime_comp_mod.F90:2015
12 0x0000000000439329 MAIN__() /work/07931/dunyuliu/group_share/CESM2.1.3/my_cesm_sandbox_lonestar6/cime/src/drivers/mct/main/cime_driver.F90:114
13 0x000000000041b822 main() ???:0
14 0x00000000000234a3 __libc_start_main() ???:0
15 0x000000000041b72e _start() ???:0
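
In a long cesm.log it can help to pull out the first backtrace frame that lies inside the model source tree, since frame 0 is usually the signal handler. The following is a minimal sketch, assuming the backtrace has the layout shown above (frame index, address, symbol followed by `()`, then `file:line` on the same line). The function names and the `/components/` marker are illustrative choices, not part of CESM or CIME.

```python
import re

# One frame line, e.g.:
#   1 0x0000000000e9a12b rrtmg_sw_taumol_mp_taumol_sw_() /some/path/rrtmg_sw_taumol.f90:288
FRAME_RE = re.compile(
    r"^\s*(\d+)\s+0x[0-9a-fA-F]+\s+(\S+?)\(\)\s+(\S+):(\d+)\s*$"
)


def parse_backtrace(log_text):
    """Return the frames of the first '==== backtrace' block in log_text."""
    frames = []
    in_block = False
    for raw in log_text.splitlines():
        if "==== backtrace" in raw:
            if in_block:
                break  # a second block starts; keep only the first one
            in_block = True
            continue
        if not in_block:
            continue
        match = FRAME_RE.match(raw)
        if match is None:
            if frames:
                break  # first non-frame line after the frames ends the block
            continue
        index, symbol, path, line = match.groups()
        frames.append(
            {"index": int(index), "symbol": symbol, "file": path, "line": int(line)}
        )
    return frames


def first_model_frame(frames, marker="/components/"):
    """Return the first frame whose file path contains marker, or None.

    The default marker is an assumption: it matches paths such as
    .../components/cam/src/... as they appear in the backtrace above.
    """
    for frame in frames:
        if marker in frame["file"]:
            return frame
    return None
```

Calling `first_model_frame(parse_backtrace(open("cesm.log").read()))` on a log like the one above should point at the `rrtmg_sw_taumol` frame. Frames whose path has been wrapped onto a separate line will not match the pattern and are skipped.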
 

jedwards

CSEG and Liaisons
Staff member
All I can suggest is to look at the diagnostics at the end of the first 20 years and see if anything looks wrong. Also, these are sometimes system errors, so you should simply resubmit and see if you get the same result.
 

dliu

Dunyu Liu
New Member
Thanks very much, Jim! The error repeats on resubmission. I will take a closer look at it.
 

dliu

Dunyu Liu
New Member
Could you post your atm.log file?
Hi Jim, it seems the problem is related to the output variables I added. After I replaced them with another set of variables, the model can restart. The working user_nl_cam is attached for reference. Thanks! -Dunyu
 

Attachments

  • user_nl_cam.txt
    5.4 KB · Views: 9

dliu

Dunyu Liu
New Member
Hi Jim, the same rrtmg error occurred with another case (--compset BSSP370cmip6 --res f09_g17). Changing the output variables in the user namelist didn't help this time. The problem is the same: the model runs for 3 years from the start but then fails to restart. Below is the error from the cesm log. Were there any changes to the rrtmg code between CESM versions? I noticed that BSSP370cmip6 was originally run with cesmv2.1.4-exp02. Also, by any chance, is there a working CESM 2.1.3 installation on ls6? Many thanks! -Dunyu

==== backtrace (tid:2163119) ====
0 0x0000000000012b30 .annobin_sigaction.c() sigaction.c:0
1 0x0000000000e99f9b rrtmg_sw_taumol_mp_taumol_sw_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_taumol.f90:288
2 0x0000000000e9393c rrtmg_sw_spcvmc_mp_spcvmc_sw_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_spcvmc.f90:300
3 0x0000000000e8accf rrtmg_sw_rad_mp_rrtmg_sw_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/rrtmg/aer_src/rrtmg_sw_rad.f90:497
4 0x000000000072904a radsw_mp_rad_rrtmg_sw_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/rrtmg/radsw.F90:515
5 0x000000000070b261 radiation_mp_radiation_tend_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/rrtmg/radiation.F90:1105
6 0x00000000006c51c9 physpkg_mp_tphysbc_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/cam/physpkg.F90:2261
7 0x00000000006be418 physpkg_mp_phys_run1_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/physics/cam/physpkg.F90:1054
8 0x0000000000500d3c cam_comp_mp_cam_run1_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/control/cam_comp.F90:258
9 0x00000000004f8512 atm_comp_mct_mp_atm_init_mct_() *CESM2.1.3/my_cesm_sandbox/components/cam/src/cpl/atm_comp_mct.F90:292
10 0x000000000043c3c4 component_mod_mp_component_init_cc_() *CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/main/component_mod.F90:267
11 0x0000000000430967 cime_comp_mod_mp_cime_init_() *CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/main/cime_comp_mod.F90:2015
12 0x00000000004393e9 MAIN__() *CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/main/cime_driver.F90:114
13 0x000000000041b8e2 main() ???:0
14 0x00000000000234a3 __libc_start_main() ???:0
15 0x000000000041b7ee _start() ???:0
 

jedwards

CSEG and Liaisons
Staff member
Under the CESM version naming convention A.B.C, you should use the most recent version for which A and B match those of the base case. So if you are already using 2.1.4, you do not want to go back to 2.1.3.
The latest version in the 2.1 series is cesm2.1.4-rc.11.

I'm not aware of any issues associated with lonestar6, but I'll have another look.
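
The version rule above (keep A and B fixed, take the newest C) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the version list in the usage note is made up for the example rather than an actual list of CESM releases, and plain `A.B.C` strings are assumed, so suffixes such as `-rc.11` are not handled.

```python
def parse_version(tag):
    """Split a plain 'A.B.C' string into a tuple of ints, e.g. '2.1.3' -> (2, 1, 3)."""
    return tuple(int(part) for part in tag.split("."))


def latest_in_series(base, available):
    """Return the newest version in `available` that shares A and B with `base`.

    Returns None when no version in the same A.B series is available.
    """
    series = parse_version(base)[:2]
    candidates = [v for v in available if parse_version(v)[:2] == series]
    if not candidates:
        return None
    # Compare numerically, so that e.g. 2.1.10 sorts after 2.1.9.
    return max(candidates, key=parse_version)
```

For a base case run with 2.1.3 and an illustrative list `["2.1.1", "2.1.3", "2.1.4", "2.2.0"]`, `latest_in_series` picks `"2.1.4"`: it stays inside the 2.1 series and never jumps to 2.2.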
 

dliu

Dunyu Liu
New Member
Thanks! It seems I forgot to add the --user-mods-dir flag when creating the case. After adding it back, the model can restart now. Thanks very much! -Dunyu
 