
forrtl: error (78): process killed (SIGTERM)

I tried to start a branch run with model version cesm1_3_beta01. The file cesm.log.141030-124253 shows the following message in several places:

  forrtl: error (78): process killed (SIGTERM)
  Image              PC                Routine            Line        Source
  libpthread.so.0    00002AFA183BA2A5  Unknown               Unknown  Unknown
  libpoe.so          00002AFA1D3A3AE2  Unknown               Unknown  Unknown
  libpthread.so.0    00002AFA183B2851  Unknown               Unknown  Unknown
  libc.so.6          00002AFA1A96690D  Unknown               Unknown  Unknown

Is there any advice on it? Thanks so much,
Ying
 

jedwards

CSEG and Liaisons
Staff member
You need to look further up in the log file to find the real error; it is usually in the lines preceding the first of these SIGTERM messages.
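A log written by thousands of MPI tasks can be long, so one way to apply this advice is to scan for the first SIGTERM line and show what came just before it. The Python sketch below is a hypothetical helper, not a CESM utility; the 20-line default window is an arbitrary assumption.

```python
from collections import deque


def lines_before_first_sigterm(lines, context=20):
    """Return up to `context` lines immediately preceding the first SIGTERM line.

    `lines` can be any iterable of strings, e.g. an open log file.
    Returns an empty list if no line mentions SIGTERM.
    """
    window = deque(maxlen=context)  # keeps only the most recent `context` lines
    for line in lines:
        if "SIGTERM" in line:
            return list(window)
        window.append(line)
    return []
```

Pass an open file object for cesm.log.141030-124253 as `lines` and print the result. Because many ranks write to the same log, their output is interleaved, so the real error can sit further up than the window; a larger `context`, or a search for strings such as `ERROR`, may be needed.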
 
Thanks for your reply. The preceding lines are something like:

  INFO: 0031-251  task 3554 exited: rc=1

The case directory is /glade/u/home/yingli/cesm_1_2_2/runs/f.FAMIPC5.ne120_ne120.test.007
 

jedwards

CSEG and Liaisons
Staff member
Thanks for providing the case directory. I didn't see anything obvious; sometimes it works to just try again. I'm not sure it will work to branch a cesm1_3 run from a cesm1_2_2 run. If you are going to use 1_3, why beta01? Maybe beta12 instead?
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
Hi,
Please use beta12 or use my modified directory at: /glade/p/work/hannay/cesm_tags/cesm1_3_beta02_mods
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
From user:
- I also tried the tag beta12 and got an error message after invoking ./cesm_setup:

1) ./create_newcase -case $CASEDIR -res ne120_ne120 -compset F_AMIP_CAM5 -mach yellowstone
2) ./xmlchange RUN_TYPE=branch,RUN_REFDATE=2000-01-01,RUN_REFCASE=FAMIPC5_ne120_79to05_03_omp2,GET_REFCASE=FALSE
3) ./cesm_setup

I got these error messages after invoking ./cesm_setup:

  rtm.buildnml.csh could not find restart file for branch or hybrid start
  ERROR: rtm.buildnml.csh failed
  ERROR: /glade/u/home/yingli/cesm/runs/f.FAMIPC5.ne120_ne120.test.013/preview_namelists failed: 25344
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
Please copy your restart files into the run directory. From the error, it seems the restart files are not in the run directory:
"rtm.buildnml.csh could not find restart file"
 
I tried this tag: /glade/p/work/hannay/cesm_tags/cesm1_3_beta02_mods, and still got the error message like this:

  6604:INFO: 0
  6605:INFO: 0031-306  pm_atexit: pm_exit_value is 1.
  6611:forrtl: error (78): process killed (SIGTERM)
  6611:Image              PC                Routine            Line        Source
  6611:libpthread.so.0    00002B9A05BC52A5  Unknown               Unknown  Unknown
  6611:libpoe.so          00002B9A0ABAEAE2  Unknown               Unknown  Unknown
  6611:libpthread.so.0    00002B9A05BBD851  Unknown               Unknown  Unknown
  6611:libc.so.6          00002B9A0817190D  Unknown               Unknown  Unknown
  6601:forrtl: error (78): process killed (SIGTERM)
  6601:Image              PC                Routine            Line        Source
  6601:libpthread.so.0    00002AE3E4B0E2A5  Unknown               Unknown  Unknown
  6601:libpoe.so          00002AE3E9AF7AE2  Unknown               Unknown  Unknown
  6601:libpthread.so.0    00002AE3E4B06851  Unknown               Unknown  Unknown
  6601:libc.so.6          00002AE3E70BA90D  Unknown               Unknown  Unknown
  031-306  pm_atexit: pm_exit_value is 1.

Here is the case directory: /glade/u/home/yingli/cesm/runs/f.FAMIPC5.ne120_ne120.test.014
 

jedwards

CSEG and Liaisons
Staff member
There is an error reading the ice restart file.    Try changing ICE_PIO_TYPENAME to netcdf.   
 
Hi Jedwards,
Thanks for your reply. Where did you find the message about the error reading the ice restart file? I didn't find it in the cesm.log and ice.log files. So I tried changing ICE_PIO_TYPENAME to netcdf in env_run.xml, and the run still crashed.
The new case directory is /glade/u/home/yingli/cesm/runs/f.FAMIPC5.ne120_ne120.test.015
Any help?
Thanks,
Ying
 

jedwards

CSEG and Liaisons
Staff member
I found this in the CESM log:

  4097: pio_support::pio_die:: myrank=          -1 : ERROR: pionfread_mod.F90.in:
  4097:         200 : NetCDF: Start+count exceeds dimension bound

But I think I see something else that may be contributing to the problem: in your env_mach_pes.xml file, please change NTASKS_* so that they are multiples of 15. That is, change 8192 to 8190 and 4096 to 4095. Then do cesm_setup -clean; cesm_setup; $CASE.clean_build; $CASE.build and try again.
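The suggested task counts are the original values rounded down to the nearest multiple of 15. A tiny hypothetical helper (not part of CESM) makes that arithmetic explicit for any other NTASKS_* value you need to adjust.

```python
def round_down_to_multiple(n, k=15):
    """Largest multiple of k that does not exceed n (for positive n and k)."""
    return (n // k) * k
```

For instance, `round_down_to_multiple(8192)` gives 8190 and `round_down_to_multiple(4096)` gives 4095, matching the values above; a count that is already a multiple of 15 is returned unchanged.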
 
I tried changing those numbers, and the run crashed again:

  4096:CalcWorkPerBlock: Total blocks: 11606 Ice blocks: 11606 IceFree blocks:     0 Land blocks:     0
  4097: pio_support::pio_die:: myrank=          -1 : ERROR: pionfread_mod.F90.in:
  4097:         200 : NetCDF: Start+count exceeds dimension bound
  4097:Abort(1) on node 4097 (rank 4097 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4097
  4097:INFO: 0031-306  pm_atexit: pm_exit_value is 1
 

jedwards

CSEG and Liaisons
Staff member
The meaning of the error you are seeing is that the size of the array you are trying to read is inconsistent with what the model expects.
 

jedwards

CSEG and Liaisons
Staff member
I mean that the dimensions of whatever variable is being read when that error appears are not consistent with what the model thinks they should be.
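To make the error concrete: netCDF rejects any read whose start index plus count runs past the length of a dimension in the file. The pure-Python sketch below only restates that rule for illustration; it is not netCDF or PIO source code, and the function name and sizes are hypothetical.

```python
def check_read_bounds(dim_lengths, start, count):
    """Restate the rule behind 'NetCDF: Start+count exceeds dimension bound'.

    Reading count[i] values beginning at index start[i] is only valid if
    start[i] + count[i] <= dim_lengths[i] for every dimension i.
    Raises ValueError on the first dimension that violates the rule.
    """
    if not (len(dim_lengths) == len(start) == len(count)):
        raise ValueError("dim_lengths, start and count must have the same rank")
    for i, (length, s, c) in enumerate(zip(dim_lengths, start, count)):
        if s + c > length:
            raise ValueError(
                f"dimension {i}: start+count = {s + c} exceeds dimension length {length}"
            )
```

If a restart field was written on a 100-point grid but the model asks for 120 points, `check_read_bounds([100], [0], [120])` raises, which is the same kind of mismatch between the file and the model's decomposition described above.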
 
1) After copying the restart files into the run directory, this issue is solved.
2) But I got a new error using tag /glade/p/cesmdata/cseg/.dev/cesm1_3_beta12:

  ERROR: in _validate_pair (package Build::NamelistDefinition): Variable name soil_erod not found in /glade/p/cesmdata/cseg/.dev/cesm1_3_beta12/models/atm/cam/bld/namelist_files/namelist_definition.xml
  ERROR: cam.buildnml.csh failed

So I removed the parameter "soil_erod      = '/glade/p/cesm/cseg//inputdata/atm/cam/dst/dst_1.9x2.5_c090203.nc'" in user_nl_cam.
3) And then I got this new error message:

  2491: pio_support::pio_die:: myrank=          -1 : ERROR: nf_mod.F90:         679 :
  2491: Variable not found
  5476:Abort(1) on node 5476 (rank 5476 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5476
  5476:INFO: 0031-306  pm_atexit: pm_exit_value is 1.
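Step 1) above, copying the reference case's restart set into the run directory, can be scripted. A minimal Python sketch, assuming the restart files (and, typically, the rpointer files) for the chosen date sit together in one directory; the function name is hypothetical and it copies every regular file it finds there.

```python
import shutil
from pathlib import Path


def copy_restart_set(rest_dir, run_dir):
    """Copy every regular file in rest_dir into run_dir; return the copied names."""
    rest_dir, run_dir = Path(rest_dir), Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(rest_dir.iterdir()):
        if src.is_file():
            # copy2 also preserves modification times
            shutil.copy2(src, run_dir / src.name)
            copied.append(src.name)
    return copied
```

Call it with your own restart directory and the case's RUNDIR from env_run.xml, then rerun ./cesm_setup so the component build-namelist scripts can find the files.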
 


hannay

Cecile Hannay
AMWG Liaison
Staff member
The reason you got an error for the soil erodibility file is that the namelist variables changed between beta02 and beta12. The variable name changed from soil_erod to soil_erod_file. In beta12, you should add:

  soil_erod_file      = '/glade/p/cesm/cseg//inputdata/atm/cam/dst/dst_1.9x2.5_c090203.nc'
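Since beta12 renamed soil_erod to soil_erod_file, an existing user_nl_cam can be updated by renaming the variable instead of deleting the line. A small Python sketch; the helper name is hypothetical, and it assumes one `name = value` assignment per line, as in the line quoted above. Editing the file by hand works just as well.

```python
import re


def rename_namelist_var(text, old, new):
    """Rename `old = ...` assignments to `new = ...` in user_nl_* text.

    Only matches `old` as the whole variable name at the start of a line, so
    soil_erod_file is left alone when old is soil_erod. Matching ignores case,
    because Fortran namelist names are case-insensitive.
    """
    pattern = re.compile(
        rf"^([ \t]*){re.escape(old)}([ \t]*=)", re.MULTILINE | re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), text)
```

Read user_nl_cam into a string, pass it through `rename_namelist_var(text, "soil_erod", "soil_erod_file")`, and write the result back before rerunning the build.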
I am not sure about the second error:
pio_support::pio_die
It seems there is some incompatibility between the array you are trying to read and the size CESM expects it to be.
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
/glade/u/home/yingli/cesm/runs/f.FAMIPC5.ne120_ne120.test.015

I see that you are trying to do a branch run. I think you should do a startup run in which you prescribe the SSTs and set the land initial file.
There is no benefit to doing a branch run in your case; a branch run is much more restrictive. In user_nl_clm:

  finidat = '/glade/p/cesm/amwg/hannay/inputdata/FAMIPC5_ne120_79to05_03_omp2/rest/2000-01-01-00000/FAMIPC5_ne120_79to05_03_omp2.clm2.r.2000-01-01-00000.nc'

Make sure you are setting SSTICE_DATA_FILENAME and the corresponding variables for SST/ICE in env_run.xml.
 
Thanks for your timely reply. I tried the startup run with both tags and have successfully archived history files.

I looked at the notes and have several follow-up questions.

-"start-up: all model components are initialized from basic default initial condition"
So can we also change the initial condition in the startup run?

-"set finidat: options for hybrid or branch one"
I looked into the log file and found "Opened existing file FAMIPC5_ne120_79to05_03_omp2.clm2.r.2000-01-01-00000.nc". So it seems that the code did read the land initial file.
So can finidat be used in the startup run?

- In user_nl_cam, is it necessary to add the following? Or in what situation do we need to set ncdata?
ncdata = '/glade/p/cesm/amwg/hannay/inputdata/FAMIPC5_ne120_79to05_03_omp2/rest/2000-01-01-00000/FAMIPC5_ne120_79to05_03_omp2.cam.i.2000-01-01-00000.nc'

What is the difference between finidat and ncdata? Does ncdata specifically refer to the atmospheric initial condition, while finidat is more general?
 