CESM 1.0.5 porting runtime issues

ebuzan@odu_edu

New Member
I am trying to port CESM 1.0.5 to one of my university's computing clusters. I can build the model with no errors, but I get several errors at runtime depending on the compset I use.

With compset X, the model appears to run to completion but crashes at the following line in cpl.log:
Write restart file at 10106 0
(seq_rest_write) write rpointer file rpointer.drv

With compset A, the model crashes while initializing the atm component at the following line in atm.log:
(shr_strdata_readnml) reading input namelist file: datm_atm_in
(shr_stream_init) Reading file nyf.giss.T62.stream.txt

With compset FW, the model also crashes while initializing atm at the following point:
(GETFIL): attempting to find local file
f40.2000.track1.4deg.001.cam2.i.0013-01-01-00000.nc
(GETFIL): using
/user/temp/c/ebuza001/inputdata/atm/waccm/ic/f40.2000.track1.4deg.001.cam2.i.00
13-01-01-00000.nc
 

santos

Member
Check the end of ccsm.log and see if there is any other information there. In particular, look for anything containing the string "ERROR" or "ENDRUN", or messages printed by the compiler.
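As a rough way to do that search across a long log, something like the following Python sketch could help. The function names `find_error_lines` and `scan_file` are made up for illustration, and the script only matches the two strings mentioned above; compiler runtime messages (for example lines starting with `forrtl:`) would need their own pattern.

```python
import re

# Only the two markers named above; extend the pattern for compiler messages.
MARKERS = re.compile(r"ERROR|ENDRUN")


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines containing ERROR or ENDRUN."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if MARKERS.search(line):
            hits.append((number, line))
    return hits


def scan_file(path):
    """Print every matching line of the log file at `path` with its line number."""
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, errors="replace") as handle:
        for number, line in find_error_lines(handle.read()):
            print(f"{path}:{number}: {line}")
```

Calling `scan_file("ccsm.log")` from the run directory would list the matching lines; the same call works for cpl.log or atm.log.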
 

ebuzan@odu_edu

New Member
I don't see any useful error information in ccsm.log, just the following block at the end of the log:
forrtl: severe (168): Program Exception - illegal instruction
Image              PC                Routine            Line        Source             
ccsm.exe           00000000012C62C8  Unknown               Unknown  Unknown
libnetcdff.so.5    00002AAAAAED8DF8  Unknown               Unknown  Unknown
libnetcdff.so.5    00002AAAAAEE11B7  Unknown               Unknown  Unknown
ccsm.exe           00000000011D8A8D  Unknown               Unknown  Unknown
ccsm.exe           00000000010F5469  Unknown               Unknown  Unknown
ccsm.exe           00000000004B0DB7  cam_pio_utils_mp_         613  cam_pio_utils.F90
ccsm.exe           00000000005C3F37  startup_initialco          65  startup_initialconds.F90
ccsm.exe           00000000004D718F  inital_mp_cam_ini          84  inital.F90
ccsm.exe           00000000004796D9  cam_comp_mp_cam_i         147  cam_comp.F90
ccsm.exe           000000000047590E  atm_comp_mct_mp_a         272  atm_comp_mct.F90
ccsm.exe           000000000041DC57  ccsm_comp_mod_mp_         684  ccsm_comp_mod.F90
ccsm.exe           0000000000420956  MAIN__                     90  ccsm_driver.F90
ccsm.exe           000000000040ED8C  Unknown               Unknown  Unknown
libc.so.6          00002AAAAC2DE994  Unknown               Unknown  Unknown
ccsm.exe           000000000040EC99  Unknown               Unknown  Unknown
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
 

jedwards

CSEG and Liaisons
Staff member
> forrtl: severe (168): Program Exception - illegal instruction
This indicates that your compiler is producing instructions incompatible with your CPU. I know of two ways this can happen:
  1. You are compiling on a front-end or login node that has a different chipset than the node you are running on.
  2. You are explicitly setting a compiler flag that targets a chip different from the one you are running on.
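One way to test the first possibility is to compare the CPU feature flags of the build node and the compute node. The sketch below assumes x86 Linux nodes, where /proc/cpuinfo has a `flags` line; the helper names `parse_cpu_flags` and `flags_missing_on` are hypothetical, and you would need to copy the cpuinfo text from each node yourself (for example through a short batch job).

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the text of /proc/cpuinfo.

    Assumes the x86 Linux layout, where each processor entry has a line
    of the form 'flags : fpu vme ...'. Only the first entry is read.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_missing_on(run_flags, build_flags):
    """Return the flags present on the build node but absent on the run node."""
    return sorted(build_flags - run_flags)
```

Any flag this reports (an instruction set extension available where you compile but not where you run) is a candidate for the illegal instruction, particularly if the compiler was told to optimize for the host it was running on.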
 
 

jedwards

CSEG and Liaisons
Staff member
Actually, looking at that traceback more closely, I see that the crash is occurring in libnetcdff.so.5. Did you build the netcdf library yourself? Could it be that this library isn't compatible with your system?
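To see which copy of the library the executable actually picks up at run time, you could inspect the output of `ldd ccsm.exe` on a compute node. The following is only a sketch: `parse_ldd` and `resolved_libs` are illustrative names, and the parsing assumes the usual `name => path (address)` layout that ldd prints on Linux.

```python
import subprocess


def parse_ldd(ldd_output):
    """Map each shared library name in ldd output to the path it resolves to.

    Lines without '=>' (such as the vdso entry) are skipped. Unresolved
    libraries map to the literal string 'not found', as printed by ldd.
    """
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        # Drop the trailing load address, e.g. " (0x00002aaa...)".
        libs[name.strip()] = target.split(" (")[0].strip()
    return libs


def resolved_libs(executable):
    """Run ldd on `executable` and return the parsed library mapping."""
    result = subprocess.run(
        ["ldd", executable], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)
```

If `resolved_libs("ccsm.exe").get("libnetcdff.so.5")` points somewhere other than the netcdf build you intended, or differs between the login and compute nodes, that would be worth sorting out before looking further.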
 

ebuzan@odu_edu

New Member
I did build the netcdf library from source, both because I was having trouble getting the cluster's netcdf module running and to make sure netcdf and the model were built with the same compiler, per the user guide. I also ran "make check" on both the Fortran and C libraries, and all the tests passed.
 