
A problem when running F_1850_CAM5 on the jet.rdhpcs.noaa.gov machine

I am trying to run the F_1850_CAM5 compset on the jet.rdhpcs.noaa.gov machine, which runs GNU/Linux and has the Intel Fortran compiler (ifort) 11.1 and MVAPICH2 1.4.1 (/opt/mvapich2/1.4.1) installed.

The model compiles successfully, and the executable ccsm.exe is generated with the following compiler options:


FC := mpif90
CC := mpicc

CFLAGS := $(CPPDEFS)
FIXEDFLAGS := -132
FREEFLAGS :=
FFLAGS := $(CPPDEFS) -g -fp-model precise -convert big_endian -assume byterecl -ftz -traceback
FFLAGS_NOOPT := $(FFLAGS) -O0
FFLAGS_OPT := -O2
LDFLAGS :=
AR := ar
MOD_SUFFIX := mod
CONFIG_SHELL :=

However, at run time the model crashes quickly with the following error messages:

Writing jet.cice.r.0001-01-06-00000.nc
Writing jet.cice.r.0001-01-06-00000.nc
Opened file jet.cam2.r.0001-01-06-00000.nc to write 35
MPI process (rank: 9) terminated unexpectedly on n308
MPI process (rank: 33) terminated unexpectedly on n311
MPI process (rank: 17) terminated unexpectedly on n309
MPI process (rank: 49) terminated unexpectedly on n306
Exit code -5 signaled from n305
MPI process (rank: 25) terminated unexpectedly on n310
MPI process (rank: 41) terminated unexpectedly on n307
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
ccsm.exe 000000000169D4C5 Unknown Unknown Unknown
ccsm.exe 0000000001680385 Unknown Unknown Unknown
ccsm.exe 000000000167C24E Unknown Unknown Unknown
ccsm.exe 000000000168E91C Unknown Unknown Unknown
ccsm.exe 0000000001578AA8 Unknown Unknown Unknown
ccsm.exe 000000000155D305 Unknown Unknown Unknown
ccsm.exe 00000000015617BA Unknown Unknown Unknown
ccsm.exe 00000000004D294D phys_buffer_mp_pb 552 phys_buffer.F90
ccsm.exe 0000000000537C68 restart_physics_m 236 restart_physics.F90
ccsm.exe 000000000049ED64 cam_restart_mp_ca 249 cam_restart.F90
ccsm.exe 000000000046817D cam_comp_mp_cam_r 397 cam_comp.F90
ccsm.exe 000000000045A4D2 atm_comp_mct_mp_a 528 atm_comp_mct.F90
ccsm.exe 0000000000413300 MAIN__ 2034 ccsm_driver.F90
ccsm.exe 0000000000406FEC Unknown Unknown Unknown
libc.so.6 00002B15FF4D9994 Unknown Unknown Unknown
ccsm.exe 0000000000406EF9 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
....


Any suggestions would be appreciated. Thank you very much.
 

eaton

CSEG and Liaisons
The model appears to be failing while writing the restart file: the traceback ends in phys_buffer.F90, called from restart_physics.F90 and cam_restart.F90. This is sometimes caused by hitting a memory limit. I'd try some simple tests, such as running at a lower resolution, or running the same resolution with more tasks spread over more nodes. That increases the total memory available and reduces the memory required per task.
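To make the "more tasks on more nodes" reasoning concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified memory model: each task holds its share of the decomposed fields plus a fixed amount of data that every task replicates. All sizes (DISTRIBUTED_GB, REPLICATED_GB, NODE_LIMIT_GB) are hypothetical placeholders, not measured CAM5 or jet values, and the function names are illustrative only.

```python
# Rough per-task / per-node memory estimate under an ASSUMED model:
#   per-task memory = (decomposed state / number of tasks) + replicated overhead
# All numbers below are hypothetical placeholders, not measured CAM5 values.


def per_task_gb(distributed_gb: float, replicated_gb: float, ntasks: int) -> float:
    """Memory for one MPI task: its share of the decomposed fields
    plus data that every task keeps a full copy of."""
    return distributed_gb / ntasks + replicated_gb


def per_node_gb(distributed_gb: float, replicated_gb: float,
                ntasks: int, tasks_per_node: int) -> float:
    """Memory used on one node when tasks_per_node tasks share it."""
    return tasks_per_node * per_task_gb(distributed_gb, replicated_gb, ntasks)


if __name__ == "__main__":
    DISTRIBUTED_GB = 96.0  # hypothetical total size of decomposed model state
    REPLICATED_GB = 0.5    # hypothetical per-task replicated overhead
    NODE_LIMIT_GB = 12.0   # hypothetical usable memory per node

    # (total tasks, tasks per node): same layout, more tasks, then fewer per node
    for ntasks, tpn in [(64, 8), (128, 8), (128, 4)]:
        task = per_task_gb(DISTRIBUTED_GB, REPLICATED_GB, ntasks)
        node = per_node_gb(DISTRIBUTED_GB, REPLICATED_GB, ntasks, tpn)
        status = "fits" if node <= NODE_LIMIT_GB else "exceeds limit"
        print(f"{ntasks:4d} tasks, {tpn} per node: "
              f"{task:.2f} GB/task, {node:.1f} GB/node ({status})")
```

With these placeholder numbers, going from 64 to 128 tasks at 8 per node lowers per-node usage from 16 GB to 10 GB, and placing only 4 tasks per node lowers it further to 5 GB. The replicated term does not shrink as tasks are added, which is one reason that spreading tasks over more nodes can help even when adding tasks alone does not.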
 