Runtime problems, porting to a new Linux cluster

I am porting CESM 1.2.0 to a new Linux cluster at U. Albany. I have successfully compiled and run complete test cases with -compset X and -compset S.

My problem is that for every other case I have tried (including -compsets B, E and F at various resolutions), I get a successful build, but the model fails at runtime.

Here are the last few lines of an example cesm.log file from one of my failed runs (in this case -compset B, but I get the same errors for any other compset I've tried):

    Opened existing file b40.1850.track1.1deg.006.cam.i.0863-01-01-00000.nc       65536
    Opened existing file /data/rose_scr/cesm_inputdata/atm/cam/topo/USGS-gtopo30_0.9x1.25_remap_c051027.nc      131072
    --------------------------------------------------------------------------
    mpirun noticed that process rank 84 with PID 59514 on node snow-04 exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    3 total processes killed (some possibly by mpirun during cleanup)

The crash does NOT always occur on the same node. It DOES always seem to occur after reading the topography input file, as above. I have checked the input file (which I obtained from the svn repository at NCAR) and it seems fine (e.g. I can open and view it with ncview).

I am using the latest version 14 of the Intel compilers, and netCDF libraries that were built with the same compilers.

Any suggestions?

Thanks,
Brian
 
I have isolated the error to the FV dynamical core of CAM. The crash occurs during execution of subroutine cam_initial() in $CCSMROOT/models/atm/cam/src/dynamics/fv/inital.F90.

The same runtime error occurs for any compset that includes the finite volume CAM. A test configuration using a different dynamical core builds and runs to completion successfully (-compset F_AMIP_CAM5 -res ne30np4_gx1v6).

I am using version 14.0.1 of the Intel compilers, and Open MPI 1.6.4. Any suggestions as to what is causing the finite volume CAM model to fail?
 
 

jedwards

CSEG and Liaisons
Staff member
Hi Brian,
Have you tried compiling with DEBUG=TRUE in env_build.xml? If it works in this mode, you can try reducing the optimization for just the FV dycore files; reducing the optimization of inital.F90 alone may even be enough to get a run. We haven't updated to Intel 14.x yet, so it's certainly possible that you've uncovered a new compiler bug.
 

 
Thanks for the suggestions, jedwards.

It does indeed compile and run successfully with DEBUG=TRUE. It also compiles and runs successfully with DEBUG=FALSE and FFLAGS:= -O1 (but fails at runtime with -O2 or -O3).

I'm afraid that changing optimization settings for only one section of the code is beyond my scripting abilities. I'll be happy to test this if you can post instructions on how to set it up.

- Brian
 
 

jedwards

CSEG and Liaisons
Staff member
Hi Brian,
Intel 13.1.2 is what we are currently using.
To compile a file or files with different compiler flags, create a Depends.{machine} or Depends.{compiler} file in your case directory, where {machine} or {compiler} matches the CESM name for your machine or compiler, then put the special Makefile instructions in that file. So, for example, to compile inital.F90 at reduced optimization you might write a Depends.intel file that looks like:

inital.o: inital.F90
    $(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) $(FREEFLAGS) -O0 $<

Note that the whitespace in front of $(FC) must be a tab, and that you will need to clean_build or at least touch the inital.F90 file before running the build again.

- Jim
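
Because a rule pasted from a web page usually arrives with spaces instead of the required tab, one way to avoid the problem is to generate the file from a script. The following is only a sketch: the helper names `depends_rule` and `write_depends` are hypothetical and not part of CESM, and the recipe text is simply copied from the rule above.

```python
from pathlib import Path
from typing import Iterable

# Recipe copied from the rule in the post above; {opt} is appended after
# $(FFLAGS), exactly where the post places -O0.
RECIPE = "$(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) $(FREEFLAGS) {opt} $<"


def depends_rule(source: str, opt: str = "-O0") -> str:
    """Return one make rule that compiles `source` with `opt` appended."""
    obj = Path(source).with_suffix(".o").name
    # "\t" is a literal tab: make rejects recipe lines indented with spaces.
    return f"{obj}: {source}\n\t{RECIPE.format(opt=opt)}\n"


def write_depends(case_dir: str, sources: Iterable[str], opt: str = "-O0",
                  name: str = "Depends.intel") -> Path:
    """Write a Depends file into `case_dir` (hypothetical helper, not a CESM tool)."""
    path = Path(case_dir) / name
    path.write_text("".join(depends_rule(s, opt) for s in sources))
    return path


if __name__ == "__main__":
    # Show the rule for the file discussed in the thread.
    print(depends_rule("inital.F90"), end="")
```

Calling `write_depends(".", ["inital.F90"])` from the case directory would produce a Depends.intel equivalent to the one above; a clean_build (or touching the source file) is still needed before rebuilding.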
 

 
Thanks Jim.

After a lot of trial and error, I have found the offending file: $CCSMROOT/models/atm/cam/src/dynamics/fv/spmd_dyn.F90

I can now compile and run successfully with optimization set to -O2 globally and the following rule in Depends.intel:

spmd_dyn.o: spmd_dyn.F90
    $(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) $(FREEFLAGS) -O1 $<
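
For anyone repeating this search on another compiler version, the trial and error can be organized as a bisection: lower the optimization for half of the candidate files, rebuild, run, and keep whichever half contains the culprit. The sketch below is not what was actually done in this thread. It assumes that exactly one file is responsible and that lowering all candidates together has already been confirmed to fix the crash. The `runs_ok` callback is hypothetical: in practice it would write the Depends.intel rules, rebuild, run the case, and report whether the run completed.

```python
from typing import Callable, Sequence


def find_offender(files: Sequence[str],
                  runs_ok: Callable[[Sequence[str]], bool]) -> str:
    """Bisect `files` for the single file that must be built at low optimization.

    `runs_ok(lowered)` is a hypothetical callback: it should lower the
    optimization of the files in `lowered`, rebuild, run the case, and
    return True if the run completes without crashing.
    """
    candidates = list(files)
    if not candidates:
        raise ValueError("no candidate files")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if runs_ok(half):
            # Crash went away, so the offender is among the lowered files.
            candidates = half
        else:
            # Still crashes, so the offender is among the remaining files.
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real build-and-run cycle. Only inital.F90 and
    # spmd_dyn.F90 come from the thread; the other names are made up.
    fv_files = ["inital.F90", "example_a.F90", "spmd_dyn.F90", "example_b.F90"]
    print(find_offender(fv_files, lambda lowered: "spmd_dyn.F90" in lowered))
```

With N candidate files this needs roughly log2(N) rebuild-and-run cycles rather than N, which matters when each cycle involves a full model build.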
 
 