
new machine build, run error

I am working to get a new build (cesm1_2_1) running on a Linux cluster. The build appears to finish successfully, as I get the message:

- Locking file env_build.xml
CESM BUILDEXE SCRIPT HAS FINISHED SUCCESSFULLY

after building the test case. However, upon submitting the job to our queue (we use the PBS queuing system), I get a fairly quick error and the expected run files do not appear:

wythersk@node1084 [~/cases/testcase] $ ls
archive_metadata.sh env_derived README.case user_nl_cam
Buildconf env_mach_pes.xml README.science_support user_nl_cice
CaseDocs env_mach_specific run user_nl_clm
CaseStatus #env_run.xml# SourceMods user_nl_cpl
cesm_setup env_run.xml testcase.build user_nl_pop2
check_case exedir testcase.clean_build user_nl_rtm
check_input_data inputdata testcase.o1084694 xmlchange
create_production_test LockedFiles testcase.run xmlquery
Depends.intel logs testcase.submit
env_build.xml Macros Tools
env_case.xml preview_namelists

I think I’ve tracked it down to a problem opening the USGS-gtopo30_4x5_remap_c050520.nc file. Here is the section of the cesm.log file that I am referring to. Does this look familiar to anyone, or am I chasing the wrong thing here? Thank you in advance.

8 pes participating in computation
-----------------------------------
TASK# NAME
0 node0316
1 node0316
2 node0316
3 node0316
4 node0316
5 node0316
6 node0316
7 node0316
Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/inic/fv/cami_0001-01-01
_4x5_L26_c060608.nc 65536
Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/topo/USGS-gtopo30_4x5_r
emap_c050520.nc 131072
forrtl: severe (174): SIGSEGV, segmentation fault occurred
 

jedwards

CSEG and Liaisons
Staff member
Hi,
First, please consider cesm1.2.2 instead of 1.2.1. It's new and improved. :-) A lot of times when porting to a new machine the problem is your user environment settings, in particular the user stack size limit and data limit. We recommend setting them both to unlimited. Use the limit command in csh or the ulimit command in bash to check the limit settings.
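For readers following along, here is a minimal sketch of the check described above, written for a POSIX-style shell. The helper name raise_limit is hypothetical and not part of CESM. It only touches the soft limits, and it reports rather than aborts if a site-imposed hard limit blocks the change. Limits are inherited by child processes, so it is probably safest to run this in the same shell (or batch job script) that launches the model, since a batch job may not pick up the limits of your login session.

```shell
# Try to raise a soft resource limit to "unlimited" and report the result.
# $1 is the ulimit flag (-s for stack, -d for data segment), $2 a label.
raise_limit() {
  flag="$1"
  name="$2"
  if ulimit -S "$flag" unlimited 2>/dev/null; then
    echo "$name limit: $(ulimit -S "$flag")"
  else
    # A hard limit set by the administrator can prevent the change.
    echo "could not raise $name limit (current: $(ulimit -S "$flag"))"
  fi
}

raise_limit -s stack
raise_limit -d data
```

In csh or tcsh the rough equivalents would be `limit stacksize unlimited` and `limit datasize unlimited`.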
 

Here are the results from ulimit. Disk was unlimited. I changed stack to unlimited.

wythersk@node1084 [~/cases/testcase/run] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 191956
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
wythersk@node1084 [~/cases/testcase/run] $

However, same error on the USGS netCDF file:

Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/inic/fv/cami_0001-01-01
_4x5_L26_c060608.nc 65536
Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/topo/USGS-gtopo30_4x5_r
emap_c050520.nc 131072
forrtl: severe (174): SIGSEGV, segmentation fault occurred

Other ideas?
 
 

jedwards

CSEG and Liaisons
Staff member
Can you dump the file using ncdump? Check that the md5sum matches the expected value:

78bff47e307c5fb2395204c9f833a480  /glade/p/cesmdata/cseg/inputdata/atm/cam/topo/USGS-gtopo30_128x256_c050520.nc

You may also get more information by compiling with DEBUG=TRUE and setting core file size to a non-zero value.
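A small sketch of the checksum comparison suggested here, assuming the GNU md5sum utility is available. The helper name check_md5 is hypothetical, and the expected digest has to come from a trusted copy of the same file (the digest quoted in the post is for the 128x256 file, so it would not be the right reference for a 4x5 file). The ncdump line is left commented out so the snippet runs without the netCDF tools installed; `ncdump -h` prints only the header.

```shell
# Compare a file's MD5 digest with an expected value.
# Returns 0 on a match and 1 on a mismatch.
check_md5() {
  file="$1"
  expected="$2"
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)"
    return 1
  fi
}

# Illustrative usage from the inputdata/atm/cam/topo directory;
# EXPECTED_DIGEST is a placeholder you must supply yourself:
#   check_md5 USGS-gtopo30_4x5_remap_c050520.nc "$EXPECTED_DIGEST"
#   ncdump -h USGS-gtopo30_4x5_remap_c050520.nc
```

A matching digest only shows that the file was transferred intact; it does not rule out a problem in how the model reads it.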
 

My md5sums look right:

md5sum USGS-gtopo30_4x5_remap_c050520.nc
0a0b1d5f9403dd00eebc18c521f27234  USGS-gtopo30_4x5_remap_c050520.nc

Here is the dump file:
 
 

jedwards

CSEG and Liaisons
Staff member
This all looks fine; you are going to need to dig deeper. You may also get more information by compiling with DEBUG=TRUE and setting core file size to a non-zero value.
 

Confirming that you mean line 133 in env_run.xml: change value="0" to value="TRUE"? In addition, I'm not sure where the "core file size" option is changed to a "non-zero" value.

wythersk@node1082 [~/cases/testcase] $ grep -n DEBUG env_run.xml
133:
 
 

jedwards

CSEG and Liaisons
Staff member
The core file size is one of the limits in your environment; you printed it out a few posts ago. DEBUG is set in env_build.xml and you should change the value using the xmlchange utility:

./xmlchange DEBUG=TRUE
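Putting the two suggestions together, a rough sketch might look like the following. The 1024-block core size is an arbitrary illustrative value. The script names testcase.clean_build and testcase.build are taken from the directory listing earlier in the thread. That a clean rebuild is required after changing DEBUG is my assumption, not something stated in the thread. The CESM steps are skipped when the script is not run from a case directory.

```shell
# Allow core files in this shell and its children (units are blocks,
# as shown by "ulimit -a"). A hard limit of 0 would block the change.
ulimit -S -c 1024 2>/dev/null || echo "core limit unchanged (hard limit too low?)"
core_limit="$(ulimit -S -c)"
echo "core file size limit: $core_limit"

# Run from the case directory; skipped if the case scripts are absent.
if [ -x ./xmlchange ] && [ -x ./testcase.build ]; then
  ./xmlchange DEBUG=TRUE
  # Assumption: the existing non-debug objects must be cleaned first.
  ./testcase.clean_build
  ./testcase.build
fi
```

As with the stack and data limits, the core limit applies to the shell that sets it and to its children, so it likely needs to be set wherever the job is actually launched.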
 

Now (with DEBUG TRUE, and core file size set to 1024) I am having trouble with the build process. From:

more exedir/atm.bldlog.141007-095202
catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.
compilation aborted for /home/reichpb/wythersk/cesm/dev/1.2.1/models/atm/cam/src/dynamics/fv/sw_core.F90 (code 1)
gmake: *** [sw_core.o] Error 1
gmake: *** Waiting for unfinished jobs....
wythersk@node1084 [~/cases/f45g37_B1850CN] $

Any chance this is related to my original issue?
 
 

jedwards

CSEG and Liaisons
Staff member
You failed to report what compiler you are using and you failed to update to cesm1.2.2 as requested.  Please update to 1.2.2 and let us know what compiler you are using.
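One way to gather the compiler information requested here is sketched below. The helper name report_compilers and the candidate list (ifort, gfortran, mpif90) are assumptions for illustration; the Depends.intel file and the forrtl messages earlier in the thread suggest an Intel compiler, but the loop simply reports whatever it finds on PATH.

```shell
# Print the location and version banner of each Fortran compiler on PATH.
# The candidate list is an assumption; adjust it for your site.
report_compilers() {
  found=0
  for fc in ifort gfortran mpif90; do
    if command -v "$fc" >/dev/null 2>&1; then
      found=1
      echo "== $fc: $(command -v "$fc")"
      "$fc" --version 2>&1 | head -n 2
    fi
  done
  [ "$found" -eq 1 ] || echo "no known Fortran compiler found on PATH"
}

report_compilers
```

On clusters that use environment modules, the loaded modules may matter as much as PATH, so it can help to include that list in the report as well.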
 
