
segmentation fault error using "mksurfdata.pl"

xgao304

Member
Dear Scientists,

My previous post was probably missed, so I am reposting the question.

I am trying to create surface data at 0.5-degree resolution with the following command, but I keep getting the error message "forrtl: severe (174): SIGSEGV, segmentation fault occurred".

./mksurfdata.pl -res 360x720cru -y 2000

....
(gridmap_map_read) * file name : /net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/mappingdata/maps/360x720/map_0.25x0.25_MODIS_to_360x720cru_nomask_aave_da_c170321.nc
* matrix dimensions rows x cols : 1036800 x 259200
* number of non-zero elements: 545989
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
mksurfdata_map 000000000055E841 Unknown Unknown Unknown
mksurfdata_map 000000000055C97B Unknown Unknown Unknown
mksurfdata_map 00000000005136C4 Unknown Unknown Unknown
mksurfdata_map 00000000005134D6 Unknown Unknown Unknown
mksurfdata_map 00000000004AC059 Unknown Unknown Unknown
mksurfdata_map 00000000004B0786 Unknown Unknown Unknown
libpthread-2.23.s 00007FCC4F1C1C20 Unknown Unknown Unknown
mksurfdata_map 000000000044CE57 mkpftmod_mp_mkpft 563 mkpftMod.F90
mksurfdata_map 000000000046B81D MAIN__ 564 mksurfdat.F90
mksurfdata_map 000000000040A2AE Unknown Unknown Unknown
libc-2.23.so 00007FCC4EE0F731 __libc_start_main Unknown Unknown
mksurfdata_map 000000000040A1A9 Unknown Unknown Unknown
ERROR in mksurfdata_map: 44544
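
The number in the last line can be decoded. It looks like a raw wait status of the kind Perl's system() stores in $? (an assumption about how mksurfdata.pl reports failures). The child's exit code sits in the high byte, and 44544 = 174 × 256, so it corresponds to exit code 174, the same number as in "forrtl: severe (174)". A minimal sketch of the decoding, with a hypothetical function name:

```shell
#!/bin/sh
# Decode a raw wait status (as stored in Perl's $? after system()).
# Assumption: the number printed by mksurfdata.pl is such a status.
decode_wait_status() {
    status=$1
    # Bits 8-15 hold the child's exit code; the low 7 bits hold the number
    # of the signal that killed it (0 if the child exited on its own).
    echo "exit=$(( (status >> 8) & 255 )) signal=$(( status & 127 ))"
}

decode_wait_status 44544   # prints: exit=174 signal=0
```

Signal 0 means the program exited with code 174 rather than being killed outright, which fits the Intel Fortran runtime catching SIGSEGV, printing the traceback, and exiting with its own error number.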

I also tried adding "ulimit -s unlimited" to avoid a memory issue, following the thread "forrtl: severe (174): SIGSEGV, segmentation fault occurred".
However, it did not help.

I am running the job on our own cluster and submitting it through the queue. My submission script is as follows:

-------
#!/bin/csh
#
#SBATCH --partition=edr
#SBATCH -n 32
#SBATCH --mem=0
#SBATCH --reservation=xgao_test
#SBATCH --time=1:00:00 # format is HOURS:MINUTES:SECONDS (DAYS-HOURS:MINUTES:SECONDS is also accepted)
#SBATCH --job-name=testmksurf
#SBATCH --output=testmksurf.out
# End of options
#=======================================================================

cd /net/fs05/d1/xgao/cesm2.1.3/cesm/components/clm/tools/mksurfdata_map
source /etc/profile.d/modules.csh
module load intel
module load netcdf

ulimit -s unlimited

./mksurfdata.pl -res 360x720cru -y 2000

exit
-------
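
One thing worth checking in this script: its shebang is #!/bin/csh, and csh/tcsh has no ulimit builtin (the csh form is "limit stacksize unlimited"). So the ulimit line most likely either failed with "Command not found" or had no effect on the shell that launches mksurfdata.pl. A minimal sketch for an sh/bash version of the job script, which raises the soft stack limit as far as the hard limit allows and reports the result (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch for an sh/bash job script (not csh): raise the soft stack limit.
# In csh/tcsh the equivalent line is:  limit stacksize unlimited

raise_stack_limit() {
    # Try "unlimited" first; this fails if the hard limit is lower.
    if ! ulimit -s unlimited 2>/dev/null; then
        # Fall back to the largest value the hard limit allows.
        ulimit -s "$(ulimit -H -s)"
    fi
    # Print the soft limit now in effect (in KiB, or "unlimited").
    ulimit -s
}

raise_stack_limit
```

In the csh script as written, "limit stacksize unlimited" followed by "limit stacksize" would do the same job and print the value actually in effect.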

I am not sure if the problem is related to our system. Any information is greatly appreciated.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I tried this on Cheyenne using cesm2.1.3 and it worked fine, so I suspect it is a system problem.
 

xgao304

Member
Based on your experience, do you think this could be a memory issue? Any insight that could help us diagnose the problem would be appreciated. Thanks.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Do the input PFT and mapping files exist, and are they valid NetCDF files?
I'm not sure what to suggest otherwise. Pinging @erik in case he has any ideas.
 

xgao304

Member
Wouldn't that produce an error message such as "there is no such file" or "file is not found" if that were the case?

I checked, and at least the following PFT and mapping files all exist:


map_ftopostats = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/mappingdata/maps/360x720/map_1km-merge-10min_HYDRO1K-merge-nomask_to_360x720_nomask_aave_da_c130403.nc'
mksrf_ftopostats = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/mksrf_topostats_1km-merge-10min_HYDRO1K-merge-nomask_simyr2000.c130402.nc'
mksrf_fvegtyp = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/pftcftdynharv.0.25x0.25.LUH2.histsimyr1850-2015.c170629/mksrf_landuse_histclm50_LUH2_2000.c170629.nc'
mksrf_fhrvtyp = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/pftcftdynharv.0.25x0.25.LUH2.histsimyr1850-2015.c170629/mksrf_landuse_histclm50_LUH2_2000.c170629.nc'
mksrf_fsoicol = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/pftcftlandusedynharv.0.25x0.25.MODIS.simyr1850-2015.c170412/mksrf_soilcolor_CMIP6_simyr2005.c170623.nc'
mksrf_flai = '/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/pftcftlandusedynharv.0.25x0.25.MODIS.simyr1850-2015.c170412/mksrf_lai_78pfts_simyr2005.c170413.nc'


Thanks.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I was wondering whether one or more of your input files might be corrupted.
Does ncdump -h work on these two files?

/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/mappingdata/maps/360x720/map_0.25x0.25_MODIS_to_360x720cru_nomask_aave_da_c170321.nc

/net/fs05/d1/xgao/cesm2.1.3/inputdata/lnd/clm2/rawdata/pftcftdynharv.0.25x0.25.LUH2.histsimyr1850-2015.c170629/mksrf_landuse_histclm50_LUH2_2000.c170629.nc
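
The same check can be run over any number of files. A minimal sketch, assuming ncdump is on PATH (for example after "module load netcdf"); the function name is hypothetical:

```shell
#!/bin/sh
# Report each NetCDF input as OK, missing/unreadable, or rejected by ncdump.
# Assumes the netCDF utility "ncdump" is on PATH. Returns non-zero if any file fails.
check_netcdf() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "MISSING/UNREADABLE: $f"
            rc=1
        elif ! ncdump -h "$f" > /dev/null 2>&1; then
            # ncdump -h only reads the header (dimensions, variables, attributes).
            echo "NOT VALID NETCDF:   $f"
            rc=1
        else
            echo "OK:                 $f"
        fi
    done
    return "$rc"
}
```

For example, run check_netcdf with the two paths above as arguments; a non-zero return status means at least one file failed. Since ncdump -h only reads the header, a file truncated partway through its data may still pass.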
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
A seg-fault implies some type of memory error. First, I'd suggest making sure you compile with the debugging flags turned on, so that you get a better traceback. That said, the current traceback already points to line 563 of mkpftMod.F90, so look at what is happening there. Building with debugging flags should be documented in the README files and the User's Guide, but if you have trouble with it, ping me again and I can look it up.
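
To look at the line named in the traceback, a small helper can print a few lines on either side of it and mark the target line. The function name is hypothetical, and the src/ location of mkpftMod.F90 in the example is an assumption based on the mksurfdata_map directory in the job script:

```shell
#!/bin/sh
# Print the lines around a given line number of a source file,
# marking the target line with ">>".
show_lines() {
    file=$1 line=$2 ctx=${3:-5}   # ctx = lines of context on each side
    start=$((line - ctx))
    if [ "$start" -lt 1 ]; then start=1; fi
    end=$((line + ctx))
    awk -v s="$start" -v e="$end" -v t="$line" \
        'NR >= s && NR <= e { printf "%s%6d  %s\n", (NR == t ? ">>" : "  "), NR, $0 }' "$file"
}

# Example (path assumed):
# show_lines /net/fs05/d1/xgao/cesm2.1.3/cesm/components/clm/tools/mksurfdata_map/src/mkpftMod.F90 563
```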

Another thing to try is to see whether this is a problem with just that resolution or with all resolutions. So try some coarser grids such as 10x15, as well as single-point grids such as 1x1_brazil. Our standard grid is 0.9x1.25, so try that as well. If everything fails, it's more likely to be a problem with your particular machine; if only some grids fail, it might be something specific to those grids. It is also possible that the issue only affects higher-resolution grids.
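
The resolution sweep can be scripted. A minimal sketch, assuming mksurfdata.pl exits with a non-zero status when mksurfdata_map fails (the "ERROR in mksurfdata_map" line in the original post suggests it stops with an error); the function name and per-grid log file names are hypothetical:

```shell
#!/bin/sh
# Run a surface-data command for several grids and report which ones fail.
# Usage: try_resolutions COMMAND RES1 RES2 ...
# Each run's output goes to mksurf_<res>.log in the current directory.
try_resolutions() {
    cmd=$1; shift
    nfail=0
    for res in "$@"; do
        if "$cmd" -res "$res" -y 2000 > "mksurf_${res}.log" 2>&1; then
            echo "PASS $res"
        else
            echo "FAIL $res (see mksurf_${res}.log)"
            nfail=$((nfail + 1))
        fi
    done
    # Succeed only if every grid succeeded.
    [ "$nfail" -eq 0 ]
}
```

For example, from the mksurfdata_map directory: try_resolutions ./mksurfdata.pl 10x15 1x1_brazil 0.9x1.25 360x720cru. Passing the command as an argument only makes it easy to try the loop with a stand-in script first.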

Another option is a different compiler or machine. We only test this on the Cheyenne supercomputer with the Intel compiler, so if you can get access to it, that would be a good solution. Failing that, getting as close to that configuration as possible could help.

Finally, ask the system administrators of the machine you are using for help. They might have some ideas specific to that machine.
 

xgao304

Member
Thanks for the detailed instructions.

Do you have a rough estimate of how much RAM is needed to run at 720x360 resolution?
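
A rough size for individual arrays can be worked out from the numbers in the log: 1036800 cells on the 0.25-degree source grid (1440x720), 259200 cells on the 0.5-degree destination grid (360x720), and 545989 non-zero map weights. The sketch below assumes 8-byte reals and 4-byte integer indices, and the function names are hypothetical. It says nothing about how many such arrays mksurfdata_map holds at once, which is what sets the actual peak:

```shell
#!/bin/sh
# Back-of-envelope array sizes in MiB.
# Assumptions: 8-byte reals, 4-byte integer indices.

# NFIELDS double-precision fields on a grid of NCELLS cells.
fields_mib() {
    awk -v n="$1" -v k="${2:-1}" 'BEGIN { printf "%.1f\n", n * k * 8 / 1048576 }'
}

# Sparse map stored as (weight, source index, destination index) triplets.
sparse_map_mib() {
    awk -v nnz="$1" 'BEGIN { printf "%.1f\n", nnz * (8 + 4 + 4) / 1048576 }'
}

echo "0.25-degree grid (1440x720), one field: $(fields_mib 1036800) MiB"
echo "0.5-degree grid (360x720), one field:   $(fields_mib 259200) MiB"
echo "MODIS-to-360x720cru map weights:        $(sparse_map_mib 545989) MiB"
```

Each 0.25-degree field comes to about 7.9 MiB (fields_mib 1036800 10 gives 79.1 MiB for ten of them), so the total depends mainly on how many fields are resident at the same time. To measure rather than estimate, SLURM accounting, if enabled on the cluster, reports the peak resident memory of a finished job, e.g. sacct -j <jobid> --format=JobID,MaxRSS,State.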
 