albert_jornet@ic3_cat
New Member
Hi all,
We want to run the CESM 1.2.2 model on our cluster. For this purpose we downloaded it and compiled it with intel 2013_sp1.1.16, netcdf 4.3.2, netcdf-fortran 4.2 and intel mpi 4.1.3.045, with no threading.
We created the case from an example found in the guide:
Code:
./create_newcase -case ~/cesm/EXAMPLE_CASE -compset B_1850_CN -res 0.9x1.25_gx1v6 -mach userdefined

We then ported it to our cluster, and it compiles "successfully". Next we ran it on 40 cores with MPI only, to check that it really works, but it fails with the error below:

...
Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_w2x_fluxes=

seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=

 40 pes participating in computation
 -----------------------------------
 TASK# NAME
 0 ithaca01
 1 ithaca01
 2 ithaca01
 3 ithaca01
 4 ithaca01
 5 ithaca01
 6 ithaca01
 7 ithaca01
 8 ithaca02
 9 ithaca02
 10 ithaca02
 11 ithaca02
 12 ithaca02
 13 ithaca02
 14 ithaca02
 15 ithaca02
 16 ithaca03
 17 ithaca03
 18 ithaca03
 19 ithaca03
 20 ithaca03
 21 ithaca03
 22 ithaca03
 23 ithaca03
 24 ithaca04
 25 ithaca04
 26 ithaca04
 27 ithaca04
 28 ithaca04
 29 ithaca04
 30 ithaca04
 31 ithaca04
 32 ithaca05
 33 ithaca05
 34 ithaca05
 35 ithaca05
 36 ithaca05
 37 ithaca05
 38 ithaca05
 39 ithaca05
 Opened existing file b40.1850.track1.1deg.006.cam.i.0863-01-01-00000.nc
 65536
 Opened existing file
 /share/data/udic/cesm/inputdata/atm/cam/topo/USGS-gtopo30_0.9x1.25_remap_c05102
 7.nc 131072
 NetCDF: Invalid dimension ID or name
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000001B20D99 Unknown Unknown Unknown
cesm.exe 0000000001B1F710 Unknown Unknown Unknown
cesm.exe 0000000001ABA282 Unknown Unknown Unknown
cesm.exe 0000000001A46133 Unknown Unknown Unknown
cesm.exe 0000000001A4FB4B Unknown Unknown Unknown
libpthread.so.0 00002B82D5C38800 Unknown Unknown Unknown
libmpi.so.4 00002B82D6022D1E Unknown Unknown Unknown
libmpi.so.4 00002B82D5FDDD56 Unknown Unknown Unknown
libmpi.so.4 00002B82D608638F Unknown Unknown Unknown
libmpi.so.4 00002B82D607BFBE Unknown Unknown Unknown
libmpi.so.4 00002B82D605EE50 Unknown Unknown Unknown
libmpi.so.4 00002B82D5F10ADC Unknown Unknown Unknown
libmpi.so.4 00002B82D6124BE4 Unknown Unknown Unknown
libmpi.so.4 00002B82D6124983 Unknown Unknown Unknown
libmpigf.so.4 00002B82D64DB70A Unknown Unknown Unknown
cesm.exe 0000000001980CC7 pio_spmd_utils_mp 345 pio_spmd_utils.F90.in
cesm.exe 000000000197A4A9 box_rearrange_mp_ 1372 box_rearrange.F90.in
cesm.exe 000000000197184D box_rearrange_mp_ 1029 box_rearrange.F90.in
cesm.exe 000000000185941D piolib_mod_mp_pio 1166 piolib_mod.F90
cesm.exe 00000000004E0031 cam_pio_utils_mp_ 449 cam_pio_utils.F90
cesm.exe 0000000000978B65 ncdio_atm_mp_infl 279 ncdio_atm.F90
cesm.exe 000000000087BDFA inidat_mp_read_in 230 inidat.F90
cesm.exe 0000000000606AE2 startup_initialco 54 startup_initialconds.F90
cesm.exe 00000000005177C4 inital_mp_cam_ini 51 inital.F90
cesm.exe 00000000004A8E91 cam_comp_mp_cam_i 164 cam_comp.F90
cesm.exe 00000000004A4DB0 atm_comp_mct_mp_a 276 atm_comp_mct.F90
cesm.exe 0000000000434AEB ccsm_comp_mod_mp_ 1058 ccsm_comp_mod.F90
cesm.exe 00000000004372E6 MAIN__ 90 ccsm_driver.F90
cesm.exe 0000000000413D56 Unknown Unknown Unknown
libc.so.6 00002B82D6D87C36 Unknown Unknown Unknown
cesm.exe 0000000000413C49 Unknown Unknown Unknown

As this is our first try with the model, we assigned 40 cores to each component. We are not expecting performance at this point; we only want to check that the build is correct.

As we did not know where to look, we recompiled with DEBUG set to TRUE. With that build the model finishes with no error.

This is confusing. I presume one of the debug flags changes the execution behaviour.

Does anyone have a clue on how to proceed?


Macros:

CPPDEFS+= -DFORTRANUNDERSCORE -DNO_R16 -Dlinux -DCPRINTEL
SLIBS+=$(shell $(NETCDF_PATH)/bin/nc-config --flibs)
CFLAGS:= -O2 -fp-model precise
CXX_LDFLAGS:= -cxxlib
CXX_LINKER:=FORTRAN
FC_AUTO_R8:= -r8
FFLAGS:= -fp-model source -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs
FFLAGS_NOOPT:= -O0
FIXEDFLAGS:= -fixed -132
FREEFLAGS:= -free
MPICC:=mpiicc
MPICXX:= mpiicpc
MPIFC:= mpiifort
MPI_LIB_NAME:=mpi
MPI_PATH:=/share/software/impi/4.1.3.045/intel64
NETCDF_PATH:=/share/software/netCDF/4.3.2-ictce-6.1.5
PNETCDF_PATH:=
SCC:= icc
SCXX:= icpc
SFC:= ifort
SUPPORTS_CXX:=TRUE
ifeq ($(DEBUG), TRUE)
 FFLAGS += -O0 -g -check uninit -check bounds -check pointers -fpe0
endif
ifeq ($(DEBUG), FALSE)
 FFLAGS += -O2
endif
ifeq ($(compile_threaded), true)
 LDFLAGS += -openmp
 CFLAGS += -openmp
 FFLAGS += -openmp
endif

ifeq ($(MODEL), pop2)
 CPPDEFS += -D_USE_FLOW_CONTROL
endif

atm log:

 2.705257020083618E-004
 initcom: lat, clat, w 192 1.57079632679490
 3.381742815944389E-005
 Number of longitudes per latitude = 288
 PHYS_GRID_INIT: Using PCOLS= 16 phys_loadbalance= 2
 phys_twin_algorithm= 1 phys_alltoall= -1
 chunks_per_thread= 1
 chem_surfvals_init: ghg surface values are fixed as follows
 co2 volume mixing ratio = 2.847000000000000E-004
 ch4 volume mixing ratio = 7.916000000000000E-007
 n2o volume mixing ratio = 2.756800000000000E-007
 f11 volume mixing ratio = 1.248000000000000E-011
 f12 volume mixing ratio = 0.000000000000000E+000
 INITIALIZE_RADBUFFER: ntoplw = 1 pressure: 354.463800000001
 Creating new decomp: 2602192288

general log:

 CESM BUILDNML SCRIPT STARTING
 - To prestage restarts, untar a restart.tar file into /scratch/udic/ajornet/example9_2/run
 infile is /home/ajornet/cesm/example9_2/Buildconf/cplconf/cesm_namelist
CAM writing dry deposition namelist to drv_flds_in
CAM writing namelist to atm_in
CLM configure done.
CLM adding use_case 1850_control defaults for var sim_year with val 1850
CLM adding use_case 1850_control defaults for var sim_year_range with val constant
CLM adding use_case 1850_control defaults for var stream_year_first_ndep with val 1850
CLM adding use_case 1850_control defaults for var stream_year_last_ndep with val 1850
CLM adding use_case 1850_control defaults for var use_case_desc with val Conditions to simulate 1850 land-use
CICE configure done.
Getting init_ts_file_fmt from /share/data/udic/cesm/inputdata/ccsm4_init/b40.1850.track1.1deg.006/0863-01-01/rpointer.ocn.restart
POP2 build-namelist: ocn_grid is gx1v6
POP2 build-namelist: ocn_tracer_modules are iage
 CESM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
 CESM PRESTAGE SCRIPT STARTING
 - Case input data directory, DIN_LOC_ROOT, is /share/data/udic/cesm/inputdata
 - Checking the existence of input datasets in DIN_LOC_ROOT

Any files with "status unknown" below were not found in the
expected location, and are not from the input data repository.
This is informational only; this script will not attempt to
find these files. If CESM can find (or does not need) these files
at run time, no error will result.
Input Data List Files Found:
/home/ajornet/cesm/example9_2/Buildconf/cpl.input_data_list
/home/ajornet/cesm/example9_2/Buildconf/cice.input_data_list
/home/ajornet/cesm/example9_2/Buildconf/rtm.input_data_list
/home/ajornet/cesm/example9_2/Buildconf/clm.input_data_list
/home/ajornet/cesm/example9_2/Buildconf/pop2.input_data_list
/home/ajornet/cesm/example9_2/Buildconf/cam.input_data_list
File status unknown: b40.1850.track1.1deg.006.clm2.r.0863-01-01-00000.nc
File status unknown: b40.1850.track1.1deg.006.clm2.r.0863-01-01-00000.nc
File status unknown: b40.1850.track1.1deg.006.cam.i.0863-01-01-00000.nc

 - Prestaging REFCASE (ccsm4_init/b40.1850.track1.1deg.006/0863-01-01) to /scratch/udic/ajornet/example9_2/run
 CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
Wed Nov 19 13:45:33 CET 2014 -- CSM EXECUTION BEGINS HERE
Wed Nov 19 13:45:46 CET 2014 -- CSM EXECUTION HAS FINISHED
Model did not complete - see /scratch/udic/ajornet/example9_2/run/cesm.log.141119-134455