seungbu@pusan_ac_kr
New Member
Hello, I've recently installed CESM1.2.2 on an HPC server at the University of Southern California. We're using Intel compilers, and our MPICH and NetCDF installations work. Aquaplanet, ne30_g16 with F1850C5, and T31_g37 with B1850C5 simulations all run fine, but the ne120_t12 simulation stops during initialization. We're using 63 nodes with 16 cores each (1008 cores, no hyperthreading), and each node has ~64 GB of RAM.

cpl.log ends with:

  (seq_mct_drv) : Initialize each component: atm, lnd, rof, ocn, ice, glc, wav
  (seq_mct_drv) : Initialize atm component ATM
  (seq_mct_drv) : Initialize lnd component LND
  (seq_mct_drv) : Initialize rof component ROF
  (seq_mct_drv) : Initialize ocn component OCN
  (seq_mct_drv) : Initialize ice component ICE

In cesm.log:

  forrtl: severe (174): SIGSEGV, segmentation fault occurred
  Image              PC                Routine            Line     Source
  cesm.exe           000000000244B511  Unknown            Unknown  Unknown
  cesm.exe           000000000244964B  Unknown            Unknown  Unknown
  cesm.exe           00000000023E02F4  Unknown            Unknown  Unknown
  cesm.exe           00000000023E0106  Unknown            Unknown  Unknown
  cesm.exe           000000000235C439  Unknown            Unknown  Unknown
  cesm.exe           0000000002366EC6  Unknown            Unknown  Unknown
  libpthread-2.17.s  00007F23781CE370  Unknown            Unknown  Unknown
  cesm.exe           0000000001648250  ice_probability_m      259  ice_probability.F90
  cesm.exe           000000000160ABBD  ice_grid_mp_init_      248  ice_grid.F90
  cesm.exe           00000000016C29A4  cice_initmod_mp_c      106  CICE_InitMod.F90
  cesm.exe           00000000015C777B  ice_comp_mct_mp_i      254  ice_comp_mct.F90
  cesm.exe           000000000043EC0B  ccsm_comp_mod_mp_     1155  ccsm_comp_mod.F90
  cesm.exe           00000000004426ED  MAIN__                  90  ccsm_driver.F90
  cesm.exe           0000000000412A5E  Unknown            Unknown  Unknown
  libc-2.17.so       00007F2377B1DB35  __libc_start_main  Unknown  Unknown
  cesm.exe           0000000000412969  Unknown            Unknown  Unknown

Line 259 of cesm1_2_2/models/ice/cice/src/source/ice_probability.F90 just allocates the work_gr array:

  195   subroutine CalcWorkPerBlock(distribution_wght, KMTG, ULATG, work_per_block, prob_per_block, blockType, bStats)
        ...
  259      allocate(work_gr(nx_global,ny_global))
  260      allocate(prob(nblocks_tot),work(nblocks_tot))
  261      allocate(nocn(nblocks_tot))
  262      allocate(nice005(nblocks_tot),nice010(nblocks_tot),nice050(nblocks_tot), &
  263               nice100(nblocks_tot),nice250(nblocks_tot),nice500(nblocks_tot))

And ice.log ends with the following:

  Domain Information
    Horizontal domain:   nx =  3600
                         ny =  2400
    No. of categories:   nc =     5
    No. of ice layers:   ni =     4
    No. of snow layers:  ns =     1
    Processors:  total =        448
    Processor shape:           null
    Distribution type:     blkrobin
    Distribution weight:   latitude
    {min,max}Blocks =      1  8
    Number of ghost cells:        1

  read_global   99   1   -1.36924091988070        1.57079616109396
  read_global   98   1   0.000000000000000E+000   42.0000000000000

Is this a problem with the tx0.1v2 tripole grid?
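For scale, work_gr is nx_global x ny_global = 3600 x 2400 doubles, about 69 MB, and as far as I can tell each ice task holds its own copy (KMTG and ULATG are already passed in as global arrays), so with 16 tasks per 64 GB node that alone should fit easily. To narrow it down, I'm thinking of adding a stat= check around that allocate. This is only a sketch against our CESM1.2.2 copy: ierr is a local integer I would add, and I'm assuming nu_diag and abort_ice are accessible in this module as they are elsewhere in CICE.

  integer (int_kind) :: ierr   ! local allocate-status flag (my addition)

  allocate(work_gr(nx_global,ny_global), stat=ierr)
  if (ierr /= 0) then
     write(nu_diag,*) 'CalcWorkPerBlock: allocate(work_gr) failed,', &
                      ' nx_global =', nx_global, ' ny_global =', ny_global
     call abort_ice('CalcWorkPerBlock: out of memory allocating work_gr')
  endif

If the run still segfaults without ever printing this message, the allocate is not failing in the ordinary out-of-memory way, which would point toward a stack or address-space problem rather than total memory.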
Sure, we add "limit stacksize unlimited" in env_mach_specific.USC. I'd appreciate a solution or any hint on this. Thanks, Seungbu.

Our compiler options are like this (pasted from our machine config; the XML markup got stripped when pasting, so I've grouped the surviving entries):

  Compilers:     mpiifort (Fortran), mpiicc (C), mpiicpc (C++)
  Flags:         -heap-arrays -mcmodel=medium -O2
  NetCDF path:   /home/rcf-proj/lds1/sp_582/ncdf-f-4.4.4
  MPI:           Intel MPI at /usr/usc/intel/17.2/compilers_and_libraries_2017.2.174/linux/mpi/intel64
  NetCDF libs:   $(shell $(NETCDF_PATH)/bin/nf-config --flibs)

And our PE layout from env_mach_pes.xml (the markup was stripped here too, so only the values and variable references survive; ICE is on 448 tasks, consistent with the ice.log above):

  896 1 0 1 448 $NTHRDS_ATM $ROOTPE_ATM 1 448 $NTHRDS_ATM $NTASKS_LND 1 112 $NTHRDS_ATM $NTASKS_ATM 1 $NTASKS_ATM $NTHRDS_ATM $ROOTPE_ATM $NTASKS_ATM $NTHRDS_ATM $ROOTPE_ATM 1 -1 -1 netcdf 1 0
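To rule out the flag combination by itself, here is a minimal standalone test sized like work_gr that one could build with the same compiler and options (a toy sketch, not part of CESM):

  program alloc_test
     implicit none
     integer, parameter :: nx = 3600, ny = 2400   ! same shape as work_gr at ne120_t12
     real(kind=8), allocatable :: work_gr(:,:)
     integer :: ierr

     allocate(work_gr(nx,ny), stat=ierr)
     if (ierr /= 0) stop 'allocate failed'
     work_gr = 0.0d0                              ! touch every element so the pages are really committed
     print *, 'allocated and touched', nx*ny*8, 'bytes'
     deallocate(work_gr)
  end program alloc_test

Built with, e.g., mpiifort -heap-arrays -mcmodel=medium -O2 alloc_test.f90. If this runs cleanly, the size of a single global array under these flags is not the problem by itself.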