liptak@umich_edu
I'm currently trying to run CAM5 with CAM4 physics using the CESM1_0_3 scripts (F_2000 compset with some extra specifications for CAM) on Linux. The model is built with MVAPICH2 using pgf90 and is configured to run on 2 nodes with 8 processes per node (each node has 4 dual-core processors). This configuration ran successfully with the CCSM4 scripts, but with the CESM1_0_3 scripts the run fails when it attempts to distribute CLM4 across the processors.
The error occurs after the file surfdata_1.9x2.5_simyr2000_c091005.nc is opened. The function "decompinitglcp" is called in "decompInitMod.F90", and an error occurs at line 657, resulting in a core dump.
In the TotalView debugger, the stack frame shows that the following variables have bad addresses:
start: (integer(:))
count: (integer(:))
tarr1: (integer(:))
tarr2: (integer(:))
Changing the task and thread counts, and configuring in serial on one processor, both produce the same error. Thus, it is unlikely that this issue has to do with the MPI settings.
The folks in the IT department at my institution suggested that this might be a function argument mismatch in the new version of the CESM. Has anyone encountered this error, and if so, is there a workaround? If push comes to shove, I can always run the CAM4 with the old ccsm4 scripts. However, I would eventually like to run my experiments with the CAM5.
Below, I've included part of the ccsm.log file.
The CCSM log shows:
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 ATM ) pelist = 0 1 1 ( npes = 2) ( nthreads = 2)
(seq_comm_setcomm) initialize ID ( 1 LND ) pelist = 0 1 1 ( npes = 2) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 4 ICE ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 5 GLC ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 3 OCN ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 6 CPL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 8 CPLATM ) join IDs = 6 2 ( npes = 2) ( nthreads = 2)
(seq_comm_joincomm) initialize ID ( 9 CPLLND ) join IDs = 6 1 ( npes = 2) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 10 CPLICE ) join IDs = 6 4 ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 11 CPLOCN ) join IDs = 6 3 ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 12 CPLGLC ) join IDs = 6 5 ( npes = 1) ( nthreads = 1)
(seq_comm_printcomms) ID layout : global pes vs local pe for each ID
gpe LND ATM OCN ICE GLC CPL GLOBAL CPLATM CPLLND CPLICE CPLOCN CPLGLC nthrds
--- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
0 : 0 0 0 0 0 0 0 0 0 0 0 0 2
1 : 1 1 1 1 1 2
2 : 2 1
3 : 3 1
4 : 4 1
5 : 5 1
6 : 6 1
7 : 7 1
8 : 8 1
10 : 10 1
12 : 12 1
9 : 9 1
11 : 11 1
13 : 13 1
14 : 14 1
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 1 0
(seq_io_init) pio init parameters for CPL:
(seq_io_init) pio_stride = 1
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 2 0
(seq_io_init) pio init parameters for ATM:
(seq_io_init) pio_stride = 1
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 2
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 2 0
15 : 15 1
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 1 0
(seq_io_init) pio init parameters for ICE:
(seq_io_init) pio_stride = 1
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for OCN:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 2 0
(seq_io_init) pio init parameters for LND:
(seq_io_init) pio_stride = 1
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 2
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 2 0
(seq_io_ namelist_set)
pio_stride, iotasks or root out of bounds - resetting to defaults:
1 1 0
(seq_io_init) pio init parameters for GLC:
(seq_io_init) pio_stride = 1
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(t_initf) Read in prof_inparm namelist from: drv_in
*********************
Memory block size conversion in bytes is 4094.00
2 pes participating in computation
-----------------------------------
TASK# NAME
0 sda235
1 sda236
2 pes participating in computation for CLM
-----------------------------------
NODE# NAME
( 0) sda235
( 1) sda236
proc= 1 beg atmcell = 2833 end atmcell = 5663
total atmcells per proc = 2831
proc= 1 atm ngseg = 369 atm nlseg = 194
proc= 1 nclumps = 1
rank 1 in job 1 sda235_51758 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
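As a sanity check on the log above, the cell range reported for proc 1 (2833 to 5663, 2831 cells) is what an even contiguous split of 5663 cells over 2 tasks would give. The sketch below is only an illustration of that arithmetic: the function name `block_bounds` is hypothetical, the total of 5663 is inferred from the last "end atmcell" value, and the simple block split is an assumption, not necessarily how decompInitMod.F90 assigns cells.

```python
def block_bounds(ncells, nprocs, rank):
    """Return 1-based (begin, end) cell indices for `rank` under an
    even contiguous split; lower ranks absorb any remainder.
    Assumption: a plain block decomposition, used here only to
    cross-check the numbers printed in ccsm.log."""
    base, extra = divmod(ncells, nprocs)
    begin = rank * base + min(rank, extra) + 1
    count = base + (1 if rank < extra else 0)
    return begin, begin + count - 1


if __name__ == "__main__":
    ncells = 5663  # inferred from "end atmcell = 5663" on the last task
    for rank in range(2):
        beg, end = block_bounds(ncells, 2, rank)
        print(f"proc= {rank} beg atmcell = {beg} end atmcell = {end} "
              f"total = {end - beg + 1}")
```

If that assumption holds, the per-task cell counts were computed correctly before the abort, so the failure would come after this point in the decomposition rather than from the split itself.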