dfeijat@126_com
Member
dfeijat said: Dear eaton,
After ignoring the above warnings and errors, "testname.generic_linux_intel.build" yielded the output file ccsm.exe.
However, when I run ccsm.exe manually via "testname.generic_linux_intel.run", using the steps below, it aborts quickly.
I don't know what is going wrong with it.
Can you give me some advice?
Best Regards.
ZhenCai DU
from
cmsr.iap.cas.cn
Manual running steps:
[dzc@n000 testname]$ limit coredumpsize 1000000
[dzc@n000 testname]$ limit stacksize unlimited
[dzc@n000 testname]$ cd /disk1/dzc/cesm/test/testname
[dzc@n000 testname]$ ./Tools/ccsm_check_lockedfiles
[dzc@n000 testname]$ source ./Tools/ccsm_getenv
[dzc@n000 testname]$ setenv LBQUERY FALSE
[dzc@n000 testname]$ setenv LBSUBMIT FALSE
[dzc@n000 testname]$ setenv LID "`date +%y%m%d-%H%M%S`"
[dzc@n000 testname]$ env | egrep '(MP_|LOADL|XLS|FPE|DSM|OMP|MPC)'
OMP_NUM_THREADS=1
COMPSET=B_1850-2000_WACCM_CN
CCSM_COMPSET=B_1850-2000_WACCM_CN (B20TRWCN)
COMP_ATM=cam
COMP_LND=clm
COMP_ICE=cice
COMP_GLC=sglc
COMP_OCN=pop2
COMP_CPL=cpl
CCSM_LCOMPSET=B_1850-2000_WACCM_CN
CCSM_SCOMPSET=B20TRWCN
COMP_INTERFACE=MCT
BUILD_COMPLETE=TRUE
SMP_BUILD=a0l0i0o0g0c0
SMP_VALUE=a0l0i0o0g0c0
POP_DECOMPTYPE=cartesian
CICE_DECOMPTYPE=cartesian
POP_AUTO_DECOMP=true
CICE_AUTO_DECOMP=true
[dzc@n000 testname]$ cd $CASEROOT
[dzc@n000 testname]$ source $CASETOOLS/ccsm_buildnml.csh
-------------------------------------------------------------------------
CCSM BUILDNML SCRIPT STARTING
- To prestage restarts, untar a restart.tar file into /disk1/dzc/cesm/test/testname/run
- Create modelio namelist input files
RESTART_FMT=bin
CCSM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
[dzc@n000 run]$ cd $CASEROOT
[dzc@n000 testname]$ source $CASETOOLS/ccsm_prestage.csh
-------------------------------------------------------------------------
CCSM PRESTAGE SCRIPT STARTING
- CCSM input data directory, DIN_LOC_ROOT_CSMDATA, is /disk1/dzc/cesm/inputdata
- Case input data directory, DIN_LOC_ROOT, is /disk1/dzc/cesm/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT
The following files were not found, this is informational only
Input Data List Files Found:
/disk1/dzc/cesm/test/testname/Buildconf/clm.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cam.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cpl.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/pop2.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cice.input_data_list
- Prestaging REFCASE (ccsm4_init/b40.1850.track1.2deg.wcm.007/0156-01-01) to /disk1/dzc/cesm/test/testname/run
CCSM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
[dzc@n000 testname]$ if !(-d $RUNDIR/timing) mkdir $RUNDIR/timing
[dzc@n000 testname]$ if !(-d $RUNDIR/timing/checkpoints) mkdir $RUNDIR/timing/checkpoints
[dzc@n000 testname]$ rm -f $RUNDIR/timing/ccsm_timing*
rm: No match.
[dzc@n000 testname]$
[dzc@n000 testname]$ set sdate = `date +"%Y-%m-%d %H:%M:%S"`
[dzc@n000 testname]$ echo "run started $sdate" >>& $CASEROOT/CaseStatus
[dzc@n000 testname]$ sleep 25
[dzc@n000 testname]$ cd $RUNDIR
[dzc@n000 run]$ echo "`date` -- CSM EXECUTION BEGINS HERE"
Fri Sep 24 01:40:51 CST 2010 -- CSM EXECUTION BEGINS HERE
[dzc@n000 run]$ setenv OMP_NUM_THREADS 1
[dzc@n000 run]$ pwd
/disk1/dzc/cesm/test/testname/run
[dzc@n000 run]$ ll
-rw-r--r-- 1 dzc dzc 9523 Sep 24 01:39 atm_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 atm_modelio.nml
-rw-r--r-- 1 dzc dzc 710979300 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h0.0155-12.nc
-rw-r--r-- 1 dzc dzc 281835568 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h1.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 1646580108 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h2.0155-09-23-00000.nc
-rw-r--r-- 1 dzc dzc 22081768 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h3.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 627700 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h4.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 497297616 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.i.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 619875176 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 281839716 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rh1.0155-05-01-00000.nc
-rw-r--r-- 1 dzc dzc 149718200 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rh2.0155-05-01-00000.nc
-rw-r--r-- 1 dzc dzc 6972540 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rs.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 150407684 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cice.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 26072408 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.h0.0155-12.nc
-rw-r--r-- 1 dzc dzc 738707 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.r.0156-01-01-00000
-rw-r--r-- 1 dzc dzc 137752176 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 99688836 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cpl.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 603586560 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.r.0156-01-01-00000
-rw-r--r-- 1 dzc dzc 12383 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.r.0156-01-01-00000.hdr
-rw-r--r-- 1 dzc dzc 85217 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.ro.0156-01-01-00000
-rwxr-xr-x 1 dzc dzc 123700887 Sep 23 22:11 ccsm.exe
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 cpl_modelio.nml
-rw-r--r-- 1 dzc dzc 126 Sep 24 01:39 drv_flds_in
-rw-r--r-- 1 dzc dzc 2530 Sep 24 01:39 drv_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 glc_modelio.nml
-rw-r--r-- 1 dzc dzc 2332 Sep 24 01:39 ice_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 ice_modelio.nml
-rw-r--r-- 1 dzc dzc 2967 Sep 24 01:39 lnd_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 lnd_modelio.nml
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 ocn_modelio.nml
-rw-r--r-- 1 dzc dzc 15449 Sep 24 01:39 pop2_in
-rw-r--r-- 1 dzc dzc 529 Sep 24 01:40 rpointer.atm
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.drv
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.ice
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.lnd
-rw-r--r-- 1 dzc dzc 55 Sep 24 01:40 rpointer.ocn.ovf
-rw-r--r-- 1 dzc dzc 70 Sep 24 01:40 rpointer.ocn.restart
-rw-r--r-- 1 dzc dzc 103 Sep 24 01:40 rpointer.ocn.tavg
-rw-r--r-- 1 dzc dzc 1864 Sep 24 01:39 seq_maps.rc
drwxr-xr-x 3 dzc dzc 4096 Sep 24 01:40 timing
[dzc@n000 run]$ which mpirun
/disk1/software/mvapich2-1.4-intel//bin/mpirun
[dzc@n000 run]$ mpirun -np 8 ./ccsm.exe
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 7 1 ( npes = 8) ( nthreads = 1)
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff5e1b2a84) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff272bcb94) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fffa6d8f694) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fffa461bf14) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1eb52414) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1fcbe594) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1eb52414) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1488a294) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
rank 7 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 7: killed by signal 9
rank 3 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
rank 2 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 1 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
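By the way, the error text itself seems to say that the executable references a rank range ending at 63 while mpirun only launched 8 tasks, as if ccsm.exe had been configured for 64 MPI tasks. A minimal sketch of what I think that mismatch means (the `env_mach_pes.xml` file name is my assumption about where the CESM case stores its task counts; adjust for your setup):

```shell
# The error reads: "ends at 63 but must be nonnegative and less than 8",
# i.e. the pelist baked into the build expects ranks 0..63, but only
# 8 ranks exist in MPI_COMM_WORLD.
# Hypothetical check of the configured task counts in the case directory:
#   grep NTASKS env_mach_pes.xml
# Minimal illustration of the mismatch itself:
max_rank=63   # highest rank the executable's pelist references (from the error)
np=8          # ranks actually launched: mpirun -np 8 ./ccsm.exe
if [ "$max_rank" -ge "$np" ]; then
  echo "mismatch: executable expects rank $max_rank but only $np ranks were launched"
fi
```

If that reading is right, launching with a task count matching the build (e.g. `mpirun -np 64 ./ccsm.exe`) or rebuilding with 8 tasks would be the two obvious things to try, but I am not sure.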
I wonder whether the mpirun errors are related to the build warnings.
If not, what is wrong with the mpirun command?
Can anyone tell me how to correct this?
Thanks in advance.
Du
from
cmsr.iap.cas.cn