
Can anyone tell me what's wrong with this?

dfeijat said:
Dear eaton,

After ignoring the above warnings and errors, "testname.generic_linux_intel.build" yields the output file ccsm.exe.
However, when I carry out the steps of "testname.generic_linux_intel.run" manually to run ccsm.exe, as shown below, it aborts quickly.
I don't know what is going wrong.
Can you give me some advice?

Best Regards.
ZhenCai DU
from
cmsr.iap.cas.cn


Manual running steps:

[dzc@n000 testname]$ limit coredumpsize 1000000
[dzc@n000 testname]$ limit stacksize unlimited
[dzc@n000 testname]$ cd /disk1/dzc/cesm/test/testname
[dzc@n000 testname]$ ./Tools/ccsm_check_lockedfiles
[dzc@n000 testname]$ source ./Tools/ccsm_getenv
[dzc@n000 testname]$ setenv LBQUERY FALSE
[dzc@n000 testname]$ setenv LBSUBMIT FALSE
[dzc@n000 testname]$ setenv LID "`date +%y%m%d-%H%M%S`"
[dzc@n000 testname]$ env | egrep '(MP_|LOADL|XLS|FPE|DSM|OMP|MPC)'
OMP_NUM_THREADS=1
COMPSET=B_1850-2000_WACCM_CN
CCSM_COMPSET=B_1850-2000_WACCM_CN (B20TRWCN)
COMP_ATM=cam
COMP_LND=clm
COMP_ICE=cice
COMP_GLC=sglc
COMP_OCN=pop2
COMP_CPL=cpl
CCSM_LCOMPSET=B_1850-2000_WACCM_CN
CCSM_SCOMPSET=B20TRWCN
COMP_INTERFACE=MCT
BUILD_COMPLETE=TRUE
SMP_BUILD=a0l0i0o0g0c0
SMP_VALUE=a0l0i0o0g0c0
POP_DECOMPTYPE=cartesian
CICE_DECOMPTYPE=cartesian
POP_AUTO_DECOMP=true
CICE_AUTO_DECOMP=true
[dzc@n000 testname]$ cd $CASEROOT
[dzc@n000 testname]$ source $CASETOOLS/ccsm_buildnml.csh
-------------------------------------------------------------------------
CCSM BUILDNML SCRIPT STARTING
- To prestage restarts, untar a restart.tar file into /disk1/dzc/cesm/test/testname/run
- Create modelio namelist input files
RESTART_FMT=bin
CCSM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
[dzc@n000 run]$ cd $CASEROOT
[dzc@n000 testname]$ source $CASETOOLS/ccsm_prestage.csh
-------------------------------------------------------------------------
CCSM PRESTAGE SCRIPT STARTING
- CCSM input data directory, DIN_LOC_ROOT_CSMDATA, is /disk1/dzc/cesm/inputdata
- Case input data directory, DIN_LOC_ROOT, is /disk1/dzc/cesm/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT

The following files were not found, this is informational only
Input Data List Files Found:
/disk1/dzc/cesm/test/testname/Buildconf/clm.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cam.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cpl.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/pop2.input_data_list
/disk1/dzc/cesm/test/testname/Buildconf/cice.input_data_list

- Prestaging REFCASE (ccsm4_init/b40.1850.track1.2deg.wcm.007/0156-01-01) to /disk1/dzc/cesm/test/testname/run
CCSM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
[dzc@n000 testname]$ if !(-d $RUNDIR/timing) mkdir $RUNDIR/timing
[dzc@n000 testname]$ if !(-d $RUNDIR/timing/checkpoints) mkdir $RUNDIR/timing/checkpoints
[dzc@n000 testname]$ rm -f $RUNDIR/timing/ccsm_timing*
rm: No match.
[dzc@n000 testname]$
[dzc@n000 testname]$ set sdate = `date +"%Y-%m-%d %H:%M:%S"`
[dzc@n000 testname]$ echo "run started $sdate" >>& $CASEROOT/CaseStatus
[dzc@n000 testname]$ sleep 25
[dzc@n000 testname]$ cd $RUNDIR
[dzc@n000 run]$ echo "`date` -- CSM EXECUTION BEGINS HERE"
Fri Sep 24 01:40:51 CST 2010 -- CSM EXECUTION BEGINS HERE
[dzc@n000 run]$ setenv OMP_NUM_THREADS 1
[dzc@n000 run]$ pwd
/disk1/dzc/cesm/test/testname/run
[dzc@n000 run]$ ll

-rw-r--r-- 1 dzc dzc 9523 Sep 24 01:39 atm_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 atm_modelio.nml
-rw-r--r-- 1 dzc dzc 710979300 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h0.0155-12.nc
-rw-r--r-- 1 dzc dzc 281835568 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h1.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 1646580108 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h2.0155-09-23-00000.nc
-rw-r--r-- 1 dzc dzc 22081768 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h3.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 627700 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.h4.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 497297616 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.i.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 619875176 Sep 24 01:39 b40.1850.track1.2deg.wcm.007.cam2.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 281839716 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rh1.0155-05-01-00000.nc
-rw-r--r-- 1 dzc dzc 149718200 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rh2.0155-05-01-00000.nc
-rw-r--r-- 1 dzc dzc 6972540 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cam2.rs.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 150407684 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cice.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 26072408 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.h0.0155-12.nc
-rw-r--r-- 1 dzc dzc 738707 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.r.0156-01-01-00000
-rw-r--r-- 1 dzc dzc 137752176 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.clm2.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 99688836 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.cpl.r.0156-01-01-00000.nc
-rw-r--r-- 1 dzc dzc 603586560 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.r.0156-01-01-00000
-rw-r--r-- 1 dzc dzc 12383 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.r.0156-01-01-00000.hdr
-rw-r--r-- 1 dzc dzc 85217 Sep 24 01:40 b40.1850.track1.2deg.wcm.007.pop.ro.0156-01-01-00000
-rwxr-xr-x 1 dzc dzc 123700887 Sep 23 22:11 ccsm.exe
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 cpl_modelio.nml
-rw-r--r-- 1 dzc dzc 126 Sep 24 01:39 drv_flds_in
-rw-r--r-- 1 dzc dzc 2530 Sep 24 01:39 drv_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 glc_modelio.nml
-rw-r--r-- 1 dzc dzc 2332 Sep 24 01:39 ice_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 ice_modelio.nml
-rw-r--r-- 1 dzc dzc 2967 Sep 24 01:39 lnd_in
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 lnd_modelio.nml
-rw-r--r-- 1 dzc dzc 155 Sep 24 01:39 ocn_modelio.nml
-rw-r--r-- 1 dzc dzc 15449 Sep 24 01:39 pop2_in
-rw-r--r-- 1 dzc dzc 529 Sep 24 01:40 rpointer.atm
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.drv
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.ice
-rw-r--r-- 1 dzc dzc 257 Sep 24 01:40 rpointer.lnd
-rw-r--r-- 1 dzc dzc 55 Sep 24 01:40 rpointer.ocn.ovf
-rw-r--r-- 1 dzc dzc 70 Sep 24 01:40 rpointer.ocn.restart
-rw-r--r-- 1 dzc dzc 103 Sep 24 01:40 rpointer.ocn.tavg
-rw-r--r-- 1 dzc dzc 1864 Sep 24 01:39 seq_maps.rc
drwxr-xr-x 3 dzc dzc 4096 Sep 24 01:40 timing
[dzc@n000 run]$ which mpirun
/disk1/software/mvapich2-1.4-intel//bin/mpirun
[dzc@n000 run]$ mpirun -np 8 ./ccsm.exe
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 7 1 ( npes = 8) ( nthreads = 1)
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff5e1b2a84) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff272bcb94) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fffa6d8f694) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fffa461bf14) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1eb52414) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1fcbe594) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1eb52414) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
Fatal error in MPI_Group_range_incl:
Invalid argument, error stack:
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0xbd6aad0, new_group=0x7fff1488a294) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 63 but must be nonnegative and less than 8
rank 7 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 7: killed by signal 9
rank 3 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
rank 2 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 1 in job 1 n000_34297 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

I wonder whether the mpirun errors are associated with the build warnings.
If not, what is wrong with the mpirun command?
Can anyone tell me how to correct it?

Thanks in advance.

aDu
from
cmsr.iap.cas.cn
 
dfeijat said:
Dear eaton,
I edited the Macros.generic_linux_intel file as you told me.
However, there are still many warnings and errors.
I have attached all the errors to ask for your further help at your convenience.
Any advice is appreciated.

The attached file (2010Sep23.errors.txt) contains the following messages.
cat /disk1/dzc/cesm/test/testname/mct/mct.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/pio/pio.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/csm_share/csm_share.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/cpl.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/atm.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/lnd.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/ice.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/ocn.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/glc.bldlog.100923-195720
cat /disk1/dzc/cesm/test/testname/run/ccsm.bldlog.100923-195720

Some warnings and errors are listed in the following:
configure: WARNING: UNKNOWN FORTRAN 90 COMPILER
configure: WARNING: UNKNOWN FORTRAN 90 COMPILER
configure: WARNING: PNETCDF_PATH not found in environment, defaulting to /usr/local/pnetcdf
configure: WARNING: pnetcdf.inc not found in PNETCDF_PATH/include disabling pnetcdf support
configure: WARNING: libpnetcdf.a not found in PNETCDF_PATH/lib disabling pnetcdf support
....
piolib_mod.F90(400): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [IODESC]
subroutine PIO_initdecomp_dof_dof(iosystem,basepiotype,dims,compdof,iodesc,iodof)
----------------------------------------------------------------------^
piolib_mod.F90(365): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [IODESC]
....
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90(672): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [TYPE]
subroutine ESMF_AttributeGetbyNumber(anytype, number, name, type, value, rc)
------------------------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90(623): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [TYPE]
subroutine ESMF_AttributeGet(anytype, name, type, value, rc)
--------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90(699): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [NAMELIST]
subroutine ESMF_AttributeGetNameList(anytype, count, namelist, rc)
-----------------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90(672): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [VALUE]
subroutine ESMF_AttributeGetbyNumber(anytype, number, name, type, value, rc)
------------------------------------------------------------------------^
....
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_TimeMod.F90(297): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [YRL]
subroutine ESMF_TimeGet(time, YY, YRl, MM, DD, D, Dl, H, M, S, Sl, MS, &
----------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_TimeMod.F90(297): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [D]
subroutine ESMF_TimeGet(time, YY, YRl, MM, DD, D, Dl, H, M, S, Sl, MS, &
-----------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_TimeMod.F90(297): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [DL]
subroutine ESMF_TimeGet(time, YY, YRl, MM, DD, D, Dl, H, M, S, Sl, MS, &
--------------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_TimeMod.F90(297): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [SL]
subroutine ESMF_TimeGet(time, YY, YRl, MM, DD, D, Dl, H, M, S, Sl, MS, &
---------------------------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_TimeMod.F90(298): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [US]
US, NS, d_, h_, m_, s_, ms_, us_, ns_, Sn, Sd, &
------------------------------^
.......
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_AlarmMod.F90(904): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [RINGINTERVAL]
subroutine ESMF_AlarmWrite(alarm, RingInterval, RingTime, &
----------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_AlarmMod.F90(904): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [RINGTIME]
subroutine ESMF_AlarmWrite(alarm, RingInterval, RingTime, &
------------------------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_AlarmMod.F90(905): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [PREVRINGTIME]
PrevRingTime, StopTime, Ringing, &
----------------------------^
....
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_ClockMod.F90(1111): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [TIMESTEP]
subroutine ESMF_ClockWrite(clock, TimeStep, StartTime, StopTime, &
----------------------------------------^
/disk1/dzc/cesm/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_ClockMod.F90(1111): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [STARTTIME]
subroutine ESMF_ClockWrite(clock, TimeStep, StartTime, StopTime, &
--------------------------------------------------^
....
/disk1/dzc/cesm/cesm1_0/models/utils/timing/gptl.c(2156): warning #167: argument of type "int (*)(const char **, const char **)" is incompatible with parameter of type "__compar_fn_t"
qsort( sort[t], count[t], sizeof(char*), cmp );
/disk1/dzc/cesm/cesm1_0/models/utils/timing/gptl.c(2195): warning #167: argument of type "int (*)(const char **, const char **)" is incompatible with parameter of type "__compar_fn_t"
qsort( newtimers, num_newtimers, sizeof(char*), ncmp ); /*sorts by memory address to restore original order*/
/disk1/dzc/cesm/cesm1_0/models/utils/timing/gptl.c(2216): warning #167: argument of type "int (*)(const char **, const char **)" is incompatible with parameter of type "__compar_fn_t"
qsort( sort[0], count[0], sizeof(char*), cmp );
........
/disk1/dzc/cesm/cesm1_0/models/utils/timing/gptl.c(2377): warning #167: argument of type "int (*)(const char **, const char **)" is incompatible with parameter of type "__compar_fn_t"
qsort( newtimers, num_newtimers, sizeof(char*), ncmp ); /*sorts by memory address to get original order */
^

/disk1/dzc/cesm/cesm1_0/models/utils/timing/gptl.c(2468): warning #1011: missing return statement at end of non-void function "ncmp"
}
^
...
/disk1/dzc/cesm/cesm1_0/models/atm/cam/src/control/wrap_mpi.F90(1232): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [WIN]
subroutine mpiwincreate(base,size,comm,win)
------------------------------------------^
...
/disk1/dzc/cesm/cesm1_0/models/lnd/clm/src/main/ncdio.F90(5059): warning #6384: The INTEGER(KIND=4) value is out-of-range. [O'0777610000000000000000']
cols(:) = nan
---------------------^
/disk1/dzc/cesm/cesm1_0/models/lnd/clm/src/main/ncdio.F90(5060): warning #6384: The INTEGER(KIND=4) value is out-of-range. [O'0777610000000000000000']
data_offset = nan
---------------------^
.........



Best regards.

ZhenCai DU
from
cmsr.iap.ac.cn
2010Sep24.

Some of the warnings from the compile and build are given above.
For further details, see the original post.
Any hints are appreciated.
 
eaton said:
It appears that the messages from the build are all warnings and not fatal errors. So I suspect that your build is OK.

The output from trying to run the model indicates that the job is dying because, by default, the CESM scripts set things up to run with 64 tasks, but mpirun started only 8 (the MPI error reports a range ending at 63 where the limit is 8):

[dzc@n000 run]$ mpirun -np 8 ./ccsm.exe

The mpirun command needs to use the argument "-np 64". Or, if 64 cores are not available on your system, you need to edit the env_mach_pes.xml file to use fewer tasks. See the CESM user guide for more information on this.
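
A rough way to see how many MPI ranks the case layout expects is to read the NTASKS_* and ROOTPE_* settings from env_mach_pes.xml and take the largest ROOTPE + NTASKS. The sketch below is written for POSIX sh (run it with sh, do not paste it into csh). It assumes each setting sits on its own line in the form <entry id="NTASKS_ATM" value="64" /> and it ignores any stride settings. The names pes_value and required_ranks are hypothetical helpers, not part of the CESM scripts.

```shell
#!/bin/sh
# Sketch: estimate how many MPI ranks a case layout expects.
# Assumption: one <entry id="..." value="..." /> per line in the file.

# Print the integer value of the entry with the given id (empty if absent).
pes_value() {
    # $1 = file, $2 = entry id
    sed -n "s/.*id=\"$2\"[[:space:]]*value=\"\([0-9][0-9]*\)\".*/\1/p" "$1" | head -n 1
}

# Print the largest ROOTPE + NTASKS over all components,
# i.e. the highest rank used plus one (strides are ignored).
required_ranks() {
    file=$1
    max=0
    for comp in ATM LND ICE OCN CPL GLC; do
        ntasks=$(pes_value "$file" "NTASKS_$comp")
        rootpe=$(pes_value "$file" "ROOTPE_$comp")
        # Skip components that have no NTASKS entry in this file.
        [ -n "$ntasks" ] || continue
        end=$(( ${rootpe:-0} + ntasks ))
        if [ "$end" -gt "$max" ]; then
            max=$end
        fi
    done
    echo "$max"
}

# Hypothetical usage from the case directory:
#   required_ranks env_mach_pes.xml
```

If the number printed is larger than the value passed to "mpirun -np", a failure like the one in this thread would be expected.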

dfeijat said:
After changing the env_mach_pes.xml file, the model works well.

Thanks a lot.

Dear all users,
The problem described above was solved with eaton's help.
Anyone who encounters the same problem can refer to the original post.
Good luck.
 