Problem in port validation tests

sirajkhan78@...

Hi All,

I have ported the CCSM model to my machine, and for the port validation I am running the tests given in the online documentation. I created the test with

./create_test -testname ERS_D.f19_g16.B1850CN.bugsbunny -testid t03
and when I run the build script I get the following error in the last few lines of the log file. Everything works for the X compset case, but for this case I get an error:

mpif90 -c -I. -I/usr/apps/include -I/usr/apps/include -I. -I/home/siraj/ccsm4_working_copy/scripts/b40.B2000/SourceMods/src.cam -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/chemistry/bulk_aero -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/chemistry/utils -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/physics/cam -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/dynamics/fv -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/cpl_mct -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/control -I/home/siraj/ccsm4_working_copy/models/atm/cam/src/utils -I/home/siraj/ccsm4_working_copy/models/utils/pilgrim -I/home/siraj/ccsm4_working_copy/b40.B2000/lib/include -DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=288 -DPLAT=192 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=1 -DPTRN=1 -DPTRK=1 -DSTAGGERED -DSPMD -DMCT_INTERFACE -DHAVE_MPI -DCO2A -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16 -i4 -gopt -Mlist -Mextend -byteswapio -O2 -Mvect=nosse -Kieee -Mfree /home/siraj/ccsm4_working_copy/models/atm/cam/src/utils/spmd_utils.F90
PGF90-S-0084-Illegal use of symbol mpi_complex16 - not public entity of module (/home/siraj/ccsm4_working_copy/models/atm/cam/src/utils/spmd_utils.F90: 26)
0 inform, 0 warnings, 1 severes, 0 fatal for spmd_utils
gmake: *** [spmd_utils.o] Error 2

Kindly help me to investigate this error.

Siraj

eaton

The compiler is reporting, while compiling spmd_utils.F90, that the symbol mpi_complex16 is not a public entity of the module it is imported from. This symbol is supposed to come from the mpishorthand module, and if you look inside models/atm/cam/src/control/mpishorthand.F you'll see that it actually comes from the mpif.h include file. So look in your MPI distribution for this file and see whether the mpi_complex16 symbol is defined there. If not, you may need to modify the code to use the mpi_double_complex symbol instead. Both are defined in the mpif.h file that I'm looking at, from the fairly old mpich-1.2.7p1 distribution.
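
As a rough sketch of the check described above, the following shell functions report whether the two symbols appear in a given mpif.h. The function names and the header path are illustrative assumptions, not part of CCSM or MPICH. Some MPI distributions split their definitions across files included from mpif.h, so a "missing" result may only mean the search has to be repeated on those files.

```shell
# Succeed if the header mentions the given symbol.
# Usage: has_mpi_symbol /path/to/mpif.h MPI_COMPLEX16
has_mpi_symbol() {
    header=$1
    symbol=$2
    # Fortran is case-insensitive, so ignore case (-i). Match whole words
    # only (-w) so that MPI_COMPLEX does not count as MPI_COMPLEX16.
    # Note: a comment line that mentions the symbol also counts as a match.
    grep -i -q -w -- "$symbol" "$header"
}

# Print one line per symbol saying whether it was found in the header.
report_mpi_symbols() {
    header=$1
    for symbol in MPI_COMPLEX16 MPI_DOUBLE_COMPLEX; do
        if has_mpi_symbol "$header" "$symbol"; then
            echo "$symbol: defined"
        else
            echo "$symbol: missing"
        fi
    done
}
```

A typical call would be `report_mpi_symbols /path/to/mpi/include/mpif.h`, where the path is whichever include directory your mpif90 wrapper actually uses.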

sirajkhan78@...

Hi Eaton,

Thanks for the help. I checked MPICH as per your email, changed it, and the problem is solved now. Thanks again. However, looking at the output of the model run, I found that my jobs are terminated when I submit them using qsub. Everything else works and the model builds properly, but when I submit the run I get the following error for every test, as well as for the X run.

MPI Application rank 3 killed before MPI_Finalize() with signal 15

Do I need to change something in the env_mach_pes file as well?

Cheers
Siraj

eaton

I can't tell from the info you've provided whether your batch system is correctly submitting the job and allocating resources. It might be easier to get past the potential system problems by trying to run CAM standalone from the example script in
$CCSM_ROOT/models/atm/cam/bld/run-pc.csh
This script needs to be edited to provide things like the location of netcdf and mpich directories, and the location of the source tree, input data, and work directories. But everything needed to run cam is set in this one script so it's generally a much simpler testing environment than using the full ccsm scripts. Once you can successfully run CAM standalone then it shouldn't be much more work to run the full ccsm.

sirajkhan78@...

Hi again,

I have run the CAM model on this machine before and it worked fine. But this time the model is not working: my job is killed whenever I submit it. The build always completes for the X, A, B, etc. compsets, but when I submit the job I get the MPI rank problem.

I started working on another machine, which is an Intel machine. Here, when I compile the model for the X compset, it builds and runs fine. But for all the other compset tests
ERS_D.T31_g37.A.bugaboo, ERS_D.f19_g16.B1850CN.bugaboo, ERS.f19_f19.F.bugaboo, etc., the model does not build. For example, for test ERS_D.T31_g37.A.bugaboo I get the following error in the last lines of atm.bldlog.100615-122443:

mpif90 -c -I. -I/usr/local/intel/impi/4.0.0.025/intel64/include -I. -I/home/siraj/ccsm4_working_copy/scripts/ERS_D.T31_g37.A.bugaboo.t3_boo/SourceMods/src.datm -I/home/siraj/ccsm4_working_copy/models/atm/datm -I/home/siraj/ccsm4_working_copy/models/atm/datm/cpl_mct -I/home/siraj/ERS_D.T31_g37.A.bugaboo.t3_boo/lib/include -DMCT_INTERFACE -DHAVE_MPI -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -132 -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -CU -check pointers -fpe0 -ftz /home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90
/home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90(26): warning #6536: Only rename information from the ONLY qualifiers for this external module will be processed since all entities from the external module have been declared public [SEQ_FLDS_MOD]
use seq_flds_mod
------^
/home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90(33): error #6580: Name in only-list does not exist. [SEQ_FLDS_X2A_FIELDS]
seq_flds_x2a_fields
-------------------------------^
/home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90(481): error #6404: This name does not have a type, and must have an explicit type. [SEQ_FLDS_X2A_FIELDS]
call mct_aVect_init(x2a, rList=seq_flds_x2a_fields, lsize=lsize)
-----------------------------------^
/home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90(481): error #6285: There is no matching specific subroutine for this generic subroutine call. [MCT_AVECT_INIT]
call mct_aVect_init(x2a, rList=seq_flds_x2a_fields, lsize=lsize)
---------^
compilation aborted for /home/siraj/ccsm4_working_copy/models/atm/datm/datm_comp_mod.F90 (code 1)
gmake: *** [datm_comp_mod.o] Error 1

Can you help me investigate this as well?
Thanks

Siraj

sirajkhan78@...

And in the second run for ERT.f19_g16.B1850CN.bugaboo I get the following error in ice.bldlog.100615-125136

mpif90 -c -I. -I/usr/local/intel/impi/4.0.0.025/intel64/include -I. -I/home/siraj/ccsm4_working_copy/scripts/ERT.f19_g16.B1850CN.bugaboo.t12_boo/SourceMods/src.cice -I/home/siraj/ccsm4_working_copy/models/ice/cice/src/drivers/cpl_mct -I/home/siraj/ccsm4_working_copy/models/ice/cice/src/drivers/cpl_share -I/home/siraj/ccsm4_working_copy/models/ice/cice/src/mpi -I/home/siraj/ccsm4_working_copy/models/ice/cice/src/source -I/home/siraj/ERT.f19_g16.B1850CN.bugaboo.t12_boo/lib/include -DCCSMCOUPLED -Dcoupled -Dncdf -DNCAT=5 -DNXGLOB=320 -DNYGLOB=384 -DNTR_AERO=3 -DBLCKX=10 -DBLCKY=192 -DMXBLCKS=1 -DMCT_INTERFACE -DHAVE_MPI -DCO2A -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -132 -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -O2 /home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(22): warning #6536: Only rename information from the ONLY qualifiers for this external module will be processed since all entities from the external module have been declared public [ICE_COMMUNICATE]
use ice_communicate
-------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(23): error #6580: Name in only-list does not exist. [MASTER_TASK]
use ice_communicate, only : my_task, master_task, lprint_stats
----------------------------------------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(23): error #6580: Name in only-list does not exist. [LPRINT_STATS]
use ice_communicate, only : my_task, master_task, lprint_stats
-----------------------------------------------------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(166): error #6404: This name does not have a type, and must have an explicit type. [MASTER_TASK]
if(my_task == master_task) then
-----------------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(1429): error #6404: This name does not have a type, and must have an explicit type. [LPRINT_STATS]
if(lprint_stats) then
------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(1429): error #6341: A logical data type is required in this context. [LPRINT_STATS]
if(lprint_stats) then
------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(1440): error #6341: A logical data type is required in this context. [LPRINT_STATS]
if(lprint_stats) then
------^
/home/siraj/ccsm4_working_copy/models/ice/cice/src/source/ice_distribution.F90(1474): error #6285: There is no matching specific subroutine for this generic subroutine call. [BROADCAST_ARRAY]
call broadcast_array(blockLocation,master_task)
--------------^

eaton

This looks like a compiler problem. This code has been built on many different platforms using many different compilers. So I don't have any reason to think that there is a problem in the Fortran syntax. It seems that you were having better luck with the PGI compiler.

sirajkhan78@...

Hi eaton,

Thanks for your continuous support. I made many changes to the compiler settings, and with my new settings compsets X and B compile and run. But all the others that include a data model (A, F, D, etc.) have the same problem, as shown for one of the cases below. Can you work out where the problem is? The same compiler builds other cases like B (where all components are active). I have tried a lot, but whatever I do with the Intel compiler I get this error. Please help me sort this out.

mpif90 -c -I. -I/home/siraj/netcdf/include -I/home/siraj/netcdf/include -I. -I/home/siraj/ccsm4/scripts/ERS_D.T31_g37.A.bugaboo.t3_boo/SourceMods/src.datm -I/home/siraj/ccsm4/models/atm/datm -I/home/siraj/ccsm4/models/atm/datm/cpl_mct -I/home/siraj/ccsm4/ERS_D.T31_g37.A.bugaboo.t3_boo/lib/include -DMCT_INTERFACE -DHAVE_MPI -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -132 -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -CU -check pointers -fpe0 -ftz /home/siraj/ccsm4/models/atm/datm/datm_comp_mod.F90
/home/siraj/ccsm4/models/atm/datm/datm_comp_mod.F90(26): warning #6536: Only rename information from the ONLY qualifiers for this external module will be processed since all entities from the external module have been declared public [SEQ_FLDS_MOD]
use seq_flds_mod
------^
/home/siraj/ccsm4/models/atm/datm/datm_comp_mod.F90(33): error #6580: Name in only-list does not exist. [SEQ_FLDS_X2A_FIELDS].....

eaton

I don't have access right now to a platform with an intel compiler, but I expect to have access in a week or so. Sorry I can't provide faster feedback, but I'll keep this thread in mind. Just for future reference, please provide info about what version of the intel compiler you're using.

sirajkhan78@...

Hi Eaton,

I am still waiting for your help with the data model (compset A or F) issue on the Intel machine. Kindly help me track down this problem. I am using

64-bit Scientific Linux 5.3
4.0.1 Fortran: -lnetcdff C: -lnetcdf C++: -lnetcdf_c++
Intel 11.1 (ifort, f77,f90,f95, icc, cc) compiler

and the error for the case
create_newcase -case my_f2 -compset F -res T31_g37 -mach bugaboo

is

CCSM BUILDEXE SCRIPT STARTING
- Build Libraries: mct pio csm_share
Mon Jul 12 11:43:57 PDT 2010 /home/siraj/ccsm4/my_f2/mct/mct.bldlog.100712-114355
Mon Jul 12 11:44:48 PDT 2010 /home/siraj/ccsm4/my_f2/pio/pio.bldlog.100712-114355
Mon Jul 12 11:45:20 PDT 2010 /home/siraj/ccsm4/my_f2/csm_share/csm_share.bldlog.100712-114355
Mon Jul 12 11:46:07 PDT 2010 /home/siraj/ccsm4/my_f2/run/cpl.bldlog.100712-114355
Mon Jul 12 11:46:07 PDT 2010 /home/siraj/ccsm4/my_f2/run/atm.bldlog.100712-114355
Mon Jul 12 11:49:03 PDT 2010 /home/siraj/ccsm4/my_f2/run/lnd.bldlog.100712-114355
Mon Jul 12 12:07:10 PDT 2010 /home/siraj/ccsm4/my_f2/run/ice.bldlog.100712-114355
Mon Jul 12 12:08:27 PDT 2010 /home/siraj/ccsm4/my_f2/run/ocn.bldlog.100712-114355
ERROR: docn.buildexe.csh failed, see /home/siraj/ccsm4/my_f2/run/ocn.bldlog.100712-114355
ERROR: cat /home/siraj/ccsm4/my_f2/run/ocn.bldlog.100712-114355

and the ocn.bldlog.100712-114355 file shows

Mon Jul 12 12:08:27 PDT 2010 /home/siraj/ccsm4/my_f2/run/ocn.bldlog.100712-114355
cat: Srcfiles: No such file or directory
/home/siraj/ccsm4/scripts/my_f2/Tools/mkSrcfiles > /home/siraj/ccsm4/my_f2/ocn/obj/Srcfiles
cp -f /home/siraj/ccsm4/my_f2/ocn/obj/Filepath /home/siraj/ccsm4/my_f2/ocn/obj/Deppath
/home/siraj/ccsm4/scripts/my_f2/Tools/mkDepends Deppath Srcfiles > /home/siraj/ccsm4/my_f2/ocn/obj/Depends
mpif90 -c -I. -I/home/siraj/netcdf/include -I/home/siraj/netcdf/include -I. -I/home/siraj/ccsm4/scripts/my_f2/SourceMods/src.docn -I/home/siraj/ccsm4/models/ocn/docn -I/home/siraj/ccsm4/models/ocn/docn/cpl_mct -I/home/siraj/ccsm4/my_f2/lib/include -DMCT_INTERFACE -DHAVE_MPI -DCO2A -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -132 -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -O2 /home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90
/home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90(24): warning #6536: Only rename information from the ONLY qualifiers for this external module will be processed since all entities from the external module have been declared public [SEQ_FLDS_MOD]
use seq_flds_mod
------^
/home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90(31): error #6580: Name in only-list does not exist. [SEQ_FLDS_X2O_FIELDS]
seq_flds_x2o_fields
-------------------------------^
/home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90(373): error #6404: This name does not have a type, and must have an explicit type. [SEQ_FLDS_X2O_FIELDS]
call mct_aVect_init(x2o, rList=seq_flds_x2o_fields, lsize=lsize)
-----------------------------------^
/home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90(373): error #6285: There is no matching specific subroutine for this generic subroutine call. [MCT_AVECT_INIT]
call mct_aVect_init(x2o, rList=seq_flds_x2o_fields, lsize=lsize)
---------^
compilation aborted for /home/siraj/ccsm4/models/ocn/docn/docn_comp_mod.F90 (code 1)
gmake: *** [docn_comp_mod.o] Error 1

eaton

I have access to a small linux cluster with intel-11.0.074 installed. On this system when I try the create_newcase command that you used, the build fails trying to build the mct library. I'm afraid I don't have time to track this down. The CSEG group doesn't test the ccsm on any intel platforms, so that port will be more challenging than to a pgi platform.

If you are interested in F compsets then I'd suggest that you consider running CAM in standalone mode. I was able to build CAM standalone using intel. One difference between the CAM standalone build and the F compset build is that CAM4 uses a CAM specific data ocean model rather than the docn component which is used in the F compset. The compilation failure you report was in the docn component, so this may be a way around that problem.

sirajkhan78@...

Hi eaton,

Thanks for your support.

I am still interested in using CCSM4 for the CAM standalone case as well as the all-active case. Over the last few days I worked on another machine, which has the Intel 11 compiler with MPICH2. But this Intel machine has the same problem as I had in the previous case. This means that on Intel machines only the X and B compsets of CCSM work, and all other cases that include a data model fail.

I wonder whether any other user with access to an Intel machine could try an F or A case to check the problem. Is this a bug in the CCSM4 release?

ylu9@...

I solved a similar problem when using compset I_2000_CN in CCSM4. I am working on a Linux cluster using the Intel compiler (ifort, icc). I guess the problem arises because the Intel compiler is strict about a second, only-qualified use of an external module whose entities have all already been declared public. In datm_comp_mod.F90, you will find a duplicate use of seq_flds_mod.

Below are the errors I had:

---------------------
/export/home/ylu/ccsm4_0/models/atm/datm/datm_comp_mod.F90(26): warning #6536: Only rename information from the ONLY qualifiers for this external module will be processed since all entities from the external module have been declared public [SEQ_FLDS_MOD]
use seq_flds_mod
------^
/export/home/ylu/ccsm4_0/models/atm/datm/datm_comp_mod.F90(33): error #6580: Name in only-list does not exist. [SEQ_FLDS_X2A_FIELDS]
seq_flds_x2a_fields
-------------------------------^

Here is my solution:
1. cp /export/home/ylu/ccsm4_0/models/atm/datm/datm_comp_mod.F90 /home/ylu/ccsm4_0/aaa_intel_I2000CN/SourceMods/src.datm/.
2. In the SourceMods/src.datm/ directory, delete line 26 of datm_comp_mod.F90 (the "use seq_flds_mod" statement).

Then CCSM4 builds successfully.
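
The two steps above can be sketched as a single shell function. This is only an illustration: the function name drop_bare_use is hypothetical, and instead of deleting by line number it drops any line that consists solely of "use <module>", which assumes the unqualified statement has no trailing comment or only-list.

```shell
# Write a copy of SRC_FILE into SOURCEMODS_DIR with the unqualified
# "use MODULE" statement removed. The original file is left untouched.
# Usage: drop_bare_use SRC_FILE SOURCEMODS_DIR MODULE
drop_bare_use() {
    src=$1
    dest_dir=$2
    module=$3
    dest="$dest_dir/$(basename "$src")"
    # -x matches whole lines only, so "use MODULE, only: ..." is kept;
    # -i ignores case because Fortran is case-insensitive;
    # -v inverts the match, i.e. copies every line except the bare use.
    grep -v -i -x "[[:space:]]*use[[:space:]][[:space:]]*${module}[[:space:]]*" "$src" > "$dest"
}
```

With the paths from this post, the call would look like `drop_bare_use /export/home/ylu/ccsm4_0/models/atm/datm/datm_comp_mod.F90 /home/ylu/ccsm4_0/aaa_intel_I2000CN/SourceMods/src.datm seq_flds_mod`. It is worth diffing the copy against the original before rebuilding, to confirm that only the one line was dropped.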

eaton

Thanks for this information. I see that in the CCSM4 code the docn_comp_mod.F90 file has the same problem that you found in the datm_comp_mod.F90 file.

Note that this problem has been fixed in the CESM1.0 release.
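
For anyone still on CCSM4, a quick way to look for other files with the same pattern is to search for sources that contain both an unqualified use of a module and an only-qualified use of it. The sketch below assumes the sources end in .F90 and that each use statement starts on its own line; the function name find_duplicate_use is made up for this example.

```shell
# Print every .F90 file under ROOT that has both a bare "use MODULE"
# line and a "use MODULE, only" line.
# Usage: find_duplicate_use ROOT MODULE
find_duplicate_use() {
    root=$1
    module=$2
    find "$root" -name '*.F90' | while IFS= read -r f; do
        # First grep: a line that is exactly "use MODULE" (any case).
        # Second grep: a use of the same module followed by ", only".
        if grep -i -q -x "[[:space:]]*use[[:space:]][[:space:]]*${module}[[:space:]]*" "$f" &&
           grep -i -q "use[[:space:]][[:space:]]*${module}[[:space:]]*,[[:space:]]*only" "$f"; then
            echo "$f"
        fi
    done
}
```

For example, `find_duplicate_use $CCSM_ROOT/models seq_flds_mod` would be expected to list the data model files discussed in this thread, though I have not checked which other files it would report.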
