rambhari0123@gmail_com
Member
Dear CESM Users,

I am trying to run cesm1.2.0 (CAM5.3) as standalone CAM on a Linux machine with the PGI compiler and NetCDF 4.2 libraries compiled in parallel. I used the following commands to set up and submit the run.

Command to configure:

    /home/opt/app/cesm1_2_0/models/atm/cam/bld/configure -fc_type pgi -fc mpif90 -cc mpicc -dyn fv -hgrid 1.9x2.5 -ntasks 24 -nosmp -test

Command to build the model:

    gmake

Command to build the namelist:

    /home/opt/app/cesm1_2_0/models/atm/cam/bld/build-namelist -test -config /home/dilip/New_PerformanceTests/Test1_24/bld/config_cache.xml

Script to submit the run:

    #!/bin/sh
    #$ -pe mpi 24
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #
    /opt/pgi/linux86-64/2013/mpi2/mpich/bin/mpirun -np 24 /home/dilip/New_PerformanceTests/Test1_24/bld/cam

The details of the machine I am using to run the model are as follows:

    Nodes:          1 master node and 9 compute nodes, 12 processors per node, 24 GB RAM per node
    Master node:    Fujitsu Primergy RX 300S7, Intel Xeon ES260 @ 2GHz, 24 GB RAM, 8 TB HDD
    Compute nodes:  (0-8) Fujitsu Primergy RX 200S7, Intel Xeon ES-2620 @ 2GHz, 24 GB RAM, 500 GB HDD
    Cluster:        Rocks Cluster 6 with the Sun Grid Engine job scheduler (SGE 6)
    Compiler:       PGI
    OS:             Linux (CentOS 6.2)
The run terminated with the following error message:

    /home/opt/inputdata/atm/cam/solar/solar_ave_sc19-sc23.c090810.nc
     solar_data_readnl: solar_data_type = SERIAL
     solar_data_readnl: solar_data_ymd = 0
     solar_data_readnl: solar_data_tod = 0
    PGFIO/stdio: Input/output error
    PGFIO-F-/OPEN/unit=99/error code returned by host stdio - 5.
     File name = atm_in
     In source file /home/opt/app/cesm1_2_0/models/atm/cam/src/chemistry/utils/solar_data.F90, at line number 94
    Fatal error in PMPI_Bcast: Other MPI error, error stack:
    PMPI_Bcast(1478)......................: MPI_Bcast(buf=0x24fe2e0, count=1, MPI_LOGICAL, root=0, comm=0xc4000002) failed
    MPIR_Bcast_impl(1321).................:
    MPIR_Bcast_intra(1119)................:
    MPIR_Bcast_scatter_ring_allgather(962):
    MPIR_Bcast_binomial(213)..............: Failure during collective
    MPIR_Bcast_scatter_ring_allgather(955):
    MPIR_Bcast_binomial(189)..............:
    MPIC_Send(63).........................:
    MPIDI_EagerContigShortSend(262).......: failure occurred while attempting to send an eager message
    MPIDI_CH3_iStartMsg(36)...............: Communication error with rank 12
    ----------------------

I also tried to set up a similar case with 18 processors, but that run terminated with a similar error. I am attaching the log files for both cases here; the number in each file name indicates the number of processors.

Thanking you in anticipation. Please help me get a successful CAM run.