CCSM3 TestCase Compile Good Cannot Find Executable

I'm setting up CCSM3 on the Saguaro Linux cluster at Arizona State University and I've run into an error that I have been unable to fix.

Saguaro uses an InfiniBand interconnect.
CCSM3 is built with PGI 6.2 Fortran, GNU C and C++ 3.4.6 from Red Hat, and pgi-openmpi-1.2.3, all of which were preinstalled before my attempt.

Following the general testing instructions, I downloaded the source code and inputdata, created a machine file, and fixed the scripting errors that came up along the way.

I created a test case of SM.01a, using resolution T31 and gx3v5, and model set B. The script returned with a message of a successful build. I then prepared a short Torque script for job submission.

#!/bin/bash
#PBS -l nodes=32:ppn=1
#PBS -q medium
#PBS -N CCSM3_0_Smoke_Run
#PBS -l walltime=05:00:00
#PBS -o test.output
#PBS -j oe

cd $PBS_O_WORKDIR

./TSM.01a.T31_gx3v5.B.saguaro.100708.test

The script runs fine for several seconds until it reports that it cannot find certain executables. Interestingly, which executables it fails to find is nondeterministic; they differ from build to build.

-------------------------------------------------------------------------
Preparing T31_gx3v5 component models for execution
-------------------------------------------------------------------------
- Create execution directories for atm,cpl,lnd,ice,ocn
- If a restart run then copy restart files into executable directory
ccsm_getrestart: get /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708 restarts from /home/mfruchtm/ccsm3_0/archive/TSM.01a.T31_gx3v5.B.saguaro.100708/restart
- Check validity of configuration
- Determine if build must happen (env variable BLDTYPE)
- Build flag (BLDTYPE) is TRUE
- Build Libraries: esmf, mph, mct
Wed Jul 30 11:12:51 MST 2008 esmf.buildlib.080730-111251
Wed Jul 30 11:12:55 MST 2008 mph.buildlib.080730-111251
Wed Jul 30 11:12:55 MST 2008 mct.buildlib.080730-111251
- Create model directories for each platform
- Determine if models must be rebuilt
- Build model executables, create namelist files, prestage input data
Wed Jul 30 11:12:59 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/cpl/cpl.log.080730-111251
Wed Jul 30 11:12:59 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/ice/ice.log.080730-111251
Wed Jul 30 11:13:06 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/ice/ice.buildexe.080730-111251
Wed Jul 30 11:13:11 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/lnd/lnd.log.080730-111251
Wed Jul 30 11:13:11 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/lnd/lnd.buildexe.080730-111251
Wed Jul 30 11:13:15 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/ocn/ocn.log.080730-111251
Wed Jul 30 11:13:15 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/ocn/ocn.buildexe.080730-111251
Wed Jul 30 11:13:20 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/atm/atm.log.080730-111251
Wed Jul 30 11:13:20 MST 2008 /home/mfruchtm/ccsm3_0/exe/TSM.01a.T31_gx3v5.B.saguaro.100708/atm/atm.buildexe.080730-111251
- Create MPH input file and link into all model dirs
- Create shr_msg_stdio chdir/stdin/stdout data file
-------------------------------------------------------------------------
- CCSM BUILD HAS FINISHED SUCCESSFULLY

-------------------------------------------------------------------------
skipping first model
PBS_MOMPORT=15003
COMP_ATM=cam
COMP_LND=clm
COMP_ICE=csim
COMP_OCN=pop
COMP_CPL=cpl
RAMP_CO2_START_YMD=00000000
Wed Jul 30 11:13:33 MST 2008 -- CSM EXECUTION BEGINS HERE
[saguaro-11-9.local:27617] [0,0,1] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1191
--------------------------------------------------------------------------
Failed to find the following executable:

Host: saguaro-11-9.local
Executable: -pg

Cannot continue.
--------------------------------------------------------------------------
[saguaro-11-9.local:27617] [0,0,1] ORTE_ERROR_LOG: Not found in file orted.c at line 594
[saguaro-3-3.local:08330] [0,0,4] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1191
--------------------------------------------------------------------------
Failed to find the following executable:

Host: saguaro-3-3.local
Executable: -pg

Cannot continue.
--------------------------------------------------------------------------
[saguaro-3-3.local:08330] [0,0,4] ORTE_ERROR_LOG: Not found in file orted.c at line 594
[saguaro-11-1.local:08318] [0,0,2] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1191
--------------------------------------------------------------------------
Failed to find the following executable:

Host: saguaro-11-1.local
Executable: -pg

Cannot continue.
--------------------------------------------------------------------------
[saguaro-11-1.local:08318] [0,0,2] ORTE_ERROR_LOG: Not found in file orted.c at line 594
[saguaro-3-8.local:30923] [0,0,3] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1191
--------------------------------------------------------------------------
Failed to find the following executable:

Host: saguaro-3-8.local
Executable: -pg

Cannot continue.
--------------------------------------------------------------------------
[saguaro-3-8.local:30923] [0,0,3] ORTE_ERROR_LOG: Not found in file orted.c at line 594
Wed Jul 30 11:13:33 MST 2008 -- CSM EXECUTION HAS FINISHED
Model did not complete - see cpl.log.080730-111251
/home/mfruchtm/ccsm3_0/archive/TSM.01a.T31_gx3v5.B.saguaro.100708/cpl: No such file or directory.


By the way, cpl.log.080730-111251 is empty except for the acknowledgment of its own creation.


I was hoping someone had run across this error before, since I am at a complete loss as to why it is occurring.
 
Never mind, I'm sorry for taking up everyone's time.

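The problem was that openmpi-1.2.3 does not use pgfiles to assign nodes and processors. Its mpirun apparently does not recognize the -pg option at all and took "-pg" to be the name of the program to launch, which is why the log reports "Executable: -pg". A quick change to mpirun -np 32 ./$COMPONENTS[$n] solved the problem nicely.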
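
For anyone who hits the same thing and wants to launch several component executables from one Open MPI command, here is a rough sketch of how such a command line could be put together with mpirun's colon-separated syntax in place of a pgfile. The helper name build_mpmd_cmd and the per-component task counts are made up for illustration and are not taken from the CCSM3 scripts; the sketch also ignores the fact that each component lives in its own execution directory, so treat it as a starting point only.

```shell
#!/bin/bash
# Sketch only: assemble an Open MPI MPMD launch line of the form
#   mpirun -np N1 ./exe1 : -np N2 ./exe2 : ...
# build_mpmd_cmd is a hypothetical helper; the task counts below are
# placeholder values, not a recommended CCSM3 layout.

build_mpmd_cmd() {
    # Each argument is an "executable:ntasks" pair.
    local cmd="mpirun" sep="" pair
    for pair in "$@"; do
        # ${pair##*:} is the task count, ${pair%%:*} is the executable name.
        cmd+="${sep} -np ${pair##*:} ./${pair%%:*}"
        sep=" :"
    done
    printf '%s\n' "$cmd"
}

# Print the command instead of running it so it can be inspected first.
build_mpmd_cmd cpl:2 atm:8 lnd:4 ice:4 ocn:14
```

The counts in the example add up to 32 to match the nodes=32:ppn=1 request in the Torque script above. Printing the command first makes it easy to check before pasting it into the run script.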

If a moderator wants to delete this thread due to its triviality, go ahead. As a matter of chance, I am rather ignorant of any MPI other than Open MPI.
 