
parallel run problem

Hello,

I want to run CAM5 in parallel mode (-nospmd -ntasks 32) using the pgf90 compiler, but after submitting the job it fails with this error:

PGFIO-F-209/OPEN/unit=99/'OLD' specified for file which does not exist.
File name = drv_in
In source file /home/saroj/cam/cam5/cesm1_0/models/drv/driver/seq_io_mod.F90, at line number 164
PGFIO-F-209/OPEN/unit=99/'OLD' specified for file which does not exist.

I am attaching the log file for reference. If anybody knows about this, any kind of help will be appreciated.

The specifications of the server on which I am trying to run CAM5 in parallel mode are:

Hardware Specifications
◦ Master Node: Fujitsu PRIMERGY RX300 S7
◦ Compute Nodes (9): Fujitsu PRIMERGY RX200 S7
◦ Operating System: CentOS 6 Latest Version
◦ Compiler: pgf90
◦ Cluster Management Software: ROCKS or OSCAR
◦ Job Scheduler: Open PBS/Sun Grid Engine

Thank You


 
Hello,

Yes, the drv_in file is created (by default) when the namelist is built, and I am submitting the job (running the model) in the same directory. The job runs endlessly, repeating the same lines:

File name = drv_in
In source file /home/2012asz8344/cam/cam5/cesm1_0/models/drv/driver/seq_io_mod.F90, at line number 164
PGFIO-F-209/OPEN/unit=99/'OLD' specified for file which does not exist.

For your reference, the content of the file drv_in is:

&ccsm_pes
/
&seq_infodata_inparm
 case_name              = 'camrun'
 ocean_tight_coupling   = .true.
 orb_iyear_ad           = 1990
 start_type             = 'startup'
/
&seq_timemgr_inparm
 atm_cpl_dt             = 1800
 restart_option         = 'monthly'
 start_ymd              = 101
 stop_n                 = 1
 stop_option            = 'ndays'
/
&prof_inparm
 profile_single_file    = .true.
/
&pio_inparm
/

Thank You
 

santos

Member
I have experienced a similar problem sometimes when running PGI with MPICH on one of our test systems. The issue in that case is that CESM runs out of the wrong directory, but I'm still not sure why that happens. I can try to reproduce the error tomorrow.

Which MPI implementation are you using?
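
One way to rule out the wrong-directory case is to confirm, from inside the job script, that the working directory really contains drv_in before the executable starts. Below is a minimal sketch, assuming a POSIX shell; the helper name check_rundir is hypothetical and not part of CESM.

```shell
#!/bin/sh
# Hypothetical helper (not part of CESM): verify that a directory holds
# the driver namelist drv_in before the model is launched from it.
check_rundir() {
    dir="${1:-$PWD}"
    if [ -f "$dir/drv_in" ]; then
        echo "ok: drv_in found in $dir"
    else
        echo "missing: no drv_in in $dir" >&2
        return 1
    fi
}

# Example use near the top of a job script (commented out so that
# sourcing this file has no side effects):
# echo "job working directory: $PWD"
# check_rundir "$PWD" || exit 1
```

If the check fails only when the job is launched through the scheduler or the MPI launcher, that would suggest the tasks are being started somewhere other than the submission directory.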
 
Hello,

MPICH2 is bundled with the PGI compiler, so it gets installed by default along with the PGI compiler. Actually, if you look carefully at the log file attached earlier (created at the time of job submission or run), it used npes=1, even if we give ntasks more than two. So, do we have to specify any flags? Or is the server not picking up the script we provide when submitting the job?

Thanks for your quick communication. Hopefully it will get solved soon.

Thanks
 

eaton

CSEG and Liaisons
Typically a script by the name of mpirun or mpiexec is responsible for launching the mpi job and starting the execution in the requested number of tasks per node.  This is clearly not happening correctly in your run.  It may help to try running some simple mpi test code to resolve the problem. 
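
Before involving CAM at all, a quick check of the launcher is to start a trivial command under it and count how many copies actually run. This is a minimal sketch, assuming the launcher accepts "-np N" (as the mpirun from MPICH and Open MPI does); count_started_tasks is a hypothetical helper name. It only shows that the launcher starts N processes; a real MPI test program is still needed to confirm that those processes see each other as ranks of one job.

```shell
#!/bin/sh
# Hypothetical helper: run `hostname` under an MPI launcher and print how
# many copies started. Assumes the launcher takes "-np N".
count_started_tasks() {
    launcher="$1"
    ntasks="$2"
    # One output line per started process; strip padding some wc versions add.
    "$launcher" -np "$ntasks" hostname | wc -l | tr -d ' '
}

# Example (commented out; requires an MPI installation):
# started=$(count_started_tasks mpirun 4)
# [ "$started" -eq 4 ] || echo "launcher started $started of 4 tasks" >&2
```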
 

santos

Member
I hadn't noticed before, but this is a -nospmd -ntasks 32 run, which makes no sense, because nospmd is turning off MPI. I think that sbshewale meant to use "-spmd -nosmp -ntasks 32", which might actually work?
 

eaton

CSEG and Liaisons
When -ntasks is set then the -spmd or -nospmd settings are ignored by configure.  Basically, -ntasks N  implies -spmd. 
 
Hello,

I have configured the model as

$CAMCFG/configure -dyn fv 1.9x2.5 -nosmp -ntasks 18 -fc pgf90 -cc pgcc -test

Is it necessary to mention mpif90 here also? If there is a problem with the mpirun or mpiexec command, then how is the model able to run in serial mode on the same server? (Or is there no need for mpirun in serial mode?)

We have the Sun Grid Engine scheduler for submitting jobs on the server. For your reference I have attached the run script.

Thank You.
 

eaton

CSEG and Liaisons
The best way to build mpi executables is to use the mpif90 wrapper script since it knows the locations of the include and the library files.  So you should specify "-fc mpif90" to configure.  In addition, since configure has no reliable way of figuring out what fortran compiler is wrapped inside of mpif90 you need to specify this to configure, for example, "-fc_type pgi".
Serial runs do not need to be launched using mpirun, although I'm guessing that it would work. 
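
Put together, this advice corresponds to a configure line like the one printed by the sketch below. It reuses the grid and task options from the earlier post and only swaps the compiler options; it assumes a CAM version whose configure accepts -fc_type. The function name print_configure_cmd is hypothetical, and the command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a configure command that uses the
# MPI wrapper as the Fortran compiler and names the wrapped compiler family.
print_configure_cmd() {
    ntasks="$1"
    echo "\$CAMCFG/configure -dyn fv -hgrid 1.9x2.5 -nosmp -ntasks $ntasks -fc mpif90 -fc_type pgi -test"
}

print_configure_cmd 18
# With MPICH, the compiler wrapped inside mpif90 can be inspected with:
#   mpif90 -show
```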
 
Hello,

Actually, I am trying to run CAM5.0, not CAM5.1. In the CAM5.0 user guide, the command for a parallel run is

$camcfg/configure -dyn fv -hgrid 1.9x2.5  -ntasks 6 -nosmp -test

which is what I am also using, and it configured correctly. But when I specify mpif90, it gives an error regarding netcdf library linking. So maybe there is no need to specify "-fc mpif90" at configuration time (in the case of CAM5.0).

Please give any suggestions regarding my earlier post. It is not solved yet.

Thanks
 

eaton

CSEG and Liaisons
In cam5.0 the -fc_type option is not available.  That version will use pgf90 as the default compiler on a linux OS.  If that gave you a successful build when -ntasks is specified then I assume this works because the mpi is part of your compiler installation (which you stated in #5).  Since the CAM log output indicates the run is only started in 1 task, even though you requested more, this is still looking like a problem with your system.  The only thing I can think to suggest is to run some simple mpi tests to try and verify that your system is working with mpi.  There is an example of this kind of test in this thread: http://bb.cgd.ucar.edu/parallel-run-fatal-error-mpiirecv
 
Hello,

The problem has been resolved by using mpirun. Thanks for your quick communication.

Now I want to install and test the model on my personal computer, starting from installing all the required libraries and setting up .bashrc, for a 1-day test run (serial and parallel). So I installed gfortran, gcc, netcdf (compiled with gfortran), and lapack. But when I configure with

$CAMCFG/configure -dyn fv -hgrid 1.9x2.5 -nospmd -nosmp  -debug  >& config.log &

it fails (please find the attached log file named 'config' for the error), and when I explicitly specify fc and cc, as in

$CAMCFG/configure -dyn fv -hgrid 1.9x2.5 -nospmd -nosmp -fc gfortran -cc gcc -debug  >& config.log &

it says

** ERROR: -fc option only recognized when target OS is linux.

My system is Mac OS. The details of the system on which I want to install the model are:

System Software Overview:
System Version: OS X 10.8.2 (12C2034)
Kernel Version: Darwin 12.2.1
Boot Volume: Macintosh HD

Hardware Overview:
Model Name: Mac mini
Model Identifier: Macmini6,2
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 4 GB
Boot ROM Version: MM61.0106.B00

In the CAM5.1 user guide, the xlf90 compiler is suggested for Mac OS (but I am trying to install CAM5.0). When I looked for xlf90, I did not find any link to install it. Also, my question is: can we really install CAM5.0 using the gfortran compiler?

Thanks


 

eaton

CSEG and Liaisons
CAM5.0 was never validated with gfortran.  You'll have much better luck running CAM5.0 with the xlf or pgi compilers which were the main compilers used for its development.  However, if I were going to attempt to run CAM5.0 with gfortran I would work directly in the file models/atm/cam/bld/Makefile.in and set the appropriate macro flags in the Darwin section.  Don't try to set the -fc flag to configure, just let it use the default compiler, and set up the default compiler to be gfortran.

CESM1_2_0/CAM5.3 is known to build with gfortran, so updating to the latest released code will be your best option.
 

mai

Member
A long time ago Macs used PowerPC chips from IBM and there was a version of xlf for that platform. There is no xlf for Intel chips and, from what I have heard, likely never will be.
 