
MPI_BCAST failure on Opteron cluster, compiler problem?

bardeenc

New Member
Hi,

I am just getting started with CAM3 and am having problems getting it to build and run successfully with MPI on our Opteron cluster. The single-threaded version seems to work fine. When I try to run the SPMD (MPI) version, I get the following error:

Attempting to initialize run control settings .....

0 - MPI_BCAST : Invalid count argument is -32420864
[0] Aborting program !
[0] Aborting program!
p0_6624: p4_error: : 8258
Killed by signal 2.

Using debug print statements, I found that the error is coming from models/lnd/clm2/src/main/controlMod.F90 line 724. It appears that the count sent to mpi_bcast() is getting generated improperly by the compiler. The line is:

call mpi_bcast(hist_type1d_pertape, max_namlen*size(hist_type1d_pertape), MPI_CHARACTER, 0, mpicom, ier)

If I print max_namlen before the call, it shows a value of 32. If I print size(hist_type1d_pertape) before the call, it shows a value of 6. However, if I print the product of the two, which should be 192, I get the same bad negative number. I am using the Portland Group pgf90 compiler, version 5.2-2.
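One workaround worth trying is to evaluate the two factors into separate default-INTEGER variables and check the result before it reaches MPI_BCAST. This is only a sketch under assumptions. The module and function names (bcast_count_mod, bcast_count) are hypothetical and not part of CAM3 or CLM2. The sketch also assumes hist_type1d_pertape is declared with character length max_namlen, so len() returns the same 32. It has not been verified that this avoids the pgf90 5.2-2 problem, but at minimum it stops with a readable message instead of handing MPI a negative count.

```fortran
! Hypothetical helper, NOT part of CAM3/CLM2: computes the number of
! characters in a character array so the MPI count can be checked first.
module bcast_count_mod
  implicit none
  private
  public :: bcast_count
contains
  ! Returns len(strings) * size(strings) as a default INTEGER, the kind
  ! MPI_BCAST expects for its count argument.
  function bcast_count(strings) result(nchars)
    character(len=*), intent(in) :: strings(:)
    integer :: nchars
    integer :: clen, nelem

    ! Evaluate each factor into its own variable instead of one inline
    ! expression in the mpi_bcast argument list.
    clen  = len(strings)    ! character length of each element (assumed max_namlen)
    nelem = size(strings)   ! number of elements in the array
    nchars = clen * nelem

    ! A character count can never be negative; stop loudly if it is.
    if (nchars < 0) then
      write(*,*) 'bcast_count: bad count ', nchars, ' from ', clen, ' x ', nelem
      stop 1
    end if
  end function bcast_count
end module bcast_count_mod
```

In controlMod.F90 this would mean adding `use bcast_count_mod` and `integer :: nchars` to the enclosing routine, setting `nchars = bcast_count(hist_type1d_pertape)`, and passing `nchars` as the count in the existing mpi_bcast call.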

Has anyone seen any problems like this? Any recommendations for more stable versions of the compiler?

Thanks,

Chuck
 

gcarr@ucar_edu

New Member
The Opteron processor is not yet supported by CCSM. Work is underway. We do not know when we might have things working, tested, and available.
 