Hi,
I am just getting started with CAM3 and am having problems getting it to build and run successfully on our Opteron cluster with MPI. The single-threaded version seems to work fine. When I try to run the SPMD version, I get the following error:
Attempting to initialize run control settings .....
0 - MPI_BCAST : Invalid count argument is -32420864
[0] Aborting program !
[0] Aborting program!
p0_6624: p4_error: : 8258
Killed by signal 2.
Using debug print statements, I found that the error is coming from models/lnd/clm2/src/main/controlMod.F90 line 724. It appears that the count sent to mpi_bcast() is getting generated improperly by the compiler. The line is:
call mpi_bcast(hist_type1d_pertape, max_namlen*size(hist_type1d_pertape), MPI_CHARACTER, 0, mpicom, ier)
If I print out max_namlen before the call, it shows a value of 32. If I print out the value of size(hist_type1d_pertape) before the call, it shows a value of 6. However, if I print the product of the two, I get the same bad negative number (-32420864) instead of the expected 192. I am using the Portland Group pgf90 compiler, version 5.2-2.
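For anyone who wants to try reproducing this outside of CAM, here is a minimal standalone sketch of the same check. The program name, the variable nchar, and the declarations are assumptions for illustration (they are not copied from controlMod.F90); only the values 32 and 6 come from the prints described above.

```fortran
! Hypothetical standalone reproducer; declarations are assumed,
! not taken from controlMod.F90.
program bcast_count_check
  implicit none
  integer, parameter :: max_namlen = 32            ! value printed before the call
  character(len=max_namlen) :: hist_type1d_pertape(6)  ! size() printed as 6
  integer :: nchar                                 ! hypothetical temporary for the count

  hist_type1d_pertape = ' '

  ! Compute the count into an explicit default integer instead of
  ! passing the expression directly as the mpi_bcast argument.
  nchar = max_namlen * size(hist_type1d_pertape)

  print *, 'max_namlen = ', max_namlen
  print *, 'size       = ', size(hist_type1d_pertape)
  print *, 'product    = ', nchar                  ! expected 192
end program bcast_count_check
```

If this prints 192 with the same compiler and flags, the problem may depend on how max_namlen or the array is actually declared in the model, rather than on the multiplication alone.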
Has anyone seen any problems like this? Any recommendations for more stable versions of the compiler?
Thanks,
Chuck