Running CESM on NCAR's Yellowstone

16:Abort(1) on node 16 (rank 16 in comm 1140850688): Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
16:PMPI_Gatherv(398).....: MPI_Gatherv failed(sbuf=0x74b1c00, scount=2496, dtype=0x4c000829, rbuf=0x74b1c00, rcnts=0x6b290f0, displs=0x6b291d0, dtype=0x4c000829, root=0, comm=0x84000001) failed
16:MPIR_Gatherv_impl(210):
16:MPIR_Gatherv(104).....:
16:MPIR_Localcopy(357)...: memcpy arguments alias each other, dst=0x74b1c00 src=0x74b1c00 len=19968
 

jedwards

CSEG and Liaisons
Staff member
I found another file with the same problem. Please get /glade/scratch/jedwards/cesmtests/ccsm3_test/SourceMods/src.cam/history.F90 and build again.

It turns out that there is another, easier solution.

Set the environment variable:
MP_EUIDEVELOP=min
and all should work fine.
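
For reference, here is a minimal sketch of setting the variable before the model is launched. It assumes a Bourne-style shell (sh/bash); where exactly the line belongs in your own run or batch script is not specified above, so treat the placement as an assumption and adapt it to your setup.

```shell
# Assumption: a Bourne-style shell (sh/bash).
# In a csh/tcsh script the equivalent would be:  setenv MP_EUIDEVELOP min
export MP_EUIDEVELOP=min

# Print the value so the job log shows what the run actually used.
echo "MP_EUIDEVELOP=${MP_EUIDEVELOP}"
```

The variable has to be set in the environment the MPI job inherits, so it needs to be exported before the launch command runs.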

This variable was recently removed from the default environment, which explains why the errors only started appearing now.

- Jim
 

Hi Jim,

CCSM3 has problems resubmitting jobs automatically, although a manually submitted continue run works fine.
Could you please take a look at /glade/u/home/yafang/test1? There are two stdout files from Mar 1st: ccsm.stdout.891572
is from a manually submitted continue run, and ccsm.stdout.891581 is from the one submitted automatically. env_run was emptied
as a result of the aborted run.

Thanks,

Yafang
 