
MPI Invalid Tag Error

hsjung

Hee-Sung Jung
New Member
What version of the code are you using?
cesm 2.1.5

Have you made any changes to files in the source tree?
No

Describe every step you took leading up to the problem:
Compset: 2000_DATM%CPLHIST_SLND_CICE_DOCN%SOM_SROF_SGLC_SWAV_TEST
Grid: Grid resolution
Number of instances: 20
Compiler: intel, intelmpi

Describe your problem or question:
Hi everyone,

I am trying to run a multi-instance standalone CICE run with data atmosphere and ocean.
The atmospheric forcing is a coupler history output from a fully coupled control simulation.
The ocean forcing is slab ocean model forcing also generated from the fully coupled control simulation.

I have perturbed the atmospheric forcing files to generate a 20-member ensemble run.
When I submit the case, the run fails with an invalid-tag MPI error (the log files and env_mach_pes.xml are attached).
No errors appear in the component log files; the error shows up only in cesm.log, even with DEBUG=TRUE.

Is there a way to solve this issue?

Best,
Hee-Sung
 

Attachments

  • logs.zip (224.2 KB)
  • env_mach_pes.zip (1.8 KB)

jedwards

CSEG and Liaisons
Staff member
I don't see any obvious reason for the error. You might try using fewer instances - maybe start with 2 and work your way up?
 

hsjung

Hee-Sung Jung
New Member
Hi jedwards,

Thanks for the hint.
I tried with fewer instances, and it seems to work well for up to 10 instances!

Best
Hee-Sung
 

jedwards

CSEG and Liaisons
Staff member
There is an MPI attribute, MPI_TAG_UB, that gives the largest tag value the implementation supports. You might check its value - it might explain why tag values of 1060075 and larger are failing. If you can find where that mpi_irecv call is, we can replace the tag with something that will work in your case.
 