
MPI error in all dead compset

Hi,

I've been trying to port CESM 1.2.2.1 to a cluster with SLURM. After adding all the required entries in the Machines folder (based on both userdefined and edison, which also uses SLURM), building a basic case with the all dead compset goes well. However, when I try to run it, some kind of MPI communication error is raised after a very short while:

[...] (complete log attached)

seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=

[sdumont1027:6618] *** An error occurred in MPI_Comm_create_keyval
[sdumont1027:6618] *** reported by process [23337304065,4294967296]
[sdumont1027:6618] *** on communicator MPI_COMM_WORLD
[sdumont1027:6618] *** MPI_ERR_ARG: invalid argument of some other kind
[sdumont1027:6618] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[sdumont1027:6618] ***    and potentially your MPI job)

This happens even though I'm trying to run with a single PE and task. The same thing happens if I try to run with 24 PEs (set in env_mach_pes after a cesm_setup -clean, and in the submission script), and with OpenMPI 1.1, 2.0.1, 2.1 and 3.0, all compiled with the same compiler used for the model (Intel XE 2017).

Could anyone please help me with at least a hint of where the problem might be? Attached are the log file of a run, the submission script and the env_mach files, to give a better idea of the environment.

Thanks in advance,
Gabriel Abrahão
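The call that fails in the log is MPI_Comm_create_keyval on MPI_COMM_WORLD. One way to isolate whether the OpenMPI installation itself is at fault, rather than the CESM port, is a tiny standalone program that makes the same call outside the model. This is only a diagnostic sketch, not anything from CESM; the file name, compile line and launch line below are assumptions and should be adapted to the site's modules and to the same srun/mpirun line used in the submission script.

/* keyval_test.c -- minimal sketch (not from CESM) exercising the MPI routine
 * that the cesm.log reports as failing. Hypothetical build/run line:
 *   mpicc keyval_test.c -o keyval_test && srun -n 1 ./keyval_test
 * Use the same compiler and OpenMPI stack that the model was built with. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int keyval, err;

    MPI_Init(&argc, &argv);

    /* Return error codes instead of aborting, so the result gets printed */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Same MPI call that appears in the error message above */
    err = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,
                                 MPI_COMM_NULL_DELETE_FN,
                                 &keyval, NULL);

    if (err == MPI_SUCCESS) {
        printf("MPI_Comm_create_keyval succeeded (keyval = %d)\n", keyval);
        MPI_Comm_free_keyval(&keyval);
    } else {
        printf("MPI_Comm_create_keyval failed, error code %d\n", err);
    }

    MPI_Finalize();
    return 0;
}

If this small program fails the same way, the problem is likely in the MPI installation or in the modules loaded at run time (for example, building against one OpenMPI and launching with another's mpirun/srun); if it succeeds, the issue is more likely in the CESM build or the machine port itself.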
 

yangx2 (xinyi yang), Member
Hi, did you solve this problem? I'm running into the same issue.
 