gabriel_abrahao@ufv_br
New Member
Hi,

I've been trying to port CESM 1.2.2.1 to a cluster with SLURM. After adding all the required entries in the Machines folder (based on both userdefined and edison, which also uses SLURM), building a basic case with the all-dead compset goes well. However, when I try to run it, some kind of MPI communication error is raised after a very short while:

[...] (complete log attached)
seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=
[sdumont1027:6618] *** An error occurred in MPI_Comm_create_keyval
[sdumont1027:6618] *** reported by process [23337304065,4294967296]
[sdumont1027:6618] *** on communicator MPI_COMM_WORLD
[sdumont1027:6618] *** MPI_ERR_ARG: invalid argument of some other kind
[sdumont1027:6618] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[sdumont1027:6618] *** and potentially your MPI job)
This happens even though I'm trying to run it with a single PE and task. The same error also occurs if I run it with 24 PEs (set in env_mach_pes after a cesm_setup -clean, and in the submission script), and with OpenMPI 1.1, 2.0.1, 2.1 and 3.0, all compiled with the same compiler used for the model (Intel XE 2017).

Could anyone please give me at least a hint of where the problem might be? I've attached the log file of a run, the submission script, and the env_mach files to give a better idea of the environment.

Thanks in advance,
Gabriel Abrahão
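For reference, this is roughly how I change the PE layout before resubmitting (a sketch of the standard CESM 1.2 workflow; the NTASKS_* ids are the usual env_mach_pes.xml entries, and the component list assumes the stock CESM 1.2 components):

```shell
# From the case directory: reset the configuration before changing the PE layout
./cesm_setup -clean

# Set every component to 24 MPI tasks in env_mach_pes.xml
for comp in ATM LND ICE OCN CPL GLC ROF WAV; do
    ./xmlchange -file env_mach_pes.xml -id NTASKS_${comp} -val 24
done

# Regenerate the case scripts with the new layout
./cesm_setup
```

The single-PE test is the same with -val 1, and the submission script's SBATCH task count is updated to match.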