The B1850 compset at resolution "f19_g17" produces the UCX error below when I submit the job. I also tried the QPC4 compset with --res f45_f45_mg37 and got no UCX error (at least after manually exporting UCX_TLS=ud,sm,self). Does this have anything to do with the resolution? I understand this may be a system issue, or maybe not, but someone must have encountered it before.
[1721139988.509109] [uagc20-12:212195:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509109] [uagc20-12:212196:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509189] [uagc21-03:158607:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509140] [uagc21-01:261909:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509401] [uagc21-04:56013:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509179] [uagc21-05:127821:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509343] [uagc21-02:2655724:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
[1721139988.509188] [uagc21-03:158608:0] select.c:630 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, sysv/memory - Destination is unreachable, posix/memory - Destination is unreachable
MPIDI_OFI_mpi_init_hook(1602)....:
insert_addr_table_roots_only(451): OFI get address vector map failed
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)............:
MPID_Init(1532)..................:
MPIDI_OFI_mpi_init_hook(1602)....:
insert_addr_table_roots_only(451): OFI get address vector map failed
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)............:
MPID_Init(1532)..................:
MPIDI_OFI_mpi_init_hook(1602)....:
insert_addr_table_roots_only(451): OFI get address vector map failed
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)............: