
mpiexec error -- submitted case wouldn't stop

baih

New Member
Hi all,

I'm running CAM5 with the FC5 compset at 0.9*1.25 resolution on the TAMU supercomputer. The case built successfully with no errors. After I submitted the job, however, the run never stopped, and no "run FAILED" message appeared. I was previously able to run the same compset at 1.9*2.5 on this machine.

After I killed the job, I checked the cesm.log file and found the following messages at the end. I have also attached the whole cesm.log file in case you want to see it.

[mpiexec@nxt2143] Sending Ctrl-C to processes as requested
[mpiexec@nxt2143] Press Ctrl-C again to force abort
Ctrl-C caught... cleaning up processes
[mpiexec@nxt2143] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@nxt2143] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@nxt2143] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:136): unable to send SIGINT downstream
[mpiexec@nxt2143] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@nxt2143] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@nxt2143] main (../../ui/mpich/mpiexec.c:1059): process manager error waiting for completion

Does this mean there is something wrong with MPI? I'd appreciate any ideas about what the problem is with my case.

Qiu
 