
CESM case aborted with PIO error in cesm.log

yifanc17

Yifan Cheng
Member
What version of the code are you using?
CESM2.2.2

Describe your problem or question:
I am running a regionally refined case using the MUSICA grid and configuration. The simulation ran successfully from 2024-01-01 to 2024-02-06, and all previous resubmissions completed without issues. However, after archiving the 2024-02-06 output and restart files and starting the run for 2024-02-07, I got the PIO errors below in cesm.log. None of the other component log files contain errors:

Code:
dec2257.hsn.de.hpc.ucar.edu 1797: Obtained 10 stack frames.
dec0800.hsn.de.hpc.ucar.edu 298: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x41dcf3]
dec0805.hsn.de.hpc.ucar.edu 386: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x41dcf3]
dec0818.hsn.de.hpc.ucar.edu 521: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0xf6065e]
dec1040.hsn.de.hpc.ucar.edu 642: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0xf6065e]
dec1085.hsn.de.hpc.ucar.edu 770: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x43a4fe]
dec1123.hsn.de.hpc.ucar.edu 899: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0xf6982d]
dec1124.hsn.de.hpc.ucar.edu 1030: /glade/u/apps/derecho/23.09/spack/opt/spack/parallelio/2.6.2/cray-mpich/8.1.27/oneapi/2023.2.1/zyhu/lib/libpiof.so(piodarray_mp_write_darray_1d_double_+0xe6) [0x14c42702c846]
dec1159.hsn.de.hpc.ucar.edu 1162: /glade/u/apps/derecho/23.09/spack/opt/spack/parallelio/2.6.2/cray-mpich/8.1.27/oneapi/2023.2.1/zyhu/lib/libpiof.so(piodarray_mp_write_darray_1d_double_+0xe6) [0x15468858f846]
dec1500.hsn.de.hpc.ucar.edu 1281: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x51dae4]
dec1516.hsn.de.hpc.ucar.edu 1416: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x509b81]
dec1534.hsn.de.hpc.ucar.edu 1561: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0xf6982d]
dec2257.hsn.de.hpc.ucar.edu 1797: /glade/u/apps/derecho/23.09/spack/opt/spack/parallelio/2.6.2/cray-mpich/8.1.27/oneapi/2023.2.1/zyhu/lib/libpioc.so(pio_err+0x7a) [0x153652b7e10a]
dec0594.hsn.de.hpc.ucar.edu 142: Obtained 10 stack frames.
dec0800.hsn.de.hpc.ucar.edu 298: MPICH ERROR [Rank 298] [job id 87e4d01e-5d2d-4f93-9ea1-ce698c059f0a] [Mon Oct 13 15:38:14 2025] [dec0800] - Abort(-1) (rank 298 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 298
dec0805.hsn.de.hpc.ucar.edu 386: MPICH ERROR [Rank 386] [job id 87e4d01e-5d2d-4f93-9ea1-ce698c059f0a] [Mon Oct 13 15:38:14 2025] [dec0805] - Abort(-1) (rank 386 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 386

The cesm.log itself (attached below) is not very informative, so I am not sure what went wrong, given that the preceding month of simulation completed successfully. Disk quota is not an issue, and the library path '/glade/u/apps/derecho/23.09/spack/opt/spack/parallelio/2.6.2/cray-mpich/8.1.27/oneapi/2023.2.1/zyhu/lib/libpioc.so' is valid.
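Since every line in the excerpt is prefixed with a different host and rank, the stack frames from many ranks are interleaved, which makes the trace hard to follow. Below is a minimal sketch of how the log could be regrouped per rank. It assumes each line has the layout "<host> <rank>: <message>", as in the excerpt above. The helper names `group_by_rank` and `ranks_mentioning` are made up for illustration and are not part of CESM or PIO.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the excerpt above: "<host> <rank>: <message>"
LINE_RE = re.compile(r"^(\S+)\s+(\d+):\s?(.*)$")


def group_by_rank(lines):
    """Return {rank: [messages]} keeping the original order within each rank."""
    by_rank = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that do not fit the assumed layout are skipped
            by_rank[int(match.group(2))].append(match.group(3))
    return dict(by_rank)


def ranks_mentioning(by_rank, needle):
    """Return the sorted ranks whose messages contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        rank
        for rank, messages in by_rank.items()
        if any(needle in message.lower() for message in messages)
    )


# A few lines copied from the excerpt above, used as sample input.
sample = [
    "dec2257.hsn.de.hpc.ucar.edu 1797: Obtained 10 stack frames.",
    "dec0800.hsn.de.hpc.ucar.edu 298: /var/run/palsd/87e4d01e-5d2d-4f93-9ea1-ce698c059f0a/files/cesm.exe() [0x41dcf3]",
    "dec2257.hsn.de.hpc.ucar.edu 1797: /glade/u/apps/derecho/23.09/spack/opt/spack/parallelio/2.6.2/cray-mpich/8.1.27/oneapi/2023.2.1/zyhu/lib/libpioc.so(pio_err+0x7a) [0x153652b7e10a]",
    "dec0800.hsn.de.hpc.ucar.edu 298: MPICH ERROR [Rank 298] [job id 87e4d01e-5d2d-4f93-9ea1-ce698c059f0a] [Mon Oct 13 15:38:14 2025] [dec0800] - Abort(-1) (rank 298 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 298",
]

by_rank = group_by_rank(sample)
for rank in ranks_mentioning(by_rank, "pio_err"):
    print(f"--- rank {rank} ---")
    for message in by_rank[rank]:
        print(message)
```

To run this on the full log, the `sample` list could be replaced with the lines of the unzipped cesm.log file. Reading one rank's frames in order may show which call led into `pio_err`.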

Any suggestions or insights on this PIO failure would be greatly appreciated!
 

Attachments

  • cesm.log.3344682.desched1.251013-152319.txt.zip
    559.9 KB