
MPI error (MPI_File_write_at_all) : I/O error (CESM2.1.1 on Cheyenne)

liyue1

Yue
New Member
Hello there!

I'm running CESM2.1.1 on Cheyenne with the piControl compset. The simulation ran successfully for 55 years, but last weekend it stopped with an MPI error. Here is the error from the log file:

1: Opened file b.e21.B1850.f09_g17.CMIP6-deforest-trop.002.cam.h0.1925-09.nc
1: to write 46
577:MPI error (MPI_File_write_at_all) : I/O error
MPT: Received signal 15

I don't think the problem is this particular .nc file: when I resubmitted the run starting from 1925-01, it failed with a similar error:
613:MPI error (MPI_File_write_at_all) : I/O error
649:MPI error (MPI_File_write_at_all) : I/O error

Does anyone have an idea why this would happen suddenly, given that the simulation had been running well for the first 55 years?
I've attached the cesm.log files from the first crash and from my resubmission. My run directory is: /glade/scratch/liyue1/b.e21.B1850.f09_g17.CMIP6-deforest-trop.002/run/
Thanks in advance!
 

jedwards

CSEG and Liaisons
Staff member
Have you checked your glade quota? You may need to recompile with DEBUG=TRUE and resubmit that segment to get more information in the log.
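As a rough first check on the quota hypothesis, one can total the space a directory tree occupies. This is only a Python sketch under stated assumptions: it sums apparent file sizes (which can differ from allocated blocks), and a filesystem quota counts everything you own, not just one case, so the authoritative figure is whatever the site's own quota reporting shows. The 10 TB limit used in the example is taken from the figure mentioned later in this thread.

```python
import os

TB = 1000 ** 4  # decimal terabyte; use 1024 ** 4 if your site reports TiB


def directory_usage_bytes(root):
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count the target of a link twice
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def quota_fraction(used_bytes, quota_bytes):
    """Fraction of the quota consumed (1.0 means the quota is full)."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return used_bytes / quota_bytes
```

For example, `quota_fraction(directory_usage_bytes("/glade/scratch/liyue1"), 10 * TB)` gives the share of a hypothetical 10 TB limit taken up by that scratch directory; a value at or near 1.0 would explain failed writes of new history files.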
 

liyue1

Yue
New Member
Thanks for the quick response! This is my first question about CESM, and I'm going to love this place. It must have been the quota problem: I hadn't noticed that my temporary (scratch) storage had exceeded my 10 TB limit. Good to know that I can use DEBUG=TRUE to get more detail next time!
 
Top