
Error Running the CESM Model

jjoseph

Jibin Joseph
New Member
Hello:

I am trying to run a case with no land use forcing from 1850 to 2014. The run length was set to 2 years and the case was resubmitted multiple times. The run succeeded from 1850 to 1916, but it is now failing with the following error:
"MPT ERROR: Rank 535(g:535) is aborting with error code 1001."

I am not sure what the error means. The entire log file is attached for further details. I also tried replacing the files in the run directory with those from the rest directory in the archive folder and re-running the model from 1915, but that gave another error, the one described in the post "known problem with mpiexec_mpt on cheyenne".

Can anyone let me know what could be the error?

Thanks!
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
Hi Jibin

I don't actually see the log file you intended to attach here. The error you are seeing is too generic to tell what's happening. MPT is the Message Passing Toolkit, the HPE (formerly SGI) MPI library used on cheyenne. Task 535 is dying, but it's not clear why. This could be a multi-processing issue, a memory issue, a simple model error for a particular point, or a problem in I/O.

General debugging practices apply here. Use the log files to figure out which component is dying and see if you can find the cause. Also try to simplify the case in terms of processors: run on a lower resolution grid and/or with fewer processors. If it's a memory issue, though, you'll actually need to use more processors.
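
As a starting point for the log search, here is a minimal Python sketch that pulls the first few suspicious lines out of a large log along with the lines just before them. The keyword list and the names `find_first_errors` and `scan_file` are made up for illustration and are not part of CESM or CIME; adjust the pattern to whatever messages your components actually print.

```python
import re
from collections import deque

# Hypothetical keyword list (an assumption, not a CESM convention).
PATTERN = re.compile(r"\b(error|abort\w*|nan|endrun)\b", re.IGNORECASE)


def find_first_errors(lines, context=3, limit=5):
    """Return up to `limit` tuples of (line_number, preceding_lines, line)."""
    recent = deque(maxlen=context)  # rolling window of the previous lines
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if PATTERN.search(line):
            hits.append((number, list(recent), line))
            if len(hits) >= limit:
                break
        recent.append(line)
    return hits


def scan_file(path, context=3, limit=5):
    """Print the first matches found in the log file at `path`."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, errors="replace") as log:
        for number, before, line in find_first_errors(log, context, limit):
            for earlier in before:
                print(f"      | {earlier}")
            print(f"{number:>6}> {line}")
            print()
```

Calling `scan_file` on the cesm.log and then on the individual component logs in the run directory should show which component wrote the earliest error message, which is usually more informative than the final MPT abort line.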
 

jjoseph

Jibin Joseph
New Member
Thanks for your response. As the log file was large, it did not get attached to the post. The log file can be found at /glade/scratch/jjoseph/b.e21.BHISTsmbb.f09_g17.LE2-1231.011_noLU/run/cesm.log.8556862.chadmin1.ib0.cheyenne.ucar.edu.230214-202833

I tried to look at the errors for a few days and was still not able to figure out the problem. Any help is appreciated.
 