
Generic MPT ERROR without apparent component error

Status
Not open for further replies.

Cathy Li

Xinchang 'Cathy' Li
New Member
Hi all,

I am trying to run a case using clm5.0.dev010 and compset IHistClm50Sp driven by GSWP3. The run has failed in three attempts with the same error in the cesm log (only the number following "rank" is different each time):
MPT ERROR: MPI_COMM_WORLD rank XXX has terminated without calling MPI_Finalize()
aborting job
Based on other threads in the forum, this seems to be a generic error that could be caused by many different things. Unfortunately, none of the component logs seems to have any error messages, so I am struggling to pinpoint the cause of this error.

The case directory is: /glade/u/home/xinchang/cases/test_withfeedback_clm5.0.dev010_GSWP3_hist
The run directory with the log files is here: /glade/scratch/xinchang/test_withfeedback_clm5.0.dev010_GSWP3_hist/run

I have attached a summary of all the commands I used and changes I made to the run in sequence.

I would appreciate any insights on this issue. Many thanks!

Best,
Cathy
 

Attachments

  • case_commands_and_changes.pdf (44.7 KB)

oleson

Keith Oleson
CSEG and Liaisons
Staff member
It looks like it is dying in the process of interpolating the initial file. I see this in the cesm log:

pio_support::pio_die:: myrank= -1 : ERROR: pionfatt_mod.F90:
435 : NetCDF: Attribute not found

and a traceback in the code:

cesm.exe 00000000005A4426 initinterpmod_mp_ 288 initInterp.F90

Line 288 in initInterp is:

status = pio_get_att(ncidi, pio_global, &
'ilun_landice_multiple_elevation_classes', &
subgrid_special_indices%ilun_landice_multiple_elevation_classes)

So it is looking for the global attribute ilun_landice_multiple_elevation_classes, which is not on your initial file:

finidat = '/glade/p/cgd/tss/people/oleson/CLM5_restarts/ctsm51_ctsm51d090_1deg_CPLHIST_2000SPIN.clm2.r.0141-01-01-00000.nc'

I'm not sure why that wouldn't be on the initial file; maybe that attribute was removed in later versions of the code, since it looks like that initial file was generated from ctsm5.1.dev090.
I think it would be safe to add that global attribute to the initial file, like:

ilun_landice_multiple_elevation_classes = 4

and then try your run again.
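
Before editing or swapping the initial file, it may help to confirm that the attribute really is missing. The following is only a minimal sketch, not part of CTSM: it assumes you have saved the text output of `ncdump -h` for the file, and the helper name `global_attributes` is hypothetical. It handles only single-line attribute values.

```python
import re


def global_attributes(header_text):
    """Return {name: raw value string} for the global attributes
    listed in the text produced by `ncdump -h`.

    Hypothetical helper for illustration; multi-line values are skipped.
    """
    attrs = {}
    in_globals = False
    for line in header_text.splitlines():
        stripped = line.strip()
        # ncdump marks the start of the global attribute section with this comment.
        if stripped.startswith("// global attributes:"):
            in_globals = True
            continue
        if not in_globals:
            continue
        # Global attributes are printed as ":name = value ;"
        match = re.match(r":(\w+)\s*=\s*(.*?)\s*;\s*$", stripped)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


# Made-up header text standing in for real `ncdump -h` output.
SAMPLE_HEADER = """netcdf example {
dimensions:
\tcolumn = 10 ;
variables:
\tdouble T_SOISNO(column) ;
\t\tT_SOISNO:units = "K" ;

// global attributes:
\t\t:title = "example" ;
\t\t:ilun_landice_multiple_elevation_classes = 4 ;
}
"""

attrs = global_attributes(SAMPLE_HEADER)
print("ilun_landice_multiple_elevation_classes" in attrs)
```

If the name is absent from the returned dictionary, the file would trigger the same "Attribute not found" failure during initial-file interpolation.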
 

Cathy Li

Xinchang 'Cathy' Li
New Member
Hi Keith,

I see! Thank you so much for identifying the error. I saw those error messages in the CESM log, but what they meant went completely over my head. I will learn from this and continue to improve my troubleshooting skills!
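
For anyone else hitting the same generic MPT error, here is a rough sketch of how one might pull the more informative lines out of a cesm log. The patterns are assumptions based only on the messages quoted in this thread (a `pio_die` line, a `NetCDF:` message, and a traceback line naming a `.F90` file); the function name `find_clues` is hypothetical, and other failures will need other patterns.

```python
import re
import sys

# Assumed patterns, chosen to match the messages quoted in this thread.
CLUE_PATTERNS = [
    re.compile(r"pio_die"),
    re.compile(r"NetCDF:"),
    re.compile(r"\bERROR\b"),
    re.compile(r"\.F90\b"),
]

# The generic abort message, which by itself says nothing about the cause.
GENERIC_ABORT = re.compile(r"MPT ERROR: MPI_COMM_WORLD rank \d+ has terminated")


def find_clues(lines):
    """Return (line_number, text) pairs for lines that look informative."""
    clues = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if GENERIC_ABORT.search(text):
            continue  # skip the generic message itself
        if any(pattern.search(text) for pattern in CLUE_PATTERNS):
            clues.append((number, text))
    return clues


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_clues.py <path to cesm log>
    with open(sys.argv[1], errors="replace") as log_file:
        for number, text in find_clues(log_file):
            print(f"{number}: {text}")
```

The traceback line is usually the most useful one, since it names the source file and line number where the model stopped.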

I replaced the initial file with the one below instead, following the reference case you provided in the other thread (it has the attribute 'ilun_landice_multiple_elevation_classes = 4'): /glade/p/cgd/tss/people/oleson/CLM5_restarts/clm50_r265_1deg_GSWP3V1_iso_400i_ccrit_accum_1850pAD.clm2.r.1366-01-01-00000.nc

I had meant to use that one for this run but forgot to change it. I apologize for this oversight.

The run is now in the queue. I will report back on how it goes!

Thanks,
Cathy
 

Cathy Li

Xinchang 'Cathy' Li
New Member
Reporting back to say the model ran successfully! It took almost a day for the 20-min run to start, but there were no more errors.

Thanks very much again for your help, Keith @oleson!

Best,
Cathy
 