
CLM5 case failure

James King

Member
Hi all,

I have a CLM5 case that sometimes runs but mostly doesn't, and I wanted to ask whether there is anything I can do to reduce the high crash rate. I am running over a custom domain (sub-Saharan Africa) at 0.5 degree resolution and outputting CLM5 history fields per PFT as well as on the lon/lat grid. When the model fails, it seems to be after the first timesteps have completed, which suggests that the error is related to writing the history files.

The error message in the CESM log is

1369:MPT ERROR: Rank 1369(g:1369) received signal SIGSEGV(11).
1369: Process ID: 28314, Host: r14i7n22, Program: /glade/scratch/jamesking/i.clm5.AfrSSP126_allforcings.000/bld/cesm.exe
1369: MPT Version: HPE MPT 2.21 11/28/19 04:21:40

There are also some "NetCDF: variable not found" errors, but I don't think these point to the cause of the problem. I'm running CESM2.2.0 on Cheyenne and have attached my log files. Any insight into what's problematic about this case would be much appreciated.

Thanks,

James
 

Attachments

  • atm.log.3137410.chadmin1.ib0.cheyenne.ucar.edu.220304-053704.txt (111 KB)
  • cesm.log.3137410.chadmin1.ib0.cheyenne.ucar.edu.220304-053704.txt (81 KB)
  • cpl.log.3137410.chadmin1.ib0.cheyenne.ucar.edu.220304-053704.txt (53 KB)
  • lnd.log.3137410.chadmin1.ib0.cheyenne.ucar.edu.220304-053704.txt (305.3 KB)

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I don't see anything wrong with your case. The traceback is pointing to this line in histFileMod.F90:

hist1do(beg1d_out:end1d_out) = histo(beg1d_out:end1d_out,1)

which does have to do with the writing of history files.
I see that history files h1-h9 are all of the same type (monthly average, pft-level output).
You could try combining all of those into one history file.
You should be able to generate 10 history files but I myself have had occasional problems with doing that.
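As a rough illustration of the "combine into one history file" suggestion, a user_nl_clm fragment might look something like the sketch below. The field names ('GPP', 'TLAI', 'NPP') are placeholders and not the fields from this case, and the fragment assumes the default h0 file stays gridded while a single h1 file carries all of the monthly-average, pft-level output.

```fortran
! Sketch only: field names are placeholders, not taken from the original case.
! h0 (first entry) stays as the default gridded monthly file;
! h1 (second entry) collects every pft-level field in one file.
hist_fincl2         = 'GPP', 'TLAI', 'NPP'
hist_nhtfrq         = 0, 0           ! 0 = monthly average for both files
hist_dov2xy         = .true., .false. ! h0 on the lon/lat grid, h1 as 1d vectors
hist_type1d_pertape = ' ', 'PFTS'     ! h1 output at the pft level
```

With this layout, the separate hist_fincl3 through hist_fincl10 lists would be dropped and their fields appended to hist_fincl2.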
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Actually, I think one potential problem is that you are requesting FAREA_BURNED at the pft-level and it is a column-level variable:

call hist_addfld1d (fname='FAREA_BURNED', units='s-1', &
     avgflag='A', long_name='timestep fractional area burned', &
     ptr_col=this%farea_burned_col)
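One way to act on this, sketched under the assumption that it is acceptable to have FAREA_BURNED on the gridded file rather than per pft, is to take it out of the pft-level field list and request it on the lon/lat file instead. The other field names below ('GPP', 'TLAI') are placeholders, and whether this removes the crash has not been confirmed in this thread.

```fortran
! Sketch only: 'GPP' and 'TLAI' are placeholder pft-level fields.
! FAREA_BURNED is registered with ptr_col (column level), so it is
! requested on the gridded h0 file instead of the pft-level h1 file.
hist_fincl1         = 'FAREA_BURNED'
hist_fincl2         = 'GPP', 'TLAI'
hist_nhtfrq         = 0, 0
hist_dov2xy         = .true., .false.
hist_type1d_pertape = ' ', 'PFTS'
```

A quick way to check the level of any other field is to look at its hist_addfld1d call in the source: a ptr_patch argument indicates pft-level data, while ptr_col indicates column-level data.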
 