
CESM 1.2.2 with PIO exhibits strange ABORT error when writing CAM history files on Edison/Hopper

Hi everyone,
Michael Wehner and I are trying to get back to business after all of the upgrades/moves on Edison and Cori at NERSC, and we're encountering a problem with one of his cases involving PIO. The short story is that CAM's history file writer seems to be confused about the numerical precision it is supposed to be using: part of it seems to think that it should be writing single-precision data, while the rest of it thinks double-precision data is appropriate. This problem occurs on both Edison and Cori. The details and debris can be found in /scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101 on Edison. The simulation begins, and early on we see the following in the log file /scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101/runs/cesm.log.160217-202125:
 
0001: Opened file seCAM5v2_2_prescribed_c0101.cam.rh0.2000-01-06-00000.nc to write
0001: 14
0001: pio_support::pio_die:: myrank= -1 : ERROR: pionfatt_mod.F90:
0001: Rank 1 [Wed Feb 17 20:33:01 2016] [c3-0c0s15n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
0001: 170 : NetCDF: Not a valid data type or _FillValue type mismatch
0001: forrtl: error (76): Abort trap signal
0001: Image PC Routine Line Source
0001: cesm.exe 00000000026242F1 Unknown Unknown Unknown
0001: cesm.exe 0000000002622A47 Unknown Unknown Unknown
0001: cesm.exe 00000000025D3404 Unknown Unknown Unknown

0001: cesm.exe 0000000001B982E1 pio_support_mp_pi 114 pio_support.F90
0001: cesm.exe 0000000001B966D5 pio_utils_mp_chec 59 pio_utils.F90
0001: cesm.exe 0000000001BA39D3 pionfatt_mod_mp_p 283 pionfatt_mod.F90.in
0001: cesm.exe 00000000004CB452 cam_history_mp_h_ 3875 cam_history.F90
0001: cesm.exe 00000000004C1DE6 cam_history_mp_ws 4575 cam_history.F90
0001: cesm.exe 00000000004BFB64 cam_history_mp_wr 871 cam_history.F90
0001: cesm.exe 00000000004FA980 cam_restart_mp_ca 241 cam_restart.F90
0001: cesm.exe 00000000004B773D cam_comp_mp_cam_r 414 cam_comp.F90
0001: cesm.exe 00000000004A5012 atm_comp_mct_mp_a 547 atm_comp_mct.F90
0001: cesm.exe 00000000004061ED ccsm_comp_mod_mp_ 4079 ccsm_comp_mod.F90
0001: cesm.exe 00000000004232BB MAIN__ 91 ccsm_driver.F90
0001: cesm.exe 000000000040090E Unknown Unknown Unknown
0001: cesm.exe

This error is reported on each process involved in I/O. Digging in, we see that the error occurs at line 3875 of cam_history.F90, where CESM appears to be attempting to write a single-precision value even though the rest of the simulation uses double-precision data. We are a little confused. The variable that appears to govern the single/double precision choice is tape(t)%hlist(f)%hwrt_prec in line 3871, and it is evidently not set to 8 (for double precision).

Meanwhile, we can see in the ATM log (/scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101/run/atm.log.160217-202125) that, in writing the history files, there are several lines saying "Output precision:                single" and one saying "Output precision:                double", which suggests that there is going to be data of both types in the file(s)?
I'm a bit out of my depth here, but this and other issues we have encountered make us think that we are perhaps among the first to try running CESM 1.2.2 on Edison/Cori since their reconfiguration, at least with PIO. Has anyone gotten the PIO configuration to work properly on these machines lately?
Please let me know if you need us to change permissions on these files, or if we can provide any more information. We have filed a ticket with NERSC on this, but our experience is that they just don't use PIO very much.
 

jedwards

CSEG and Liaisons
Staff member
At line 3871 of cam_history.F90, replace:

if (tape(t)%hlist(f)%hwrt_prec == 8) then

with:

if ((tape(t)%hlist(f)%hwrt_prec == 8) .or. restart) then
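
The effect of the extra .or. restart can be seen in a small standalone sketch. This is not the CAM source. The helper functions fill_kind_original and fill_kind_patched are hypothetical, and only hwrt_prec and restart are names taken from the condition quoted above. It also assumes, without confirmation from the thread, that variables in a restart history file (the .rh0. file in the log) are defined as double precision regardless of a field's requested output precision, and that the branch at line 3871 chooses the type of the _FillValue attribute.

```fortran
program fillvalue_kind_sketch
  implicit none

  integer :: hwrt_prec   ! requested output precision in bytes (4 or 8)
  logical :: restart     ! .true. when a restart history file is being written

  ! The failing case described in the thread: a single-precision field
  ! encountered while producing a restart (.rh0.) file.
  hwrt_prec = 4
  restart   = .true.

  print '(a,i0)', 'original condition picks kind: ', fill_kind_original(hwrt_prec)
  print '(a,i0)', 'patched condition picks kind:  ', fill_kind_patched(hwrt_prec, restart)

contains

  ! Hypothetical helper: mirrors the original test at line 3871,
  ! which consults only the field's output precision.
  integer function fill_kind_original(prec)
    integer, intent(in) :: prec
    if (prec == 8) then
      fill_kind_original = 8
    else
      fill_kind_original = 4
    end if
  end function fill_kind_original

  ! Hypothetical helper: mirrors the suggested replacement,
  ! which also selects double precision whenever restart is true.
  integer function fill_kind_patched(prec, is_restart)
    integer, intent(in) :: prec
    logical, intent(in) :: is_restart
    if ((prec == 8) .or. is_restart) then
      fill_kind_patched = 8
    else
      fill_kind_patched = 4
    end if
  end function fill_kind_patched

end program fillvalue_kind_sketch
```

Under those assumptions, the original test selects a 4-byte fill value for a variable that the restart file defines as double, which is the kind of mismatch the "NetCDF: Not a valid data type or _FillValue type mismatch" message describes. The patched test selects 8 bytes whenever restart is true.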
 