jnjohnson@lbl_gov
New Member
Hi everyone,
Michael Wehner and I are trying to get back to business after all of the upgrades/moves on Edison and Cori at NERSC, and we're encountering a problem with one of his cases involving PIO. The short story is that CAM's history file writer seems to be confused about the numerical precision it is supposed to be using: part of it seems to think that it should be writing single-precision data, while the rest of it thinks double-precision data is appropriate. This problem occurs on both Edison and Cori. The details and debris can be found in /scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101 on Edison. The simulation begins, and early on we see the following in the log file /scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101/runs/cesm.log.160217-202125:
0001: Opened file seCAM5v2_2_prescribed_c0101.cam.rh0.2000-01-06-00000.nc to write
0001: 14
0001: pio_support::pio_die:: myrank= -1 : ERROR: pionfatt_mod.F90:
0001: Rank 1 [Wed Feb 17 20:33:01 2016] [c3-0c0s15n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
0001: 170 : NetCDF: Not a valid data type or _FillValue type mismatch
0001: forrtl: error (76): Abort trap signal
0001: Image PC Routine Line Source
0001: cesm.exe 00000000026242F1 Unknown Unknown Unknown
0001: cesm.exe 0000000002622A47 Unknown Unknown Unknown
0001: cesm.exe 00000000025D3404 Unknown Unknown Unknown
0001: cesm.exe 0000000001B982E1 pio_support_mp_pi 114 pio_support.F90
0001: cesm.exe 0000000001B966D5 pio_utils_mp_chec 59 pio_utils.F90
0001: cesm.exe 0000000001BA39D3 pionfatt_mod_mp_p 283 pionfatt_mod.F90.in
0001: cesm.exe 00000000004CB452 cam_history_mp_h_ 3875 cam_history.F90
0001: cesm.exe 00000000004C1DE6 cam_history_mp_ws 4575 cam_history.F90
0001: cesm.exe 00000000004BFB64 cam_history_mp_wr 871 cam_history.F90
0001: cesm.exe 00000000004FA980 cam_restart_mp_ca 241 cam_restart.F90
0001: cesm.exe 00000000004B773D cam_comp_mp_cam_r 414 cam_comp.F90
0001: cesm.exe 00000000004A5012 atm_comp_mct_mp_a 547 atm_comp_mct.F90
0001: cesm.exe 00000000004061ED ccsm_comp_mod_mp_ 4079 ccsm_comp_mod.F90
0001: cesm.exe 00000000004232BB MAIN__ 91 ccsm_driver.F90
0001: cesm.exe 000000000040090E Unknown Unknown Unknown
0001: cesm.exe
This error is reported on each process involved in I/O. Digging in, we see that the error occurs at line 3875 of cam_history.F90, where CESM appears to be attempting to write a single-precision value, even though the rest of the simulation uses double-precision data. We are a little confused. The variable that appears to govern the single/double precision choice is tape(t)%hlist(f)%hwrt_prec in line 3871, and it is evidently not set to 8 (for double precision).
Meanwhile, we can see in the ATM log (/scratch1/scratchdirs/mwehner/seCAM5v2_2_prescribed_c0101/run/atm.log.160217-202125) that, in writing the history files, there are several lines saying "Output precision: single" and one saying "Output precision: double". Does this mean there will be data of both types in the file(s)?
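For anyone who wants to check the same thing in their own ATM log, here is a minimal Python sketch that tallies those lines. It assumes the log contains the literal text "Output precision: <word>", which is inferred only from the two phrases quoted above; the function name tally_output_precision is made up for illustration and is not part of CESM or PIO.

```python
import re
from collections import Counter

# Assumed line format, based only on the two phrases quoted from the ATM log.
PRECISION_RE = re.compile(r"Output precision:\s*(\w+)")


def tally_output_precision(lines):
    """Count how many log lines report each output precision.

    `lines` is any iterable of strings, for example an open log file.
    Lines that do not mention an output precision are ignored.
    """
    counts = Counter()
    for line in lines:
        match = PRECISION_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

To use it on a real log, open the file and pass the handle in, e.g. `with open("atm.log.160217-202125") as log: print(tally_output_precision(log))`. A result with both "single" and "double" keys would confirm that the history fields are not all being written at one precision.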
I'm a bit out of my depth here, but this and other issues we have encountered make us think that we are perhaps among the first to try running CESM 1.2.2 on Edison/Cori since their reconfiguration, at least with PIO. Has anyone gotten the PIO configuration to work properly on these machines lately?
Please let me know if you need us to change permissions on these files, or if we can provide any more information. We have filed a ticket with NERSC on this, but our experience is that they just don't use PIO very much.