
Porting CESM1.3 to Cheyenne: PIO error after submission

Hi all,

We have recently been porting CESM version 1.3 to Cheyenne. We followed the guidelines (https://docs.google.com/document/d/1V5_oIA_ZPmLsMKp0rZlQ99CqQshx2pqcZsVQhT7lCb0/edit) provided by the development team and successfully compiled the model on Cheyenne. However, after a one-month simulation, the model crashes and returns a PIO error. Any suggestions are appreciated.

We customized the PE layout ourselves:

./xmlchange NTASKS_ATM=468,NTHRDS_ATM=2,ROOTPE_ATM=0
./xmlchange NTASKS_CPL=468,NTHRDS_CPL=2,ROOTPE_CPL=0
./xmlchange NTASKS_ICE=324,NTHRDS_ICE=1,ROOTPE_ICE=144
./xmlchange NTASKS_LND=144,NTHRDS_LND=1,ROOTPE_LND=0
./xmlchange NTASKS_ROF=144,NTHRDS_ROF=1,ROOTPE_ROF=0
./xmlchange NTASKS_OCN=128,NTHRDS_OCN=2,ROOTPE_OCN=468

Case directory: /glade/u/home/che43/cases/cheyenne.20ka.itrace.ice_ghg_orb_mwtr.01
Log file: /glade/p/cwis0001/iTRACE/cheyenne.20ka.itrace.ice_ghg_orb_mwtr.01/run/cesm.log.171010-091845

Error message:

89:Process ID: 67315, Host: r12i4n19, Program: /glade/p/cwis0001/iTRACE/cheyenne.20ka.itrace.ice_ghg_orb_mwtr.01/bld/cesm.exe
89:MPT Version: SGI MPT 2.15  09/03/16 04:15:54
89:
89:MPT: --------stack traceback-------
1:Image              PC                Routine            Line        Source
1:cesm.exe           000000000309B32D  Unknown               Unknown  Unknown
1:cesm.exe           0000000002AA29A1  pio_support_mp_pi         120  pio_support.F90
1:cesm.exe           0000000002AA0F7E  pio_utils_mp_chec          74  pio_utils.F90
1:cesm.exe           0000000002BAA2C7  pionfwrite_mod_mp         249  pionfwrite_mod.F90.in
1:cesm.exe           0000000002B79F2F  piodarray_mp_writ         643  piodarray.F90.in
1:cesm.exe           0000000002B7CB41  piodarray_mp_writ         221  piodarray.F90.in
1:cesm.exe           000000000221BFCE  ncdio_pio_mp_ncd_        1482  ncdio_pio.F90.in
1:cesm.exe           000000000218AA46  histfilemod_mp_hf        2443  histFileMod.F90
1:cesm.exe           0000000002182B9A  histfilemod_mp_hi        2922  histFileMod.F90
1:cesm.exe           00000000020E19D1  clm_driver_mp_clm         852  clm_driver.F90
1:cesm.exe           00000000020A476C  lnd_comp_mct_mp_l         449  lnd_comp_mct.F90
1:cesm.exe           000000000041F932  component_mod_mp_        1022  component_mod.F90
1:cesm.exe           000000000040B2E4  cesm_comp_mod_mp_        2345  cesm_comp_mod.F90
1:cesm.exe           000000000041D57B  MAIN__                     93  cesm_driver.F90
1:cesm.exe           000000000040915E  Unknown               Unknown  Unknown
1:libc-2.19.so       00002AAAB04C7B25  __libc_start_main     Unknown  Unknown
1:cesm.exe           0000000000409069  Unknown               Unknown  Unknown
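For readers checking the PE layout in the post above, here is a rough Python sketch that turns the NTASKS and ROOTPE values into MPI task ranges. It assumes a task stride of 1 for every component (the post does not set a stride, so this is an assumption), and the helper name `task_range` is made up for illustration.

```python
# PE layout values copied from the xmlchange commands in the post.
# Each entry is (NTASKS, NTHRDS, ROOTPE).
layout = {
    "ATM": (468, 2, 0),
    "CPL": (468, 2, 0),
    "ICE": (324, 1, 144),
    "LND": (144, 1, 0),
    "ROF": (144, 1, 0),
    "OCN": (128, 2, 468),
}


def task_range(ntasks, rootpe):
    """Return the inclusive (first, last) MPI task range.

    Hypothetical helper: assumes consecutive tasks (stride of 1).
    """
    return rootpe, rootpe + ntasks - 1


ranges = {comp: task_range(n, root) for comp, (n, _nthrds, root) in layout.items()}

# The job needs enough MPI tasks to cover the highest task index in use.
total_tasks = max(last for _first, last in ranges.values()) + 1

for comp, (first, last) in ranges.items():
    print(f"{comp}: tasks {first}-{last}")
print("total MPI tasks:", total_tasks)
```

Under that assumption, the land model sits on the lowest-numbered tasks. If the `1:` prefix in the log is the MPI rank, that would be consistent with the lnd_comp_mct and clm_driver frames in the traceback.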
 

jedwards

CSEG and Liaisons
Staff member
1: NetCDF: Numeric conversion not representable
1: pio_support::pio_die:: myrank=          -1 : ERROR:
1: pionfwrite_mod::write_nfdarray_double:         249 :
1: NetCDF: Numeric conversion not representable
This error indicates that you are trying to write a value that cannot be represented by the data type specified. Often this is because you are trying to write a NaN, or a value too big for a 4-byte real, into the file.
 
Thanks, jedwards. This error occurred occasionally on Yellowstone when we ran simulations, but a resubmission or two would usually resolve it. Right now, we see it every time on Cheyenne. If the value cannot be represented by its own data type, how could a resubmission possibly solve it?
 

jedwards

CSEG and Liaisons
Staff member
Resubmission is not a solution, it's a band-aid. You need to go into the model and figure out which field is out of spec and how to correct the problem. Look at your stack trace, figure out what the problem variable is, and print out its values.
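As a starting point for reading the stack trace, here is a small Python sketch that parses a few frames copied from the traceback in the original post and picks the first caller outside the PIO layer. The filter (skip any source file with "pio" in its name) is only a heuristic chosen for this example, and the function names are made up for illustration.

```python
# A few frames copied from the traceback in the original post.
TRACE = """\
1:Image              PC                Routine            Line        Source
1:cesm.exe           000000000309B32D  Unknown               Unknown  Unknown
1:cesm.exe           0000000002AA29A1  pio_support_mp_pi         120  pio_support.F90
1:cesm.exe           0000000002AA0F7E  pio_utils_mp_chec          74  pio_utils.F90
1:cesm.exe           0000000002BAA2C7  pionfwrite_mod_mp         249  pionfwrite_mod.F90.in
1:cesm.exe           0000000002B79F2F  piodarray_mp_writ         643  piodarray.F90.in
1:cesm.exe           000000000221BFCE  ncdio_pio_mp_ncd_        1482  ncdio_pio.F90.in
1:cesm.exe           000000000218AA46  histfilemod_mp_hf        2443  histFileMod.F90
1:cesm.exe           00000000020E19D1  clm_driver_mp_clm         852  clm_driver.F90
"""


def parse_frames(trace):
    """Return (routine, line, source) for each frame with a known line number."""
    frames = []
    for row in trace.splitlines():
        parts = row.split()
        # Skip the header row and frames reported as Unknown.
        if len(parts) != 5 or not parts[3].isdigit():
            continue
        frames.append((parts[2], int(parts[3]), parts[4]))
    return frames


def first_non_pio_frame(frames):
    """Return the innermost frame whose source file is not an I/O wrapper.

    Heuristic: any source name containing "pio" is treated as I/O plumbing.
    """
    for frame in frames:
        if "pio" not in frame[2].lower():
            return frame
    return None


caller = first_non_pio_frame(parse_frames(TRACE))
print("first caller outside PIO:", caller)
```

For this trace the heuristic points at histFileMod.F90, which suggests the failing write is a land-model history field. That file and line would be a reasonable place to print the field name and values before the write.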
 