
A sudden abort error after running WACCM-X for a month in CESM2.1.1

tckm@whu_edu_cn

New Member
Hi, I created a case with the FXHIST compset at f19_f19_mg16 resolution in CESM2.1.1, on a machine I ported myself. The start date is 20060101. The case starts and runs successfully, but after one simulated month it stops suddenly. Part of cesm.log reads:

pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
pio_support::pio_die:: myrank=          -1 : ERROR: pionfput_mod.F90:         256 : The specified netCDF file does not exist.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 97 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[n0049:26174] 7 more processes have sent help message help-mpi-api.txt / mpi-abort
[n0049:26174] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Does anybody know what causes this abort and how to solve it? Thanks very much for your help.
 

jedwards

CSEG and Liaisons
Staff member
It's not clear from your log files which file is the problem.   So first you need to identify which file is the problem - is it an input or an output?   Sometimes errors like this occur when the filesystem is not mounted on the compute node.  You might also run ./check_input_data in your case directory to see if it finds any files missing. 
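A minimal Python sketch of the mount check, meant to be run on a compute node (for example inside a batch job) rather than on the login node. The function name and the placeholder paths are hypothetical and not part of CESM; replace the paths with the directories your case actually uses, such as the values that ./xmlquery reports for RUNDIR and DIN_LOC_ROOT.

```python
import os


def check_directories(paths):
    """Return {path: (exists, writable)} for each directory path."""
    report = {}
    for path in paths:
        exists = os.path.isdir(path)
        # A directory needs write and search permission to create files in it.
        writable = exists and os.access(path, os.W_OK | os.X_OK)
        report[path] = (exists, writable)
    return report


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own run and input directories.
    candidates = ["/path/to/run", "/path/to/inputdata"]
    for path, (exists, writable) in check_directories(candidates).items():
        print(f"{path}: exists={exists} writable={writable}")
```

If a directory shows up on the login node but is reported as missing here, the filesystem is probably not visible from the compute nodes.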
 

tckm@whu_edu_cn

New Member
Thanks for your reply. I ran ./check_input_data and no data files are missing:

./check_input_data
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Loading input file list: 'Buildconf/mosart.input_data_list'
Loading input file list: 'Buildconf/docn.input_data_list'
Loading input file list: 'Buildconf/clm.input_data_list'
Loading input file list: 'Buildconf/cam.input_data_list'
Loading input file list: 'Buildconf/cpl.input_data_list'
Loading input file list: 'Buildconf/cice.input_data_list'

I then asked the administrator of the computing center, who told me the filesystem is mounted on the compute nodes. I also ran some test cases and found that, whatever the start date, the case cannot get through the first day of the next month. For example, with the start date set to 20060130, the case runs 20060130 and 20060131 successfully and then aborts when it starts 20060201, with the same error: "pio_support::pio_die:: myrank=    -1 : ERROR: pionfput_mod.F90:   256 : The specified netCDF file does not exist". I have attached pionfput_mod.F90. Could you help me find out which netCDF file it refers to? Thanks for your help.
 

jedwards

CSEG and Liaisons
Staff member
You'll need to figure out what file it's trying to open.   It's usually printed in a log someplace.  You might try setting DEBUG=TRUE and recompiling.   
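One rough way to search the logs is to pull every netCDF file name out of the log text and check which of those files do not exist. This Python sketch is only a heuristic under stated assumptions: the helper names are hypothetical, it assumes file names end in ".nc", and it can only find names that the model actually printed.

```python
import os
import re

# Heuristic: a run of path characters ending in ".nc".
NC_PATTERN = re.compile(r"[\w./+-]+\.nc\b")


def netcdf_names(log_text):
    """Return the distinct .nc names mentioned in log_text, in first-seen order."""
    seen = []
    for name in NC_PATTERN.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing(names, base_dir="."):
    """Return the names that do not exist; relative names are resolved against base_dir."""
    return [n for n in names if not os.path.exists(os.path.join(base_dir, n))]
```

To use it, read each log from the run directory into a string, pass it to netcdf_names, and pass the result to missing with base_dir set to the run directory. The last file names mentioned before the pio_die lines are the most likely candidates.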
 

tckm@whu_edu_cn

New Member
Hi, I set DEBUG and recompiled, but I still could not find the problem file. I also ran with RESUBMIT set to 1, RUN_STARTDATE set to 20060301, and STOP_N set to 1 month, but the abort still occurred. I then ran ./create_test: only a few tests were created successfully and most of them failed. Could these test failures be related to the abort during the run, and do you know how to solve these problems? Thank you.
 