
E3SM-FATES single-point simulation case submit ERROR

xxyii

xiuyi wu
New Member
Dear all,

I am trying to run an E3SM-FATES single-point simulation on Perlmutter. Recently, when I submitted the case, I got the following ERROR. Strangely, none of my previous simulations reported errors; they all ran and produced output.

ERROR: RUN FAIL: Command 'srun --label -n 1 -N 1 -c 2 --cpu_bind=cores -m plane=128 /pscratch/sd/x/myuser/e3sm_scratch/pm-cpu/Spin_up_1x1_mysite.IELMBGC.ELM_USRDAT.001.2024-07-16/bld/e3sm.exe >> e3sm.log.$LID 2>&1 ' failed
See log file for details: /pscratch/sd/x/myuser/e3sm_scratch/pm-cpu/Spin_up_1x1_mysite.IELMBGC.ELM_USRDAT.001.2024-07-16/run/e3sm.log.28453129.240722-232021

Searching the log file above for the keyword ERROR, I found the following main errors.

PE 0: MPICH_ABORT_ON_ERROR = 0
PE 0: MPICH_MPIIO_ABORT_ON_RW_ERROR= disable
ERROR: Unknown error submitted to shr_abort_abort
MPICH ERROR [Rank 0] [job id 28453133.0] [Mon Jul 22 23:20:42 2024] [nid004682] - Abort(1001) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 0
srun: error: nid004682: task 0: Exited with exit code 233
srun: Terminating StepId=28453133.0
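
A keyword search like the one described above can be sketched in Python. This is only a minimal illustration, not an E3SM or CIME tool: the function name `find_error_lines` is hypothetical, and the whole-word match is an assumption chosen so that settings lines such as `MPICH_ABORT_ON_ERROR` are skipped while actual error messages are kept.

```python
import re

# Whole-word match: "ERROR:" and "MPICH ERROR" hit, but "MPICH_ABORT_ON_ERROR"
# does not, because the underscore counts as a word character.
ERROR_RE = re.compile(r"\bERROR\b")


def find_error_lines(lines):
    """Return (line_number, text) for every line containing the word ERROR."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_RE.search(line)
    ]


# Excerpt copied from the log quoted in this post.
SAMPLE = """\
PE 0: MPICH_ABORT_ON_ERROR = 0
PE 0: MPICH_MPIIO_ABORT_ON_RW_ERROR= disable
ERROR: Unknown error submitted to shr_abort_abort
MPICH ERROR [Rank 0] [job id 28453133.0] [Mon Jul 22 23:20:42 2024] [nid004682] - Abort(1001) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 0
srun: error: nid004682: task 0: Exited with exit code 233
srun: Terminating StepId=28453133.0
"""

if __name__ == "__main__":
    for number, text in find_error_lines(SAMPLE.splitlines()):
        print(f"{number}: {text}")
```

To scan a real log, pass an open file instead of the sample, for example `with open(path) as fh: find_error_lines(fh)`. The match is case-sensitive, so the lowercase `srun: error:` line is not reported.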

Attached are my error log and the shell script that created the case.

Has anyone encountered a similar issue when submitting a case? Any suggestions and comments would be greatly appreciated!!
 

Attachments

  • e3sm.log.28453129.txt
    15.5 KB
  • create_run1_1x1tanguroMTBR_fates_spinup.txt
    4.6 KB

katec

CSEG and Liaisons
Staff member
Hi, these boards are generally just for CESM simulations and issues, so your E3SM problem is probably best discussed with experts in that model. That said, the log suggests the error occurred during initialization of the land model, so I suspect a problem with reading in the land datasets. You'll probably find more information in the log file specific to the land model.
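
Locating the land-model log can be sketched as follows. This assumes the CIME convention that component logs are written to the case run directory with names beginning `lnd.log.`; that prefix and the helper names `newest_component_log` and `tail` are assumptions for illustration, so adjust them to whatever your run directory actually contains.

```python
from pathlib import Path


def newest_component_log(run_dir, prefix="lnd.log."):
    """Return the most recently modified file in run_dir whose name starts
    with prefix, or None if there is no such file."""
    candidates = [
        path
        for path in Path(run_dir).iterdir()
        if path.is_file() and path.name.startswith(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


def tail(path, count=20):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, errors="replace") as handle:
        return handle.read().splitlines()[-count:]
```

Calling `tail(newest_component_log(run_dir))` on the case's run directory would show the final lines the land model wrote before the abort, which is usually where a failed dataset read is reported.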
 

xxyii

xiuyi wu
New Member
Thank you for your reply. I will check the land model specifically.
 