CESM.EXE runs but does not finish, does not output log files

joe_s

Joe Salamone
New Member
Hello.
I am using CESM 2.1.5 and am having difficulty getting cesm.exe to run on my machine, to which I have ported the model. I have not made any code changes or namelist changes.

I created the case with:

./create_newcase --case f2000climo_testcase --compset F2000climo --res f09_f09_mg17 --mach lake

I have set NTASKS=32 and NTASKS_ESP=1. This is based on other threads I have read, which say the number of tasks needs to be less than the total number of cores; I tried it to see if it resolved the issue, but it did not. I am running on 8 nodes with 32 cores each, and I made sure to set max tasks per core to 32 in my machine config file (attached).

I ran case setup, previewed the namelists, checked the input data with the download option, and then built the case. The code built successfully without errors.

I can successfully submit the job to our PBS queuing system (file attached). The job runs on a cluster of Intel Xeon Gold processors, and the PBS job script has the following execution line:

mpirun --report-bindings --bind-to core --map-by socket:PE=1 -n 256 -N 32 bld/cesm.exe

The job runs and uses all of the requested 256 cores, but no log files are generated in the run directory and cesm.exe just runs endlessly (I have tried up to 10 hours and it does not finish). No errors appear in the PBS log file that is generated when the job is accepted and run by the queuing system.
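For reference, the task counts quoted in this post can be cross-checked with a few lines of Python. This is only a minimal sketch: the helper name `parse_mpirun_counts` is hypothetical, and the numbers are copied from the post above. It shows that the mpirun line launches 8 x 32 = 256 tasks while NTASKS is set to 32. Whether that difference is related to the hang is not established in this thread.

```python
import shlex


def parse_mpirun_counts(command):
    """Return (total_tasks, tasks_per_node) read from the -n and -N flags."""
    tokens = shlex.split(command)
    total = int(tokens[tokens.index("-n") + 1])
    per_node = int(tokens[tokens.index("-N") + 1])
    return total, per_node


# Values copied from the post; nothing here is read from a real case directory.
MPIRUN_LINE = (
    "mpirun --report-bindings --bind-to core --map-by socket:PE=1 "
    "-n 256 -N 32 bld/cesm.exe"
)
NTASKS = 32          # value set in the case
NODES = 8            # nodes requested
CORES_PER_NODE = 32  # cores on each node

total, per_node = parse_mpirun_counts(MPIRUN_LINE)
matches_allocation = total == NODES * CORES_PER_NODE
matches_ntasks = total == NTASKS

print("tasks launched by mpirun:", total)
print("matches nodes x cores per node:", matches_allocation)
print("matches NTASKS:", matches_ntasks)
```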

If there are build steps I am missing or other porting steps I have omitted, please offer recommendations, checks and steps to follow. Any help or support is greatly appreciated.

Thank you
 

jedwards

CSEG and Liaisons
Staff member
If there are no log files then it is likely you are hanging in the MPI_Init step. Can you run a basic hello world MPI program on 256 tasks?
Try running cesm on a single node and work your way up to your goal of 32.
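One way to organize the "start small and work up" test is to generate the sequence of launch commands ahead of time. The sketch below is an illustration only: `scaling_ladder` is a hypothetical helper, `./hello_mpi` is a placeholder name for whatever MPI hello world binary you compile, and the `-n`/`-N` flags are reused from the mpirun line in the original post. It prints the commands and does not run them.

```python
def scaling_ladder(max_nodes, tasks_per_node, executable):
    """Return (nodes, command) pairs, doubling the node count up to max_nodes."""
    steps = []
    nodes = 1
    while nodes < max_nodes:
        steps.append(nodes)
        nodes *= 2
    steps.append(max_nodes)  # always finish at the full allocation
    return [
        (n, f"mpirun -n {n * tasks_per_node} -N {tasks_per_node} {executable}")
        for n in steps
    ]


# 8 nodes with 32 cores each, as described in the original post.
# "./hello_mpi" is a placeholder for a small MPI test program.
ladder = scaling_ladder(max_nodes=8, tasks_per_node=32, executable="./hello_mpi")

for nodes, command in ladder:
    print(f"{nodes} node(s): {command}")
```

If the hello world program hangs at some step, the problem lies in the MPI or scheduler setup and not in CESM. If every step passes, the same ladder can be repeated with cesm.exe, keeping in mind that the case's task count has to be changed to match each step (for example with `./xmlchange NTASKS=...` followed by `./case.setup --reset`) instead of only editing the mpirun line.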
 