B cases hanging during initialization of atm

usha k

Usha K H
New Member
Hello,

I'm running CESM v2.1.5 with the B1850 compset. The run keeps hanging near the end of initialization, regardless of the number of nodes, and none of the log files show an error message. I have come across similar issues in the forum, but none of the suggested fixes helped. I am using the Intel compiler v2023.1.0, with Intel MPI (impi) as the MPI library and PBS for job submission. I have used up to 9 nodes, with max tasks per node = 128 (it is our institute's system). As suggested in other discussions, I have set export OMP_STACKSIZE=256M, ulimit -c unlimited, etc. The PE layout is the default one:
./pelayout
Comp NTASKS NTHRDS ROOTPE
CPL : 1024/ 1; 0
ATM : 1024/ 1; 0
LND : 512/ 1; 0
ICE : 512/ 1; 512
OCN : 128/ 1; 1024
ROF : 512/ 1; 0
GLC : 1024/ 1; 0
WAV : 256/ 1; 0
ESP : 1/ 1; 0

Kindly help. I have attached my run log and machine files (machine name is champ).
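
For reference, the node count implied by the PE layout above can be checked with a few lines of Python. This is only a rough sketch, not how CIME itself computes the job size: it assumes a pure-MPI run (NTHRDS = 1 everywhere, as in the layout shown) with one MPI task per core, and the function names are made up for illustration.

```python
import math

# PE layout copied from the ./pelayout output above:
# component -> (NTASKS, NTHRDS, ROOTPE)
LAYOUT = {
    "CPL": (1024, 1, 0),
    "ATM": (1024, 1, 0),
    "LND": (512, 1, 0),
    "ICE": (512, 1, 512),
    "OCN": (128, 1, 1024),
    "ROF": (512, 1, 0),
    "GLC": (1024, 1, 0),
    "WAV": (256, 1, 0),
    "ESP": (1, 1, 0),
}
MAX_TASKS_PER_NODE = 128  # value quoted in the post


def total_tasks(layout):
    """Highest MPI rank used by any component, plus one (assumes NTHRDS = 1)."""
    return max(rootpe + ntasks for ntasks, _nthrds, rootpe in layout.values())


def nodes_needed(layout, tasks_per_node):
    """Whole nodes required when every node is filled with tasks_per_node tasks."""
    return math.ceil(total_tasks(layout) / tasks_per_node)


def shares_ranks(layout, a, b):
    """True if components a and b are placed on overlapping MPI ranks."""
    na, _, ra = layout[a]
    nb, _, rb = layout[b]
    return ra < rb + nb and rb < ra + na


if __name__ == "__main__":
    print("total MPI tasks:", total_tasks(LAYOUT))
    print("nodes needed:", nodes_needed(LAYOUT, MAX_TASKS_PER_NODE))
    print("ATM and OCN share ranks:", shares_ranks(LAYOUT, "ATM", "OCN"))
```

Under those assumptions the layout needs 1152 MPI tasks, i.e. 9 full nodes at 128 tasks per node, with OCN on its own ranks (1024 to 1151) and the other components sharing ranks 0 to 1023. A job given fewer than 9 nodes would therefore be running this layout with fewer cores than tasks.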
 

Attachments

  • atm_log.txt (2.1 KB)
  • cpl_log.txt (41 KB)
  • cesm_log.txt (11 KB)
  • config_compilers.txt (46.9 KB)
  • config_machines.txt (125.5 KB)

jedwards

CSEG and Liaisons
Staff member
There is no indication of a problem here. I notice that you are not setting a batch system (e.g. PBS or Slurm).
Are you sharing the compute tasks with other jobs? Have you had a champ system expert look at the problem with you?
 

usha k

Usha K H
New Member
Hi, actually I have set the batch system to PBS in both config_machines and config_batch. I have now attached the env_mach_specific file. The compute tasks are not shared. Also, I am using the maximum number of cores available per node (i.e. 128). I even tried running with DEBUG=TRUE. Since there are no errors, the champ system staff are not able to help at this point.

What am I missing here?
 

Attachments

  • env_mach_specific.txt (1.3 KB)