Questions on NTASKS, ROOTPE, and submission

liuys

liuyusong
New Member
I'm running an experiment with the BSSP126 compset, but when I submit the job I run into an issue where the coupling merge node calculates incorrectly. Here are the logs:

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[63637,1],36]
Exit code: 174
--------------------------------------------------------------------------
This is my job script:


#!/bin/bash
#SBATCH -J ssp126_1pct
#SBATCH -p cpu_parallel
#SBATCH -N 4
#SBATCH -n 256
#SBATCH --exclusive
#SBATCH --mem=200G
#SBATCH --ntasks-per-node=64
#SBATCH -o cesm.o%j
#SBATCH -e cesm.e%j

module purge

module load compiler/intel/2017.5.239
module load mpi/hpcx/2.7.4/intel-2017.5.239
module load mathlib/hdf5/intel/1.8.20
module load mathlib/netcdf/intel/4.4.1

module load mathlib/lapack/intel/3.9.1
module load mathlib/pnetcdf/intel/1.12.1

ulimit -s unlimited

NP=$SLURM_NPROCS
mpirun -np $NP ./cesm.exe
 

fischer

CSEG and Liaisons
Staff member
Hi Liuys,

Depending on the configuration of your run and of the computer you're running on, you may be running out of system resources.

Chris
 

liuys

liuyusong
New Member
Hi Fischer, thank you for your reply. But I'm only running for two days. How can I expand the system resources? Should I adjust the script or make a request to the administrator?
 

Attachments

  • drv_in.txt
    6.4 KB · Views: 1

fischer

CSEG and Liaisons
Staff member
Hi Liuys,

You'll need to go to your case directory and do the following.

./case.build --clean-all

The following commands will increase the number of nodes you're using. I'm assuming that you have
32 processors per node and that you're running at f09_g17 resolution. You might need to talk to your administrator
to find out how many nodes you have available.

./xmlchange NTASKS_ATM=512
./xmlchange NTASKS_CPL=512
./xmlchange NTASKS_GLC=512
./xmlchange NTASKS_LND=256
./xmlchange NTASKS_ROF=256
./xmlchange NTASKS_ICE=256
./xmlchange NTASKS_WAV=128
./xmlchange NTASKS_OCN=64
./xmlchange ROOTPE_OCN=512
./xmlchange ROOTPE_ICE=256


Then rebuild and submit
./case.setup --reset
./case.build
./case.submit



Chris
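
As a rough illustration of what the layout above implies, the sketch below works out which MPI ranks each component would occupy and how many tasks and nodes that adds up to. It assumes each component runs on ranks ROOTPE through ROOTPE + NTASKS - 1, that ROOTPE is 0 for every component not set above, and that each task uses a single thread. These are assumptions for the example, not values confirmed in this thread, and the `layout` string is just a hypothetical way of writing the settings down.

```shell
#!/bin/sh
# Sketch only: derive rank ranges and the total task count from the
# NTASKS/ROOTPE values quoted above. Each entry is COMPONENT:NTASKS:ROOTPE.
# ROOTPE=0 for components not listed in the xmlchange commands is an assumption.
layout="ATM:512:0 CPL:512:0 GLC:512:0 LND:256:0 ROF:256:0 ICE:256:256 WAV:128:0 OCN:64:512"
tasks_per_node=32   # the assumption stated in the reply; the job script above uses 64

total=0
for entry in $layout; do
    # Split "COMP:NTASKS:ROOTPE" with POSIX parameter expansion.
    comp=${entry%%:*}
    rest=${entry#*:}
    ntasks=${rest%%:*}
    rootpe=${rest#*:}

    last=$(( rootpe + ntasks - 1 ))
    printf '%-4s ranks %4d-%4d\n' "$comp" "$rootpe" "$last"

    # The job needs enough tasks to cover the highest rank in use.
    if [ $(( last + 1 )) -gt "$total" ]; then
        total=$(( last + 1 ))
    fi
done

# Round up to whole nodes.
nodes=$(( (total + tasks_per_node - 1) / tasks_per_node ))
echo "total MPI tasks: $total"
echo "nodes at $tasks_per_node tasks per node: $nodes"
```

Under these assumptions the ocean sits on its own 64 ranks after the other components, so the layout needs more tasks than the 256 requested with `-n 256` in the job script earlier in the thread. A hand-written submission script would presumably need its `-N` and `-n` values raised to match.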
 

liuys

liuyusong
New Member
Hi Fischer, thank you. I will try your method and let you know the result.
 

liuys

liuyusong
New Member
Hi Fischer, I tried your method, but it still doesn't work. Here are my logs. The error is similar to last time.
 

Attachments

  • cesm.o12150207.txt
    593 bytes · Views: 6
  • cesm.e12150207.txt
    682.2 KB · Views: 6

fischer

CSEG and Liaisons
Staff member
I'm not sure what's going on. Can you provide me with the steps you took to get this error? You should also try
running a simple X compset with debug turned on.
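
One way the suggested test might look is sketched below. The paths in `CESMROOT` and `CASEDIR`, the `f19_g17` resolution, and the `run` helper are placeholders chosen for this example, not values from the thread. The sketch also assumes the machine is already ported so that `create_newcase` needs no `--machine` argument. By default it only prints the commands; setting `DRY_RUN=0` inside a real CESM checkout would execute them.

```shell
#!/bin/sh
# Sketch only: create a small X-compset case with DEBUG enabled.
# CESMROOT and CASEDIR are hypothetical paths; adjust them for your system.
CESMROOT=${CESMROOT:-$HOME/cesm}
CASEDIR=${CASEDIR:-$HOME/cases/x_debug_test}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = actually run them

# Hypothetical helper: echo each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" -eq 0 ]; then
        "$@"
    fi
}

# --run-unsupported may be needed because X is a test compset, not a science one.
run "$CESMROOT/cime/scripts/create_newcase" --case "$CASEDIR" \
    --compset X --res f19_g17 --run-unsupported

if [ "$DRY_RUN" -eq 0 ]; then
    cd "$CASEDIR" || exit 1
fi

run ./xmlchange DEBUG=TRUE   # build with debug flags and runtime checks
run ./case.setup
run ./case.build
run ./case.submit
```

If a case this small fails in the same way, the problem is more likely in the machine setup (modules, MPI, batch settings) than in the BSSP126 configuration itself.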
 