
Getting an error when resubmitting cesm1.2.2

Hello,

I am running cesm1.2.2 with RESUBMIT enabled. When the first run completes, the following message appears at the end. Could you help with it?

RESUBMIT is now 1
socket_connect_unix failed: 15137
qsub: cannot connect to server (null) (errno=15137) could not connect to trqauthd
ccsm_postrun error: problem sourcing tempres


Run Log:

-------------------------------------------------------------------------
CESM BUILDNML SCRIPT STARTING
- To prestage restarts, untar a restart.tar file into /home/test/csmruns/cam53_default/run
infile is /home/test/CSM/cam53_default/Buildconf/cplconf/cesm_namelist
CAM writing dry deposition namelist to drv_flds_in
Writing ocean component namelist to ./docn_in
CAM writing namelist to atm_in
CLM configure done.
CLM adding use_case 2000_control defaults for var sim_year with val 2000
CLM adding use_case 2000_control defaults for var sim_year_range with val constant
CLM adding use_case 2000_control defaults for var use_case_desc with val Conditions to simulate 2000 land-use
CICE configure done.
CESM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
CESM PRESTAGE SCRIPT STARTING
- Case input data directory, DIN_LOC_ROOT, is /home/opt/app/csm_collections/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT
CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
Mon Sep 15 15:37:38 IST 2014 -- CSM EXECUTION BEGINS HERE
Mon Sep 15 15:39:09 IST 2014 -- CSM EXECUTION HAS FINISHED
(seq_mct_drv): =============== SUCCESSFUL TERMINATION OF CPL7-CCSM ===============
RESUBMIT is now 1
socket_connect_unix failed: 15137
qsub: cannot connect to server (null) (errno=15137) could not connect to trqauthd
ccsm_postrun error: problem sourcing tempres


I do not understand why this qsub error occurs, since our run script itself runs fine.
The error appears when the script tries to resubmit the job. Please suggest a fix.

Thanking you in anticipation.
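The key point in the log is that the model itself finished (`SUCCESSFUL TERMINATION OF CPL7-CCSM`) and only the follow-up `qsub` call failed. When several runs are chained with RESUBMIT, a small scanner can separate those two cases automatically. This is a hypothetical Python helper: `scan_run_log` and `ResubmitStatus` are illustrative names, not part of CESM, and the patterns are copied from the log lines quoted above, so they may need adjusting for other CESM or Torque versions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Patterns copied from the log lines quoted in this thread (hypothetical helper).
SUCCESS_RE = re.compile(r"SUCCESSFUL TERMINATION OF CPL7-CCSM")
RESUBMIT_RE = re.compile(r"RESUBMIT is now (\d+)")
QSUB_FAIL_RE = re.compile(r"qsub: cannot connect to server")
TRQAUTHD_RE = re.compile(r"could not connect to trqauthd")


@dataclass
class ResubmitStatus:
    model_finished: bool = False          # coupler reported a successful termination
    resubmit_value: Optional[int] = None  # last "RESUBMIT is now N" value seen
    qsub_failed: bool = False             # qsub could not reach the batch server
    trqauthd_unreachable: bool = False    # error text names trqauthd
    qsub_errors: List[str] = field(default_factory=list)


def scan_run_log(text: str) -> ResubmitStatus:
    """Classify a run log: did the model finish, and did the resubmission fail?"""
    status = ResubmitStatus()
    for line in text.splitlines():
        if SUCCESS_RE.search(line):
            status.model_finished = True
        match = RESUBMIT_RE.search(line)
        if match:
            status.resubmit_value = int(match.group(1))
        if QSUB_FAIL_RE.search(line):
            status.qsub_failed = True
            status.qsub_errors.append(line.strip())
        if TRQAUTHD_RE.search(line):
            status.trqauthd_unreachable = True
    return status
```

A log that reports `model_finished=True` together with `qsub_failed=True` points at the batch-system configuration rather than at the model.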
 
Hello, thanks for replying. Yes, you are right: our scheduler configuration does not allow jobs to be submitted from compute nodes. I just want to confirm: if I make all of the compute nodes submit hosts, will it affect our cluster's performance, or is that fine for our machine? We have a small cluster with 10 nodes. By default the scheduler has one submit node (i.e., the master node), and the others are compute nodes. Thanking you in anticipation.
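Before and after changing the scheduler configuration, it can help to confirm from inside a job on a compute node whether that node can reach the batch server at all. The sketch below assumes Torque, where `qstat -q` (list queues) has to contact the server just as `qsub` does; `can_reach_batch_server` is a hypothetical helper name. It only tests connectivity (the "cannot connect to server" part of the error); it does not prove that the server will accept a submission from that host.

```python
import subprocess
import sys
from typing import Sequence

# Assumed probe for a Torque cluster: `qstat -q` must talk to the batch server,
# so it should fail in a similar way when the node cannot reach the server.
DEFAULT_PROBE = ("qstat", "-q")


def can_reach_batch_server(cmd: Sequence[str] = DEFAULT_PROBE, timeout: float = 10.0) -> bool:
    """Return True if the probe command runs and exits with status 0."""
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing on this node, not executable, or the server never answered.
        return False
    return result.returncode == 0
```

Called as `can_reach_batch_server()` from a short test job, a `False` result reproduces the failing condition without waiting for a full model run.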
 
 

santos

Member
I'm not an expert on setting up clusters, but I believe that it is fine to allow job submission from all hosts. Most of the clusters I've used allow you to use qsub from any node.
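Assuming the cluster runs Torque (the `trqauthd` named in the error is Torque's authorization daemon), submission from compute nodes is controlled by server attributes set with `qmgr`. The hypothetical helper below only builds the command strings: either the single `allow_node_submit` switch, or explicit `submit_hosts` entries for chosen nodes. The attribute names are assumed from the Torque administrator documentation; check them against your Torque version before running anything as root on the server host. Separately, the error text suggests that `trqauthd` also has to be running on every node that calls `qsub`.

```python
import shlex
from typing import Iterable, List, Optional


def qmgr_commands(hosts: Optional[Iterable[str]] = None) -> List[str]:
    """Build (but do not run) qmgr commands that allow job submission from compute nodes.

    hosts=None  -> turn on allow_node_submit, so any trusted compute host may submit.
    hosts=[...] -> add only the listed hosts to the submit_hosts list instead.
    Assumes a Torque server; a Torque admin must run the commands on the server host.
    """
    if hosts is None:
        directives = ["set server allow_node_submit = True"]
    else:
        directives = [f"set server submit_hosts += {host}" for host in hosts]
    # Quote each directive so the shell passes it to `qmgr -c` as one argument.
    return ["qmgr -c " + shlex.quote(d) for d in directives]
```

For the 10-node cluster described above, `qmgr_commands()` yields the single `allow_node_submit` command; passing a list of host names is the narrower option if only some nodes should be able to submit.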