task geometry - CCSM on bluevista

aswann

New Member
I am trying to run CCSM on bluevista with a modified input file and I have encountered problems with task geometry assignments.

I have made the following modifications to the generic CCSM3 'T42_gx1v3 compset K' configuration on bluevista:
1. I have changed clm.buildnml_prestage.csh to use a different input file for "fsurdat".
2. I have modified the number of processors requested (#BSUB -n) in the case.bluevista.run file.

I suspect the problem is with #2.

The case.bluevista.run file has the following LSF commands:
#==============================================================================
# This is a CCSM coupled model Load Leveler batch job script for bluevista
#==============================================================================
#BSUB -n 50
#BSUB -R "span[ptile=8]"
#BSUB -q regular
#BSUB -N
#BSUB -x
#BSUB -a poe
#BSUB -o poe.stdout.%J
#BSUB -e poe.stderr.%J
#BSUB -J case
#BSUB -W 0:30
##BSUB -P projectnum

setenv LSB_PJL_TASK_GEOMETRY "{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,18)(19,20)(21)}"


From my understanding of TASK_GEOMETRY assignment, which is extremely limited, I believe that 8 nodes are required.

For BSUB -n values of 8, 16, and 22 (as opposed to the default value of 50) I receive the following error message (example from the BSUB -n 22):

Mon Jun 18 16:29:07 MDT 2007 -- CSM EXECUTION BEGINS HERE
LSB_PJL_TASK_GEOMETRY requests more hosts than LSB_MCPU_HOSTS can offer.
Please
1. Re-define LSB_PJL_TASK_GEOMETRY {(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,18)(19,20)(21)}, or
2. Modify LSF host selection criteria, or
3. Change job execution host order.
Exit...
Mon Jun 18 16:29:07 MDT 2007 -- CSM EXECUTION HAS FINISHED
Model did not complete - see cpl.log.070618-162847


I was hoping that someone might be able to see where I have misunderstood the TASK_GEOMETRY assignment or made some other simple error. Alternatively, is there some way my change of input file would affect the nodes necessary?

Any insights you have would be greatly appreciated!

Thanks-
Abby Swann
 
Hi Abby

There seems to be a typo in the task geometry syntax in your LSF commands.
A digit 8 and a closing parenthesis are missing after "(17,1" :

setenv LSB_PJL_TASK_GEOMETRY "{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,1(19,20)(21)}"

should actually be:

setenv LSB_PJL_TASK_GEOMETRY "{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,18)(19,20)(21)}"

This was probably caused when you edited case.bluevista.run.
However, this geometry corresponds to seven nodes, not eight as you say.
Each pair of parentheses defines one node, and there are seven pairs.
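
For what it's worth, here is a small Python sketch for checking a geometry string before submitting. It is only an illustration: the function name parse_task_geometry and the regular expression are my own, not part of LSF or the CCSM scripts, and the pattern only covers the "{(...)(...)}" form shown in this thread.

```python
import re

# Strict pattern (an assumption based on the strings in this thread):
# braces around one or more "(n,n,...)" groups of task IDs.
GEOMETRY_RE = re.compile(r"\{(?:\(\s*\d+(?:\s*,\s*\d+)*\s*\))+\}")


def parse_task_geometry(text):
    """Return one list of task IDs per node; raise ValueError if malformed."""
    text = text.strip()
    if not GEOMETRY_RE.fullmatch(text):
        raise ValueError("malformed task geometry: " + text)
    groups = re.findall(r"\(([^()]*)\)", text)
    return [[int(task) for task in group.split(",")] for group in groups]


good = "{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,18)(19,20)(21)}"
bad = "{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,1(19,20)(21)}"

nodes = parse_task_geometry(good)
print(len(nodes), "nodes,", sum(len(n) for n in nodes), "tasks")

try:
    parse_task_geometry(bad)
except ValueError as err:
    print("rejected:", err)
```

The well-formed string parses into seven groups (seven nodes) holding 22 tasks in total, while the string with the missing "8)" is rejected.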

My understanding is that the default "(TASK,THREAD)" setup on bluevista for the K compset
at T42_gx1v3 is:

Component    Tasks  Threads
cpl=cpl          8        1
ice=dice         1        1
lnd=clm          4        2
ocn=docn         1        1
atm=cam          8        4

Total processors = 8x1 + 1x1 + 4x2 + 1x1 + 8x4 = 50 (cpl, dice, clm, docn, cam, in the order listed above)

The standard bluevista setup also uses a maximum of 8 processors per node,
which gives 50/8 = 6.25, rounded up to 7 nodes.
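
The arithmetic above can be sketched in a few lines of Python. The names LAYOUT and PROCS_PER_NODE are mine, for illustration only; the numbers are the ones from the list above. Note that the node count computed this way is only a lower bound, since tasks cannot be split across nodes.

```python
import math

# (component, tasks, threads), in the order listed above
LAYOUT = [
    ("cpl", 8, 1),
    ("dice", 1, 1),
    ("clm", 4, 2),
    ("docn", 1, 1),
    ("cam", 8, 4),
]
PROCS_PER_NODE = 8  # maximum processors per node on bluevista, as stated above

# Each task occupies as many processors as it has threads.
total_procs = sum(tasks * threads for _name, tasks, threads in LAYOUT)
total_tasks = sum(tasks for _name, tasks, _threads in LAYOUT)
min_nodes = math.ceil(total_procs / PROCS_PER_NODE)

print("processors:", total_procs)
print("MPI tasks:", total_tasks)
print("nodes (lower bound):", min_nodes)
```

This gives 50 processors, 22 tasks (numbered 0 to 21), and at least 7 nodes.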

The calculation of task geometry uses the total processors and the ordered list above and
should result in:

{(0,1,2,3,4,5,6,7)(8,9,10,11)(12,13,14)(15,16)(17,18)(19,20)(21)}

Tasks 0-7 run on node 1 and are used by cpl, one processor per task (thread=1).
Task 8 runs on node 2 and is used by dice, one processor per task (thread=1).
Tasks 9, 10, and 11 run on node 2 and are used by clm, two processors per task (thread=2).
Note that this leaves one processor idle on node 2.
Task 12 runs on node 3 and is used by clm, two processors per task (thread=2).
Task 13 runs on node 3 and is used by docn, one processor per task (thread=1).
Task 14 runs on node 3 and is used by cam, four processors per task (thread=4).
Note that this leaves one processor idle on node 3.
The remaining tasks, 15-21, run on nodes 4-7 and are used by cam, four processors per task (thread=4).
Note that this leaves four processors idle on node 7.
Seven nodes are used, and a total of six processors are left idle.
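
The walk-through above can be reproduced with a simple greedy packing, sketched here in Python. To be clear, this is not the taskmaker.pl code (I have not copied it), just a guess at an equivalent rule: take the tasks in order and start a new node whenever the next task's threads would not fit in the 8 processors of the current node. The names pack_tasks and format_geometry are hypothetical.

```python
# (component, tasks, threads), in the order listed above
LAYOUT = [
    ("cpl", 8, 1),
    ("dice", 1, 1),
    ("clm", 4, 2),
    ("docn", 1, 1),
    ("cam", 8, 4),
]


def pack_tasks(layout, procs_per_node):
    """Greedily assign tasks to nodes; return (nodes, idle processors per node)."""
    nodes = [[]]  # task IDs placed on each node
    used = [0]    # processors in use on each node
    task_id = 0
    for name, tasks, threads in layout:
        if threads > procs_per_node:
            raise ValueError(name + " needs more threads than a node has processors")
        for _ in range(tasks):
            # Open a new node if this task's threads do not fit on the current one.
            if used[-1] + threads > procs_per_node:
                nodes.append([])
                used.append(0)
            nodes[-1].append(task_id)
            used[-1] += threads
            task_id += 1
    idle = [procs_per_node - u for u in used]
    return nodes, idle


def format_geometry(nodes):
    """Render the node list in the "{(...)(...)}" form used by LSB_PJL_TASK_GEOMETRY."""
    return "{" + "".join("(" + ",".join(map(str, n)) + ")" for n in nodes) + "}"


nodes, idle = pack_tasks(LAYOUT, 8)
print(format_geometry(nodes))
print("idle processors per node:", idle, "total:", sum(idle))
```

With the layout above this yields the same seven-node geometry, with one idle processor on each of nodes 2 and 3 and four on node 7, six in total.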

Your task geometry must be compatible with the total number of processors requested (#BSUB -n 50)
and with the "(TASK,THREAD)" ordered list above.
A script called taskmaker.pl calculates the task geometry automatically when you build the "case"
using "create_newcase".
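
As a rough check of that compatibility, here is a Python sketch comparing the hosts a request would get with the seven nodes the geometry needs. It rests on an assumption of mine, not something confirmed in this thread: that with span[ptile=8], a request for N processors is spread over ceil(N/8) hosts. The helper name hosts_offered is hypothetical.

```python
import math

GEOMETRY_NODES = 7  # pairs of parentheses in the task geometry above
PTILE = 8           # from #BSUB -R "span[ptile=8]"


def hosts_offered(n_procs, ptile):
    """Hosts granted for n_procs processors, ASSUMING ptile processors per host."""
    return math.ceil(n_procs / ptile)


results = {}
for n_procs in (8, 16, 22, 50):
    offered = hosts_offered(n_procs, PTILE)
    results[n_procs] = offered
    status = "ok" if offered >= GEOMETRY_NODES else "too few hosts"
    print("-n", n_procs, "->", offered, "host(s):", status)
```

Under that assumption, -n 8, 16, and 22 give only 1, 2, and 3 hosts, fewer than the seven the geometry asks for, which would be consistent with the "requests more hosts than LSB_MCPU_HOSTS can offer" message. Only -n 50 gives seven.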

Task geometry and total number of processors cannot be changed when
the active ocean or ice models (pop and csim) are in use,
because they seem to hardwire the geometry of the domain decomposition when they are compiled.
I am not sure whether cam and clm also hardwire the domain decomposition.
Since you are not using pop or csim, this restriction may not apply,
and there is a chance that you could edit those items by hand,
but I am not sure about that.
Doing so would also require changing other things, such as the poe.cmd file.
The safe and easy thing to do is not to edit these settings by hand after you build a "case".

Sorry for such a late answer.
I've been away from this bulletin board for a long time.
I just started to use the NCAR machines.
You may have already solved the problem.
Bluevista is now decommissioned!
You may be using another machine, or running CCSM3 somewhere else.
Anyway, I hope this helps.

Regards,
Gus Correa
 