Tutorial gets error on yellowstone: something to do with too many tasks

Hi!

I am teaching a course that runs the CESM, using a tutorial I used two years ago. A script that
ran fine two years ago now gives the following error:
Execute poe command line: poe  /glade/scratch/mahowald/TestCLM_1/bld/cesm.exe
ATTENTION: 0031-393  Ignoring -resd/MP_RESD specified for batch job
ATTENTION: 0031-408  64 tasks allocated by Resource Manager, continuing...
ATTENTION: 0031-606 Unrecognized environment variable, MP_EAGER_LIMIT_LOCAL.
ERROR: 0031-758 AFFINITY: [ys5922] Oversubscribe: 32 tasks in total, each task requires 1 resource,
but there are only 16 available resource. Affinity can not be applied
ERROR: 0031-161  EOF on socket connection with node ys5922-ib
INFO: 0031-639  Exit status from pm_respond = -1

The run directory for the CESM is:
~mahowald/TestCLM_1

the scratch directory is:
/glade/scratch/mahowald/TestCLM_1/

Could you help me figure out what changed on yellowstone in the last two years that might have
caused this, and/or how to fix the problem?

CISL suggested I add the following:

"setenv MP_TASK_AFFINITY cpu

before submitting the CESM run script. Better yet you may just enter into your cesm run script
somewhere before the command mpirun.lsf."

I did that, but it still didn't work. Does anyone have any other suggestions?

Thanks very much.
Natalie
 

jedwards

CSEG and Liaisons
Staff member
In env_mach_pes.xml, change MAX_TASKS_PER_NODE to 16. This should solve the problem. Currently you are trying to use 32 MPI tasks per node. Each node has 16 CPUs; they can run up to 32 threads, but we recommend not using more than 16 MPI tasks per node.
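
To make the arithmetic behind the "Oversubscribe" message concrete, here is a minimal shell sketch. The function names nodes_needed and fits_on_node are hypothetical helpers written for this illustration; they are not part of CESM, LSF, or poe. The numbers come from the thread: 64 tasks allocated (poe log) and 16 CPUs per node.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not CESM or LSF tools).

# Nodes needed to place $1 MPI tasks at $2 tasks per node (ceiling division).
nodes_needed() {
    tasks=$1
    per_node=$2
    echo $(( (tasks + per_node - 1) / per_node ))
}

# Exit status 0 when $1 tasks per node fit on a node with $2 cores.
fits_on_node() {
    per_node=$1
    cores=$2
    [ "$per_node" -le "$cores" ]
}

CORES_PER_NODE=16   # from the reply above: 16 CPUs per node
TOTAL_TASKS=64      # from the poe log: 64 tasks allocated

for per_node in 32 16; do
    if fits_on_node "$per_node" "$CORES_PER_NODE"; then
        status="fits"
    else
        status="oversubscribed"
    fi
    echo "$per_node tasks/node: $(nodes_needed "$TOTAL_TASKS" "$per_node") nodes, $status"
done
# prints:
# 32 tasks/node: 2 nodes, oversubscribed
# 16 tasks/node: 4 nodes, fits
```

With 32 tasks per node, each node is asked to bind 32 tasks to 16 CPUs, which matches the affinity error in the log. The setting itself lives in env_mach_pes.xml; in CESM 1.x case directories it is typically changed with the xmlchange script, but the exact syntax and whether the case must be set up again depend on the CESM version, so check the documentation for your release.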
 
Thanks, this fixed it! I think I could also fix it by just changing ptile=16 (instead of 32) in the run script; I tried that as well, and it worked. Thanks!
Natalie
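
For readers looking for the ptile setting: on LSF systems it usually appears in the batch header of the run script as a span[ptile=...] resource request. The fragment below is only a sketch of what the edited lines might look like, assuming the run script uses #BSUB directives; the task count of 64 is taken from the poe log above, and the rest of the script is omitted.

```shell
# Total number of MPI tasks requested (64 in the poe log above).
#BSUB -n 64
# Place at most 16 tasks on each node (this value was 32 before the fix).
#BSUB -R "span[ptile=16]"
```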
 