

CESM2_0:CAM-Chem hanging in mpialltoallint


Benjamin Gaubert and I are trying to build a CAM-Chem
version from the released CESM2_0.
I'm building a 1-degree model using compset
and giving it 6 nodes per instance in a 2-instance forecast.

A similar problem happens in single-instance forecasts with 1, 2, or 3 nodes.

CAM stops making progress, although when I logged onto
the compute nodes, 'top' reported that all the CPUs were very busy.

I put in debug prints and narrowed the problem down to these lines:

    call mpialltoallint(rdispls, 1, pdispls, 1, mpicom)
    # if defined(MODCM_DP_TRANSPOSE)

It wouldn't be productive for me to pursue it any deeper,
so I'm hoping that someone else recognizes a mistake we're
making, or has ideas for things to change.

Jim Edwards suggested that this is a CAM problem
rather than a CIME one, but there is an open CIME issue about it (2808).

Kevin and Benjamin


A test without chemistry succeeds.



Brian Eaton discovered that this was caused by a bad definition of npr_yz (1,1,1,1)
in cam/cime_config/buildnml, which defined a single subdomain and so ran the whole
dycore on 1 task. Changing it to the intended default, where buildnml passes the
per-instance task count to build-namelist,

    build-namelist --ntasks $NTASKS_PER_INST_ATM

fixed this, as well as the problem with initial-file longitudes being set to 0 for large ensembles.
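To see why (1,1,1,1) leaves the whole dycore on one task, here is a minimal Python sketch. It assumes the CAM namelist convention that npr_yz lists four values, npr_y, npr_z, nprxy_x, nprxy_y, and that npr_y * npr_z (y-z decomposition) and nprxy_x * nprxy_y (x-y decomposition) give the number of dycore subdomains. The helpers fv_subdomains and check_npr_yz are hypothetical illustrations, not what buildnml or build-namelist actually runs, and the task count of 64 in the usage lines is made up.

```python
"""Hypothetical sanity check for the CAM FV dycore ``npr_yz`` setting.

Assumption (CAM namelist convention, not stated in this thread):
npr_yz = (npr_y, npr_z, nprxy_x, nprxy_y). npr_y * npr_z is the number of
subdomains in the y-z decomposition; nprxy_x * nprxy_y is the number of
subdomains in the x-y decomposition.
"""


def fv_subdomains(npr_yz):
    """Return (yz_subdomains, xy_subdomains) for a 4-value npr_yz setting."""
    if len(npr_yz) != 4:
        raise ValueError("npr_yz must have exactly 4 values")
    npr_y, npr_z, nprxy_x, nprxy_y = npr_yz
    return npr_y * npr_z, nprxy_x * nprxy_y


def check_npr_yz(npr_yz, ntasks):
    """Return warnings; an empty list means the layout uses all ntasks tasks."""
    yz, xy = fv_subdomains(npr_yz)
    warnings = []
    if yz != xy:
        warnings.append(f"yz ({yz}) and xy ({xy}) decompositions differ in size")
    if yz < ntasks:
        # This is the situation in the thread: the dycore runs on too few tasks.
        warnings.append(f"dycore uses {yz} of {ntasks} ATM tasks")
    if yz > ntasks:
        warnings.append(f"layout needs {yz} tasks but only {ntasks} are available")
    return warnings


if __name__ == "__main__":
    print(check_npr_yz((1, 1, 1, 1), 64))    # the hard-coded setting
    print(check_npr_yz((64, 1, 1, 64), 64))  # a layout that uses every task
```

With the hard-coded (1,1,1,1), the check reports that only 1 of the 64 hypothetical tasks does dycore work, which fits the symptom of a run that stalls while other tasks spin.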

So this issue is resolved.
