CESM1_2_2_1 runtime failure when changing the PE count

xlong@...
I am running CESM1_2_2_1 in the f19_g16 configuration.

The default number of requested cores is 64.

If I set this number to 64, 128, or 256, the model runs fine.

However, if I set it to 100, for example, the model fails with errors like the following:

 

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

aa area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

ll area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

ll aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

rr area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa area ->MCT::m_AttrVect::indexRA_

rr aream->MCT::m_AttrVect::indexRA_

aa area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

aa area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

 

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:


a area ->MCT::m_AttrVect::indexRA_

rr aream->MCT::m_AttrVect::indexRA_

aa area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

aa area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

aa aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

ll area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

ll aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

rr area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

rr aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

ll area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

ll aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

rr area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

rr aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

ll area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

ll aream->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:

rr area ->MCT::m_AttrVect::indexRA_

MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:

rr aream->MCT::m_AttrVect::indexRA_

MCT::m_Router::initp_: GSMap indices not increasing...Will correct

MCT::m_Router::initp_: RGSMap indices not increasing...Will correct

MCT::m_Router::initp_: RGSMap indices not increasing...Will correct

MCT::m_Router::initp_: GSMap indices not increasing...Will correct

MCT::m_Rearranger::Rearrange_: Number of attributes in SourceAV and TargetAV do not match

MCT::m_Rearranger::Rearrange_: error, nRAttr(SourceAV)=1, nRAttr(TargetAV)=6.

063.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()

application called MPI_Abort(MPI_COMM_WORLD, 2) - process 99

In: PMI_Abort(2, application called MPI_Abort(MPI_COMM_WORLD, 2) - process 99)

MCT::m_Rearranger::Rearrange_: Number of attributes in SourceAV and TargetAV do not match

MCT::m_Rearranger::Rearrange_: Number of attributes in SourceAV and TargetAV do not match

MCT::m_Rearranger::Rearrange_: error, nRAttr(SourceAV)=1, nRAttr(TargetAV)=6.

061.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()

MCT::m_Rearranger::Rearrange_: Number of attributes in SourceAV and TargetAV do not match

MCT::m_Rearranger::Rearrange_: error, nRAttr(SourceAV)=1, nRAttr(TargetAV)=6.

062.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()


application called MPI_Abort(MPI_COMM_WORLD, 2) - process 96

application called MPI_Abort(MPI_COMM_WORLD, 2) - process 97

In: PMI_Abort(2, application called MPI_Abort(MPI_COMM_WORLD, 2) - process 97)

In: PMI_Abort(2, application called MPI_Abort(MPI_COMM_WORLD, 2) - process 96)

application called MPI_Abort(MPI_COMM_WORLD, 2) - process 98

In: PMI_Abort(2, application called MPI_Abort(MPI_COMM_WORLD, 2) - process 98)

slurmstepd: error: *** STEP 9924.0 ON node-0086 CANCELLED AT 2019-05-20T23:52:49 ***

srun: Job step aborted: Waiting up to 122 seconds for job step to finish.

srun: error: node-0090: tasks 80-96,99: Killed

srun: Terminating job step 9924.0

srun: error: node-0088: tasks 40-59: Killed

srun: error: node-0090: tasks 97-98: Exited with exit code 2

srun: error: node-0087: tasks 20-39: Killed

srun: error: node-0089: tasks 60-79: Killed

 

srun: error: node-0086: tasks 0-19: Killed
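
Because the output from all ranks is interleaved, the same few messages repeat many times. A minimal sketch of how the log could be collapsed into unique messages with counts is given here, assuming the log text is available as a string. The helper name `summarize_log` and the normalization rules (dropping a three-digit rank prefix such as `063.` and masking the process number) are assumptions for illustration, not part of CESM or MCT.

```python
import re
from collections import Counter

# Hypothetical helper (not part of CESM or MCT): collapse repeated lines
# from interleaved MPI rank output into unique messages with counts.
RANK_PREFIX = re.compile(r"^\d{3}\.")    # e.g. "063." before "MCT(MPEU)::die."
PROCESS_ID = re.compile(r"process \d+")  # e.g. "process 99"

def summarize_log(text):
    """Return (message, count) pairs, most frequent first."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        line = RANK_PREFIX.sub("", line)           # drop per-rank prefix
        line = PROCESS_ID.sub("process N", line)   # mask the rank number
        counts[line] += 1
    return counts.most_common()

# A few lines copied from the log above, used as sample input.
sample = """
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "area" Traceback:
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "aream" Traceback:
063.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()
061.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()
application called MPI_Abort(MPI_COMM_WORLD, 2) - process 99
application called MPI_Abort(MPI_COMM_WORLD, 2) - process 96
"""

for message, n in summarize_log(sample):
    print(f"{n:3d}  {message}")
```

Run against the full log, this would reduce the output to a handful of distinct messages, which makes the attribute-count mismatch reported by `MCT::m_Rearranger::Rearrange_` easier to spot.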


 

xlong@...

I turned the debug option on, and it gives more information about the error:

forrtl: severe (408): fort: (3): Subscript #1 of the array RATTR has value 0 which is less than the lower bound of 1

 

Image              PC                Routine            Line        Source

cesm.exe           00000000069F0110  Unknown               Unknown  Unknown

cesm.exe           0000000000436079  ccsm_comp_mod_mp_        1428  ccsm_comp_mod.F90

cesm.exe           00000000004C10F5  MAIN__                     90  ccsm_driver.F90

cesm.exe           000000000041392E  Unknown               Unknown  Unknown

libc-2.17.so       00002B9694BDE3D5  __libc_start_main     Unknown  Unknown

cesm.exe           0000000000413829  Unknown               Unknown  Unknown

 

forrtl: severe (408): fort: (3): Subscript #1 of the array RATTR has value 0 which is less than the lower bound of 1


I have been running cesm1_2_2_1 on our older HPC system and never had any issue.

This issue occurred when I tried to port the model to a new system.


Thanks
