running CCSM3 on yellowstone

wliu.ucsd@...

Hi there,

I have been stuck running CCSM3 on yellowstone: my case fails because its tasks
are unexpectedly killed. I have resubmitted the job several times, and the error
messages differ from run to run.

Below are the contents of two of the error files:

ccsm.stderr.290482
ERROR: 0031-161 EOF on socket connection with node ys0157-ib
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory

ccsm.stderr.232837
ERROR: 0031-250 task 17: Killed
ERROR: 0031-250 task 6: Killed
ERROR: 0031-250 task 11: Killed
ERROR: 0031-250 task 39: Killed
ERROR: 0031-250 task 32: Killed
ERROR: 0031-250 task 35: Killed
ERROR: 0031-250 task 46: Killed
ERROR: 0031-250 task 43: Killed
ERROR: 0031-250 task 33: Killed
ERROR: 0031-250 task 36: Killed
ERROR: 0031-250 task 45: Killed
ERROR: 0031-250 task 23: Killed
ERROR: 0031-250 task 16: Killed
ERROR: 0031-250 task 29: Killed
ERROR: 0031-250 task 31: Killed
ERROR: 0031-250 task 30: Killed
ERROR: 0031-250 task 51: Killed
ERROR: 0031-250 task 50: Killed
ERROR: 0031-250 task 52: Killed
ERROR: 0031-250 task 49: Killed
ERROR: 0031-250 task 63: Killed
ERROR: 0031-250 task 53: Killed
ERROR: 0031-250 task 61: Killed

Could someone help me figure out this problem? Thanks a lot!

Best
W

Wei Liu

jedwards

First: do you need to run CCSM3? If you can update to a newer version of the model, I think you will find things considerably easier.

If you send me the path to your case directory, I'll take a look and see if I can figure out the problem.

 

- Jim

CESM Software Engineer

wliu.ucsd@...

Dear Jim

Thanks a lot! I have to use CCSM3 because I have already conducted a series of experiments with this model.

My case directory is

    /glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd

 

Best

W

Wei Liu

jedwards

Sorry, Wei, but I don't see anything obvious. You might try building with a different compiler version: in modules.yellowstone, change

module load pgi/12.5

to

module load pgi/13.9
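Jim's suggestion amounts to a one-line edit in modules.yellowstone. A minimal sketch of scripting that edit follows, assuming GNU sed (for in-place editing with -i). The function name swap_pgi_module is hypothetical and not part of CCSM3, and the path to modules.yellowstone is passed in as an argument because the thread does not say where that file lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of CCSM3): switch the PGI compiler module
# loaded by a modules file such as modules.yellowstone.
# Assumes GNU sed (for in-place editing with -i).
swap_pgi_module() {
    modfile="$1"          # path to the modules file
    old="${2:-12.5}"      # version currently loaded
    new="${3:-13.9}"      # version to load instead

    if [ ! -f "$modfile" ]; then
        echo "swap_pgi_module: no such file: $modfile" >&2
        return 1
    fi

    # Escape the dots so sed matches the version string literally.
    old_re=$(printf '%s\n' "$old" | sed 's/[.]/\\./g')

    # Keep a backup copy before editing in place.
    cp "$modfile" "$modfile.bak" || return 1
    sed -i "s#module load pgi/$old_re#module load pgi/$new#" "$modfile"

    # Print the resulting PGI line so the change can be checked.
    grep "module load pgi/" "$modfile"
}
```

For example, swap_pgi_module modules.yellowstone should print the updated module line and leave a modules.yellowstone.bak backup next to it. The case still has to be rebuilt afterwards for the new compiler to take effect.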

CESM Software Engineer

wliu.ucsd@...

Dear Jim

Thanks. I changed the module to version 13.9 and rebuilt the case.

The model still fails to run, with the following error:

ccsm.stderr.421037 

ERROR: 0031-161  EOF on socket connection with node ys1315-ib

Wei Liu

wliu.ucsd@...

Dear Jim

I just checked, and my other CCSM3-based cases run smoothly.

So I suspect the failure of this case is related to the changes I made.

Basically, I modified some of the coupler (CPL6) source code, adding I/O calls to write out data from the coupler, such as

 call cpl_iobin_create(loc_fn,trim(cpl_control_caseDesc))

 call cpl_iobin_open(loc_fn,12)

 call cpl_iobin_appendBun(12,date,bun_atm) !save atm input for coupling

 call cpl_iobin_close(12)

 All the changes are within

/glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd/SourceMods/src.cpl

 

Could the node errors be related to the I/O I added to the coupler?

What puzzles me is that this same case, b3031.2xco2.wspd, ran smoothly a couple of months ago but fails now. Have there been any changes to the node configuration on yellowstone?

 

Thanks.

 

Best

W

 

Wei Liu
