Running CCSM3 on Yellowstone

Hi there,

I am stuck running CCSM3 on Yellowstone: my case fails because tasks are
unexpectedly killed. I have submitted the job several times, and the error
messages vary.

Below are the contents of two error files:

ccsm.stderr.290482
ERROR: 0031-161 EOF on socket connection with node ys0157-ib
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory

ccsm.stderr.232837
ERROR: 0031-250 task 17: Killed
ERROR: 0031-250 task 6: Killed
ERROR: 0031-250 task 11: Killed
ERROR: 0031-250 task 39: Killed
ERROR: 0031-250 task 32: Killed
ERROR: 0031-250 task 35: Killed
ERROR: 0031-250 task 46: Killed
ERROR: 0031-250 task 43: Killed
ERROR: 0031-250 task 33: Killed
ERROR: 0031-250 task 36: Killed
ERROR: 0031-250 task 45: Killed
ERROR: 0031-250 task 23: Killed
ERROR: 0031-250 task 16: Killed
ERROR: 0031-250 task 29: Killed
ERROR: 0031-250 task 31: Killed
ERROR: 0031-250 task 30: Killed
ERROR: 0031-250 task 51: Killed
ERROR: 0031-250 task 50: Killed
ERROR: 0031-250 task 52: Killed
ERROR: 0031-250 task 49: Killed
ERROR: 0031-250 task 63: Killed
ERROR: 0031-250 task 53: Killed
ERROR: 0031-250 task 61: Killed

Could someone help me figure out this problem? Thanks a lot!

Best
W
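
When a stderr file contains many "Killed" lines like the second one above, it can help to pull out just the task numbers and sort them. The following is only a sketch: the function name `killed_tasks` is made up for illustration, and it assumes every relevant line has the exact form `ERROR: 0031-250 task N: Killed` shown in the post.

```shell
#!/bin/sh
# Hypothetical helper: print the task numbers reported as "Killed"
# in a stderr file, sorted numerically, one per line.
killed_tasks() {
    # Keep only lines like "ERROR: 0031-250 task 17: Killed"
    # and reduce each one to its task number.
    sed -n 's/^ERROR: 0031-250 task \([0-9][0-9]*\): Killed.*$/\1/p' "$1" | sort -n
}

# Example usage (file name taken from the post; adjust to your own run):
# killed_tasks ccsm.stderr.232837
# killed_tasks ccsm.stderr.232837 | wc -l    # how many tasks were killed
```

Comparing the sorted list across several failed submissions shows whether the same tasks are killed each time.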



 

jedwards

CSEG and Liaisons
Staff member
First, do you need to run CCSM3? If you can update to a newer version of the model, I think you will find things considerably easier. If you send the path to your case directory, I'll take a look and see if I can figure out the problem.
- Jim
 
Dear Jim,

Thanks a lot! I have to use CCSM3 because I have already conducted a series of experiments with this model. My case is under
/glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd

Best
W
 

jedwards

CSEG and Liaisons
Staff member
Sorry Wei, but I don't see anything obvious. You might try building with a different compiler version. Change
modules.yellowstone:module load pgi/12.5
to
modules.yellowstone:module load pgi/13.9
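
The module change suggested above could be made with a single substitution instead of editing by hand. This is a sketch under assumptions: the function name `switch_pgi` is hypothetical, the file is assumed to contain a line ending in `module load pgi/12.5`, and the location of `modules.yellowstone` inside the case is not stated in the thread, so the file path is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: switch the PGI module version in a modules file.
# Usage: switch_pgi FILE OLD_VERSION NEW_VERSION
switch_pgi() {
    file=$1; old=$2; new=$3
    cp "$file" "$file.bak" || return 1   # keep a backup of the original
    # Escape the dots so that "12.5" only matches a literal "12.5".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Rewrite the matching line; trailing whitespace on it is dropped.
    sed "s|module load pgi/${old_re}[[:space:]]*\$|module load pgi/${new}|" \
        "$file.bak" > "$file"
}

# Example (file name and versions from the post above):
# switch_pgi modules.yellowstone 12.5 13.9
# grep -n 'pgi/' modules.yellowstone   # confirm the change before rebuilding
```

The backup copy makes it easy to return to the original compiler version if the rebuild does not help.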
 
Dear Jim,

Thanks. I changed the module to version 13.9 and rebuilt the case. The model still fails to run, with this error:

ccsm.stderr.421037
ERROR: 0031-161  EOF on socket connection with node ys1315-ib
 
Dear Jim,

I just checked, and my other CCSM3 cases run smoothly, so I suspect the failure of this case is related to what I changed. Basically, I modified some source code in the coupler (CPL6), adding I/O calls to write out some data from the coupler, such as

call cpl_iobin_create(loc_fn,trim(cpl_control_caseDesc))
call cpl_iobin_open(loc_fn,12)
call cpl_iobin_appendBun(12,date,bun_atm) !save atm input for coupling
call cpl_iobin_close(12)

All the changes are within
/glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd/SourceMods/src.cpl

So is the node error related to the I/O in the coupler? The puzzle is that this case, b3031.2xco2.wspd, ran smoothly a couple of months ago but cannot run now. Were there some changes to the node settings on Yellowstone?

Thanks.

Best
W
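
To review every place where the added output calls appear in the modified coupler source, a recursive search of the SourceMods directory is one option. This is only a sketch: the function name `find_iobin_calls` is hypothetical, and it assumes all the added calls start with the `cpl_iobin_` prefix seen in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: list each line under a directory that mentions a
# cpl_iobin_ routine, printed as file:line-number:text.
find_iobin_calls() {
    grep -rn 'cpl_iobin_' "$1"
}

# Example (directory taken from the post above):
# find_iobin_calls /glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd/SourceMods/src.cpl
```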
 