
running CCSM3 on yellowstone

Hi there,

I am stuck trying to run CCSM3 on yellowstone: my case fails because
tasks are unexpectedly killed. I have submitted the job several times, and
the error messages vary from run to run.

Below are the contents of two error files:

ccsm.stderr.290482
ERROR: 0031-161 EOF on socket connection with node ys0157-ib
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory
ERROR: 0031-028 pm_mgr_handle; can't send a signal message to remote nodes
ERROR: 0031-619 No such file or directory

ccsm.stderr.232837
ERROR: 0031-250 task 17: Killed
ERROR: 0031-250 task 6: Killed
ERROR: 0031-250 task 11: Killed
ERROR: 0031-250 task 39: Killed
ERROR: 0031-250 task 32: Killed
ERROR: 0031-250 task 35: Killed
ERROR: 0031-250 task 46: Killed
ERROR: 0031-250 task 43: Killed
ERROR: 0031-250 task 33: Killed
ERROR: 0031-250 task 36: Killed
ERROR: 0031-250 task 45: Killed
ERROR: 0031-250 task 23: Killed
ERROR: 0031-250 task 16: Killed
ERROR: 0031-250 task 29: Killed
ERROR: 0031-250 task 31: Killed
ERROR: 0031-250 task 30: Killed
ERROR: 0031-250 task 51: Killed
ERROR: 0031-250 task 50: Killed
ERROR: 0031-250 task 52: Killed
ERROR: 0031-250 task 49: Killed
ERROR: 0031-250 task 63: Killed
ERROR: 0031-250 task 53: Killed
ERROR: 0031-250 task 61: Killed

Could someone help me to figure out this problem? Thanks a lot!

Best
W
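
When the messages differ between submissions, it can help to tally them per error file before comparing runs. The following is only a minimal sketch, assuming each message starts at the beginning of a line in the form "ERROR: <code> ..." as in the excerpts above. The function names summarize_error_codes and killed_tasks are hypothetical and not part of CCSM3 or the yellowstone tooling.

```shell
#!/bin/sh
# Hypothetical helpers for comparing stderr files between submissions.
# Assumption: each message begins the line as "ERROR: <code> ...".

# Count how often each error code (e.g. 0031-250) appears in a file,
# most frequent first.
summarize_error_codes() {
    sed -n 's/^ERROR: \([0-9][0-9]*-[0-9][0-9]*\).*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}

# Print the task numbers reported as "Killed", in ascending order.
killed_tasks() {
    sed -n 's/^ERROR: 0031-250 task \([0-9][0-9]*\): Killed$/\1/p' "$1" \
        | sort -n
}
```

For example, `killed_tasks ccsm.stderr.232837` lists the killed task numbers in order, which makes it easier to see whether the same tasks fail on each submission.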



 

jedwards

CSEG and Liaisons
Staff member
First, do you need to run CCSM3? If you can update to a newer version of the model, I think you will find things considerably easier. If you send the path to your case directory, I'll take a look and see if I can figure out the problem.

- Jim
 
Dear Jim,

Thanks a lot! I have to use CCSM3 because I have already conducted a series of experiments with this model. My case is under

/glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd

Best
W
 

jedwards

CSEG and Liaisons
Staff member
Sorry Wei, but I don't see anything obvious. You might try building with a different compiler version. Change

modules.yellowstone:module load pgi/12.5

to

modules.yellowstone:module load pgi/13.9
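
The module change suggested above could be scripted roughly as follows. This is a sketch only: it assumes modules.yellowstone is in the current directory and contains the line "module load pgi/12.5" verbatim. The function name switch_pgi_version is hypothetical. The case still has to be rebuilt afterwards for the new compiler to take effect.

```shell
#!/bin/sh
# Hypothetical helper: replace one PGI module version with another in a
# modules file, keeping the original as <file>.bak.
switch_pgi_version() {
    file=$1
    old=$2
    new=$3
    # Escape dots so "12.5" is matched literally, not as a regex wildcard.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    cp "$file" "$file.bak"
    sed "s|module load pgi/$old_re|module load pgi/$new|" "$file.bak" > "$file"
}

# Example (assumed file location):
# switch_pgi_version modules.yellowstone 12.5 13.9
```

Keeping the .bak copy makes it easy to revert if the newer compiler version makes no difference.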
 
Dear Jim,

Thanks. I changed the module to version 13.9 and rebuilt the case. The model still fails to run, with this error message:

ccsm.stderr.421037
ERROR: 0031-161  EOF on socket connection with node ys1315-ib
 
Dear Jim,

I just checked, and my other cases based on CCSM3 run smoothly, so I suspect the failure of this case is related to what I changed. Basically, I modified some source code in the coupler (CPL6), adding I/O calls to output some data from the coupler, such as

  call cpl_iobin_create(loc_fn,trim(cpl_control_caseDesc))
  call cpl_iobin_open(loc_fn,12)
  call cpl_iobin_appendBun(12,date,bun_atm) !save atm input for coupling
  call cpl_iobin_close(12)

All the changes are within
/glade/u/home/wliu/ccsm/scripts/b3031.2xco2.wspd/SourceMods/src.cpl

So is the error on the nodes related to the I/O in the coupler? The puzzle is that this case, b3031.2xco2.wspd, ran smoothly a couple of months ago but cannot run now. Were there any changes to the node settings on yellowstone? Thanks.

Best
W