T85_gx1v3 run on Cray X1E

I am trying to run CCSM3 with the T85_gx1v3 B dataset on a Cray X1E.
A T42_gx1v3 B run was successful.
For T85, the model compiled without errors, but the run fails.
Could you give me any comments or suggestions for solving the problem?

Thanks,

Bong-Gen Song

The run log is:
.
.
.
(main) =========================================================================
(main) CCSM Coupler, version 6 (cpl6)
(main) CVS tag $Name$
(main) date & time: 2008-01-24 01:57:54
(main) =========================================================================
(cpl_comm_init) setting up communicators, name = cpl
===================================
(cpl_comm_init) cpl_comm_comp, size: 5 2
(cpl_comm_init) comm world : comm,npe,pid 4 56 1
(cpl_comm_init) comm component: comm,npe,pid 5 2 1
(cpl_comm_init) comm world pe0: atm,ice,lnd,ocn,cpl,me 40 2 10 16 0 0
(cpl_comm_init) mph cid : atm,ice,lnd,ocn,cpl,me 1 2 3 4 5 5
(shr_msg_chdir) read cpl_stdio.nml, changed cwd to /san_home/users/songbg/nohara/CCSM3/CCSM.T85.TEST/cpl
(shr_msg_chdir) read ice_stdio.nml, changed cwd to /san_home/users/songbg/nohara/CCSM3/CCSM.T85.TEST/ice
(shr_msg_chStdIn) read cpl_stdio.nml, unit 5 connected to cpl.stdin
(shr_msg_chStdIn) read ice_stdio.nml, unit 5 connected to ice.stdin
sh: msread: not found

Traceback for process 14745435(ssp mode) apid 14745435.0 on node 0
__open+0x018C (0x1351194) at open.c
_open+0x0290 (0x13517F8) at open.c:58
_do_open+0x0240 (0x14AC3E0) at fopn.c:983
_f_opn+0x07B0 (0x14AAD50) at fopn.c:434
_f_open+0x1A6C (0x13A2EEC) at open.c:620
__OPN+0x0C68 (0x1229CC0) at opn.c:389
_OPEN+0x00D0 (0x122A068) at opn.c:426
shr_msg_chstdout@shr_msg_mod_+0x0928 (0x116A8C8) at shr_msg_mod.f90:255
shr_msg_dirio@shr_msg_mod_+0x00D4 (0x11685D4) at shr_msg_mod.f90:148
cpl_+0x14A8 (0x110F008) at main.f90:174
Fault: memory fault on accelerated page: 0xE0FF5B
Segmentation fault (core dumped)
 
Hi,

I solved the problem.
The Cray X1E I used is not the "Phoenix" machine.
I use the KMA (Korea Meteorological Administration) Cray X1E.

There were two problems:

1. The input data was not fully uploaded. ^^;
That is why the run log showed "sh: msread: not found".

2. Load balance on the Cray X1E: I tried different numbers of CPUs for each component.
For example,
(cpl: 4, cam: 32, clm: 4, csim: 8, pop: 16) total 64 was OK.
(cpl: 2, cam: 32, clm: 4, csim: 8, pop: 16) total 62 was OK. (slower than the above)
(cpl: 2, cam: 40, clm: 2, csim: 4, pop: 16) total 64 didn't work. (bus error)
(cpl: 8, cam: 64, clm: 8, csim: 16, pop: 32) total 128 didn't work. (bus error)
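For anyone comparing layouts like the ones above, here is a minimal Python sketch that just tabulates the per-component CPU counts and checks the totals. It is only bookkeeping: the names `LAYOUTS`, `total_pes` and `describe` are made up for this example and are not part of the CCSM3 scripts, and the outcome strings are copied from the list above, not measured.

```python
# Component CPU counts copied from the four configurations listed above.
# The outcome strings paraphrase the post; nothing here is measured.
LAYOUTS = [
    ({"cpl": 4, "cam": 32, "clm": 4, "csim": 8, "pop": 16}, "OK"),
    ({"cpl": 2, "cam": 32, "clm": 4, "csim": 8, "pop": 16}, "OK (slower)"),
    ({"cpl": 2, "cam": 40, "clm": 2, "csim": 4, "pop": 16}, "bus error"),
    ({"cpl": 8, "cam": 64, "clm": 8, "csim": 16, "pop": 32}, "bus error"),
]


def total_pes(layout):
    """Return the total number of CPUs requested by a layout."""
    return sum(layout.values())


def describe(layout, outcome):
    """Format one layout as 'name=count ...' plus its total and outcome."""
    parts = " ".join(f"{name}={count}" for name, count in layout.items())
    return f"{parts}  total={total_pes(layout)}  -> {outcome}"


if __name__ == "__main__":
    for layout, outcome in LAYOUTS:
        print(describe(layout, outcome))
```

Keeping the layouts in one table like this makes it easier to see that the two failing cases differ from the working ones in more than one component count at once, so changing one component at a time may narrow down the cause faster.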

I'm still trying to find a good combination of CPUs (more than 100 or 200), but it is difficult without using the timing program in the CCSM3 scripts.

Thank you.

Bong-Geun Song
 