pop crash on intel xeon cluster

I am a CCSM3 freshman, new to this list, and I am deeply sorry if anybody gets annoyed by a stupid newbie question.

Here is the problem: I have installed CCSM3 on a Xeon cluster with the Portland Group compilers and Scali MPI. Everything works until the ocean model tries to write a restart file; at that point it crashes with SIGSEGV (11), and I cannot find any trace of what goes wrong in any log file. Switching off tavg output in the pop_in namelist (tavg_freq_opt = 'never') does not help: the run still crashes. With a very long averaging period (100 years) everything works, but then of course I get no ocean model output.

After some digging I found that adding a stupid print statement in models/ocn/pop/source/tavg.F does the trick:

      if (lreset_tavg) then
        if (moc) then

          print *,'tavg_if_moc',my_task,master_task
     &      ,lreset_tavg,moc,gm_bolus

          if (gm_bolus) then
            call compute_moc ( TAVG_3D(:,:,:,tavg_bufloc(ntavg_WVEL )),
     &        TAVG_3D(:,:,:,tavg_bufloc(ntavg_VVEL )),
     &        W_I = TAVG_3D(:,:,:,tavg_bufloc(ntavg_WISOP)),
     &        V_I = TAVG_3D(:,:,:,tavg_bufloc(ntavg_VISOP)) )
          else
            call compute_moc ( TAVG_3D(:,:,:,tavg_bufloc(ntavg_WVEL)),
     &        TAVG_3D(:,:,:,tavg_bufloc(ntavg_VVEL)) )
          endif
        endif
      endif

With the print statement at this location everything works, and I get the POP output at the desired frequency. But adding print statements is hardly the fine art of programming, so I wonder whether anybody out there has seen this kind of error and found a real solution. Any help is greatly appreciated.

Klaus
 

murphys

Member
Our ocean liaison has never seen this error before. As much as she
hates this approach, she suggests that you go forward
with the print statement embedded in your code and assume a compiler
problem. Unfortunately, there's no way for us to diagnose this for you.

As an alternative, you might try turning off the "MOC" computations
(meridional overturning circulation). Of course, you will be losing
a valuable diagnostic, but if you will not be using this diagnostic
anyway, it might be a more aesthetically pleasing
way to proceed. To do this, you should:

1) Copy models/ocn/pop/input_templates/gx3v5_pop_in
   (or gx1v3_pop_in, depending on resolution)
   into your case directory, under SourceMods/src.pop.
2) Edit the namelist transports_nml in
   SourceMods/src.pop/gx3v5_pop_in so that
   moc = .false. (this is the last namelist in the file).
3) If this does not eliminate the problem, you might also
   try setting n_heat_trans and n_salt_trans to .false.

Again, do this only if these diagnostics are unimportant to you.
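
For illustration, the edited namelist from steps 2 and 3 might look roughly like the sketch below. It shows only the three variables named above; any other entries already present in transports_nml in your copy of gx3v5_pop_in are assumed to stay exactly as they are and are not reproduced here.

```fortran
&transports_nml
  ! Keep all other existing entries of this namelist unchanged.
  moc          = .false.   ! step 2: turn off the MOC computation
  n_heat_trans = .false.   ! step 3: only if the crash persists
  n_salt_trans = .false.   ! step 3: only if the crash persists
/
```

Trying moc = .false. on its own first keeps the heat and salt transport diagnostics available if that single change is enough.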