
Error occurred when running case

tckm@whu_edu_cn

New Member
Hi,
Excuse me for asking for help. I ran into a problem when running a case with the f19_f19_mg16 resolution and the FXSD compset on my own ported machine. The case always aborts after it has been running for some time.
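For reference, the case was created and run roughly as follows. This is only a minimal sketch of my workflow, assuming a standard CIME checkout; the case name and the machine name (mymachine) are placeholders for my own setup, and --run-unsupported may or may not be needed for this grid/compset combination:

cd cime/scripts
./create_newcase --case fxsd_test --compset FXSD --res f19_f19_mg16 --mach mymachine --run-unsupported
cd fxsd_test
./case.setup
./case.build
./case.submit

The cesm.log from the failing run shows: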


GHP7 NOT CONVERGING FOR PRESS 3.55E-10 NaN
GHP7 NOT CONVERGING FOR PRESS 3.55E-10 -1.78E-02
GHP7 NOT CONVERGING FOR PRESS 3.55E-10 -1.93E-01
GHP7 NOT CONVERGING FOR PRESS 3.55E-10 NaN
GHP7 NOT CONVERGING FOR PRESS 3.55E-10 -1.96E-03
BalanceCheck: soil balance error nstep = 3177 point = 12505 imbalance = -0.000000 W/m2

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: [[60281,1],151] (PID 10182)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
#0 0x2B0363C7E697
#1 0x2B0363C7ECDE
#2 0x2B03644DB27F
#3 0x2B0376FB425C
#4 0x2B0364FE8F3B
#5 0x2B03639B64E4
#6 0x2B0363A034A9
#7 0x2B03639C9A4F
#8 0x2B036375716A
#9 0x80FFC2 in __edyn_mpi_MOD_mp_magpole_2d
#10 0x826C10 in __edynamo_MOD_dynamo
#11 0x7E853D in __dpie_coupling_MOD_d_pie_coupling
#12 0x5566E1 in __ionosphere_interface_MOD_ionosphere_run2
#13 0x4CBEBC in __cam_comp_MOD_cam_run2
#14 0x4C33C1 in __atm_comp_mct_MOD_atm_run_mct
#15 0x4323F2 in __component_mod_MOD_component_run
#16 0x418CB3 in __cime_comp_mod_MOD_cime_run
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 151 with PID 10182 on node n0236 exited on signal 11 (Segmentation fault).

Then I turned debug on (roughly as sketched below), re-ran the case, and the new cesm.log is attached.
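For reference, debug was enabled roughly as follows; this is a minimal sketch, assuming the commands are run from the case directory and that a clean rebuild is needed after changing DEBUG:

./xmlchange DEBUG=TRUE
./case.build --clean-all
./case.build
./case.submit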
Does anybody know how to solve this problem? Thanks for your help.
 

Attachments

  • cesm.log.txt (126.6 KB)