Hi,
Please feel free to move this post to a more appropriate forum if needed.
I am running CESM v2.1.3 on Plasma at the University of New Hampshire with the G1850ECOIAF compset, no carbon isotopes, on the gx3v7 grid. I am trying to add a new interior forcing for hydrothermal-vent input of dissolved organic carbon (DOC). The model already includes Fe from vents (fesedflux), so I duplicated the fesedflux code as docventflux. This term is added to the DOC interior tendency (doc_ind) in marbl_interior_tendency_mod.F90 as follows:
interior_tendencies(doc_ind,k) = DOC_prod(k) * (c1 - DOCprod_refract) - DOC_remin(k) + docventflux(k)
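Here is the modified equation per level k, along with its column integral. The notation is introduced only for illustration: P_k = DOC_prod(k), f_r = DOCprod_refract, R_k = DOC_remin(k), V_k = docventflux(k), and Δz_k = layer thickness. V_k is zero except at k = kmt, so the vent adds a net column source V_kmt·Δz_kmt that no other DOC term offsets. Suppose MARBL's Jint_Ctot diagnostic integrates the carbon tendency terms this way and expects them to balance the sediment-loss terms. In that case, the extra term would appear directly as a nonzero Jint_Ctot.

```latex
\frac{\partial\,\mathrm{DOC}_k}{\partial t} = P_k\,(1 - f_r) - R_k + V_k ,
\qquad
\sum_{k=1}^{k_{mt}} \frac{\partial\,\mathrm{DOC}_k}{\partial t}\,\Delta z_k
  = \sum_{k=1}^{k_{mt}} \bigl[P_k\,(1 - f_r) - R_k\bigr]\,\Delta z_k \;+\; V_{k_{mt}}\,\Delta z_{k_{mt}}
```

With V in nmol/cm^3/s (= mmol/m^3/s) and Δz in cm, the vent term comes out in nmol/cm^2/s.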
My edits follow the FESEDFLUXIN (fesedflux) interior forcing as closely as I could. The DOCSEDFLUXIN variable (docventflux) is read from a netCDF file. I verified the file before making any code edits: I renamed the variable to FESEDFLUXIN and ran the out-of-the-box model through several resubmits without problems. DOC (mmol/cm^3/s) is input at the bottom level (kmt) wherever the seafloor is 0-1 Myr old.
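Below is a minimal sketch of the placement described above, useful for building or checking such a forcing field. The source is nonzero only in the bottom ocean level kmt of columns whose crust age is 0-1 Myr. The sketch works on plain nested lists indexed field[k][i]. The function and argument names (place_vent_flux, crust_age_myr, vent_rate) are hypothetical and not part of POP or MARBL. Reading the actual grid and crust-age data is not shown.

```python
# Hypothetical helper: build a (nlev x ncol) vent DOC source that is nonzero
# only in the bottom ocean level (kmt) of columns with young (0-1 Myr) crust.
# Not POP/MARBL code; names are illustrative only.

def place_vent_flux(kmt, crust_age_myr, vent_rate, nlev, max_age_myr=1.0):
    """Return field[k][i] with vent_rate at 0-based level kmt[i]-1 where age <= max_age_myr.

    kmt[i] is the 1-based index of the deepest ocean level (0 means land).
    """
    ncol = len(kmt)
    field = [[0.0] * ncol for _ in range(nlev)]
    for i in range(ncol):
        # Skip land columns and crust older than the cutoff.
        if kmt[i] > 0 and 0.0 <= crust_age_myr[i] <= max_age_myr:
            field[kmt[i] - 1][i] = vent_rate
    return field


if __name__ == "__main__":
    # Three toy columns: young ocean (kmt=3), land (kmt=0), old crust (4 Myr).
    demo = place_vent_flux(kmt=[3, 0, 2], crust_age_myr=[0.5, 0.2, 4.0],
                           vent_rate=1.0e-9, nlev=4)
    for k, row in enumerate(demo, start=1):
        print(k, row)
```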
The netCDF file is read in through these user_nl_pop settings:
docventflux_input%filename = '/mnt/lustre/letscher/shared/SourceCode_gx3v7/docvent_gx3v7_2025_modified_with_DOCSEDFLUXIN.nc'
docventflux_input%file_fmt = 'nc'
docventflux_input%file_varname = 'DOCSEDFLUXIN'
docventflux_input%scale_factor = 1
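MARBL carries DOC in mmol/m^3 (equivalently nmol/cm^3), with tendencies per second. scale_factor therefore has to convert whatever units are stored in the file to mmol/m^3/s. The arithmetic sketch below covers two cases. The first is a field in mmol/cm^3/s, as described above. The second, purely a hypothetical comparison, is an areal flux in µmol/m^2/d converted to nmol/cm^2/s. If the file really is in mmol/cm^3/s and the new POP code applies no further conversion, scale_factor = 1 would overstate the source by a factor of 10^6.

```python
# Unit-conversion arithmetic for docventflux_input%scale_factor.
# Assumption: MARBL expects the DOC tendency in mmol/m^3/s (== nmol/cm^3/s).

CM3_PER_M3 = 1.0e6     # (100 cm)^3
CM2_PER_M2 = 1.0e4     # (100 cm)^2
NMOL_PER_UMOL = 1.0e3
SEC_PER_DAY = 86400.0


def scale_mmol_cm3_s_to_mmol_m3_s():
    """Factor converting mmol/cm^3/s to mmol/m^3/s (same as nmol/cm^3/s)."""
    return CM3_PER_M3


def scale_umol_m2_d_to_nmol_cm2_s():
    """Hypothetical areal case: factor converting umol/m^2/d to nmol/cm^2/s."""
    return NMOL_PER_UMOL / CM2_PER_M2 / SEC_PER_DAY


if __name__ == "__main__":
    print(f"mmol/cm^3/s -> mmol/m^3/s : {scale_mmol_cm3_s_to_mmol_m3_s():.6g}")
    print(f"umol/m^2/d  -> nmol/cm^2/s: {scale_umol_m2_d_to_nmol_cm2_s():.6g}")
```

The same pattern applies to any other source units: multiply the unit ratios together and use the product as the scale factor.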
My code edits span nine files across the namelist, YAML, POP, and MARBL code:
- Namelist: build-namelist, namelist_defaults_pop.xml, namelist_definition_pop.xml
- YAML: diagnostics_latest.yaml
- POP: ecosys_forcing_mod.F90
- MARBL: marbl_diagnostics_mod.F90, marbl_init_mod.F90, marbl_interface_private_types.F90, marbl_interior_tendency_mod.F90
After applying the code changes and running with DEBUG=T, the model builds but crashes after about 50 seconds, on day 00010102. cesm.log shows a floating-point divide-by-zero error raised in the WW3 wave model (w3iogomd.f90:508):
[cn-0015:219550:0:219550] Caught signal 8 (Floating point exception: floating-point divide by zero)
==== backtrace (tid: 219550) ====
0 /mnt/lustre/software/ucx/1.12.1/gcc/9.1.0/lib/libucs.so.0(ucs_handle_error+0x2a4) [0x2aaac248b8d4]
1 /mnt/lustre/software/ucx/1.12.1/gcc/9.1.0/lib/libucs.so.0(+0x2bad7) [0x2aaac248bad7]
2 /mnt/lustre/software/ucx/1.12.1/gcc/9.1.0/lib/libucs.so.0(+0x2bf6a) [0x2aaac248bf6a]
3 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0xd29d44]
4 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0xd08466]
5 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0xcb3905]
6 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0x426e10]
7 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0x40c9e7]
8 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0x425960]
9 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0x425a6c]
10 /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaad7cc555]
11 /mnt/lustre/letscher/jsl1063/jsl.302/bld/cesm.exe() [0x4073a9]
=================================
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x2aaaad59d62f in ???
#1 0xd29d44 in __w3iogomd_MOD_w3outg
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/components/ww3/src/source/w3iogomd.f90:508
#2 0xd08465 in __w3wavemd_MOD_w3wave
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/components/ww3/src/source/w3wavemd.f90:859
#3 0xcb3904 in __wav_comp_mct_MOD_wav_run_mct
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/components/ww3/src/cpl_mct/wav_comp_mct.F90:885
#4 0x426e0f in __component_mod_MOD_component_run
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/cime/src/drivers/mct/main/component_mod.F90:705
#5 0x40c9e6 in __cime_comp_mod_MOD_cime_run
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/cime/src/drivers/mct/main/cime_comp_mod.F90:2751
#6 0x42595f in cime_driver
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/cime/src/drivers/mct/main/cime_driver.F90:125
#7 0x425a6b in main
at /mnt/lustre/letscher/jsl1063/my_cesm_sandbox_invent/cime/src/drivers/mct/main/cime_driver.F90:23
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 39 with PID 219550 on node cn-0015 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
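The SIGFPE in the backtrace indicates that this DEBUG build traps floating-point exceptions. An operation that a non-trapping build would carry through silently (here a divide by zero) therefore aborts the run instead. Independent of the WW3 question, one cheap check is that the new forcing field holds only finite values and is nonzero only at kmt. The plain-Python sketch below assumes the field and kmt have already been read into nested lists. The netCDF reading step is not shown, and scan_forcing is a hypothetical helper.

```python
import math


def scan_forcing(field, kmt):
    """Count suspicious entries in a vent-flux field indexed field[k][i].

    kmt[i] is the 1-based bottom level of column i (0 = land).
    'off_kmt' counts nonzero values at any level other than kmt,
    which includes every nonzero value over land.
    """
    counts = {"nonfinite": 0, "negative": 0, "off_kmt": 0, "nonzero": 0}
    for k, row in enumerate(field):
        for i, v in enumerate(row):
            if not math.isfinite(v):          # NaN or +/-Inf
                counts["nonfinite"] += 1
                continue
            if v < 0.0:                       # unexpected for a source-only run
                counts["negative"] += 1
            if v != 0.0:
                counts["nonzero"] += 1
                if k + 1 != kmt[i]:
                    counts["off_kmt"] += 1
    return counts


if __name__ == "__main__":
    demo_field = [[0.0, float("nan")], [1.0e-9, 0.0], [0.0, 2.0e-9]]
    print(scan_forcing(demo_field, kmt=[2, 1]))
```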
I made no changes to the wave model code, and out-of-the-box runs completed successfully before I edited the code.
So my question is: what pathway from MARBL and POP could cause WW3 to crash? Is there any documentation on adding a new interior forcing in POP that is passed to MARBL?
I also ran with the G1850ECO compset and got the cesm.log error below after day 00010817. I could change the Jint_Ctot_thres check to log_noerror, but I expect the run would then crash with the same WW3 error as above. I got the same result when I duplicated this code to represent DOC removal by vents instead. Eventually, I plan to combine the vent source and sink of DOC into a single forcing.
POP aborting...
Stopping in ecosys_driver:print_marbl_log
------------------------------------------------------------------------
max rss=118.6 MB
memory_write: model date = 00010831 0 memory = -0.00 MB (highwater) 118.38 MB (usage) (pe= 44 comps= GLC)
(Task 65, block 1) Message from (lon, lat) ( 344.605, 69.892), which is global (i,j) (14, 109). Level: 1
(Task 65, block 1) MARBL ERROR (marbl_diagnostics_mod:store_diagnostics_carbon_fluxes): abs(Jint_Ctot)= 0.122E-004 exceeds Jint_Ctot_thres= 0.317E-011
(Task 65, block 1) MARBL ERROR (marbl_diagnostics_mod:marbl_diagnostics_interior_tendency_compute): Error reported from store_diagnostics_carbon_fluxes
(Task 65, block 1) MARBL ERROR (marbl_interior_tendency_mod:marbl_interior_tendency_compute): Error reported from marbl_diagnostics_interior_tendency_compute()
(Task 65, block 1) MARBL ERROR (marbl_interface:interior_tendency_compute): Error reported from marbl_interior_tendency_compute()
(Task 65, block 1) MARBL ERROR (ecosys_driver:ecosys_driver_set_interior): Error reported from marbl_instances(1)%set_interior_forcing()
ERROR reported from MARBL library
------------------------------------------------------------------------
POP aborting...
Stopping in ecosys_driver:print_marbl_log
------------------------------------------------------------------------
max rss=118.6 MB
memory_write: model date = 00010816 0 memory = -0.00 MB (highwater) 121.81 MB (usage) (pe= 28 comps= LND)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 120 in communicator MPI_COMM_WORLD
with errorcode 10922.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
max rss=118.6 MB
memory_write: model date = 00010901 0 memory = -0.00 MB (highwater) 118.38 MB (usage) (pe= 44 comps= GLC)
max rss=118.6 MB
memory_write: model date = 00010817 0 memory = -0.00 MB (highwater) 121.81 MB (usage) (pe= 28 comps= LND)
[cn-0015:222391] 10 more processes have sent help message help-mpi-api.txt / mpi-abort
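The threshold printed in the log is consistent with 10^-9 mol/m^2/yr expressed in nmol/cm^2/s. The reported abs(Jint_Ctot) is several million times larger. The sketch below reproduces that arithmetic and converts the imbalance to mol C/m^2/yr for comparison with the vent flux. Note that the 10^-9 mol/m^2/yr value is inferred from the printed number, so check MARBL's settings for the actual definition.

```python
# Reproduce the Jint_Ctot threshold from the log and compare the reported imbalance.
# Assumption: the threshold is 1e-9 mol/m^2/yr expressed in nmol/cm^2/s
# (inferred from the printed value, not taken from MARBL source).

NMOL_PER_MOL = 1.0e9
CM2_PER_M2 = 1.0e4
SEC_PER_YEAR = 365.0 * 86400.0


def mol_m2_yr_to_nmol_cm2_s(x):
    """Convert an areal flux from mol/m^2/yr to nmol/cm^2/s."""
    return x * NMOL_PER_MOL / CM2_PER_M2 / SEC_PER_YEAR


def nmol_cm2_s_to_mol_m2_yr(x):
    """Convert an areal flux from nmol/cm^2/s to mol/m^2/yr."""
    return x * CM2_PER_M2 * SEC_PER_YEAR / NMOL_PER_MOL


threshold = mol_m2_yr_to_nmol_cm2_s(1.0e-9)   # compare with 0.317E-011 in the log
reported = 0.122e-4                            # abs(Jint_Ctot) from the log, nmol/cm^2/s

if __name__ == "__main__":
    print(f"threshold           : {threshold:.3e} nmol/cm^2/s")
    print(f"reported / threshold: {reported / threshold:.2e}")
    print(f"reported imbalance  : {nmol_cm2_s_to_mol_m2_yr(reported):.2e} mol C/m^2/yr")
```

Next, compute docventflux(kmt) × dz(kmt) in the column at global (i,j) = (14, 109). If it comes out near 1.2e-5 nmol/cm^2/s, the unbalanced vent source alone would explain the abort. In that case, the more consistent fix would likely be to include the vent term in the carbon budget that store_diagnostics_carbon_fluxes checks, rather than downgrading the check to log_noerror.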
Thank you,
James