MARBL WARNING

tangtao2009

Tao Tang
New Member
Dear community

I am running an idealized CO2x4 experiment with modified soil physics, using the B1850 compset. The model stopped after 41 years and 5 months. I found no errors in the log files of the individual components, but the cesm.log file shows MARBL WARNING messages, along with the following:

-1:MPT ERROR: MPI_COMM_WORLD rank 713 has terminated without calling MPI_Finalize()
-1: aborting job


Here is the path for the scratch folder: /glade/scratch/taotang/LUMIP_indv_soil_CO2X4/run

This looks pretty similar to this post: MARBL ERROR

I changed dt_count to 48 in user_nl_pop (the default value is 24) and branched from that simulation. The model ran fine for another 35 years, but then became extremely slow, completing only 11 months within the 6-hour limit. Then the simulation stopped again, with the MARBL WARNING:

-1:MPT ERROR: MPI_COMM_WORLD rank 257 has terminated without calling MPI_Finalize()
-1: aborting job


Here is the path for the scratch folder: /glade/scratch/taotang/LUMIP_indv_soil_CO2X4_branch/run

The paths for the two cases are:

/glade/u/home/taotang/cases/LUMIP_indv_soil_CO2X4/
/glade/u/home/taotang/cases/LUMIP_indv_soil_CO2X4_branch/

I am not sure whether I can increase dt_count further to 72, but even if I can, the model would be slower and might stop again.
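
To put rough numbers on that slowdown, here is a minimal Python sketch of how the time step scales with dt_count. It assumes that dt_count is the number of ocean time steps per model day, that a model year has 365 days, and that wallclock time per step stays roughly constant. These are assumptions for illustration and are not taken from the case files; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365          # assumption: no-leap calendar
DEFAULT_DT_COUNT = 24        # default value quoted in this post


def steps_summary(dt_count):
    """Rough time-step figures, assuming dt_count = steps per model day."""
    return {
        "dt_count": dt_count,
        "dt_seconds": SECONDS_PER_DAY / dt_count,
        "steps_per_year": dt_count * DAYS_PER_YEAR,
        # Relative ocean cost if wallclock time per step is constant.
        "cost_vs_default": dt_count / DEFAULT_DT_COUNT,
    }


if __name__ == "__main__":
    for n in (24, 48, 72):
        s = steps_summary(n)
        print(
            f"dt_count={s['dt_count']:3d}  dt={s['dt_seconds']:.0f} s  "
            f"steps/year={s['steps_per_year']}  "
            f"relative cost={s['cost_vs_default']:.1f}x"
        )
```

Under these assumptions, dt_count values of 24, 48 and 72 correspond to time steps of 3600 s, 1800 s and 1200 s, so going to 72 would mean about three times as many ocean steps per model year as the default.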

I read some posts suggesting that modifying the PE layout might be an option. I tried this, changing env_mach_pes.xml with the following settings:

./xmlchange NTASKS_ATM=1152,NTHRDS_ATM=3,ROOTPE_ATM=0
./xmlchange NTASKS_ICE=360,NTHRDS_ICE=3,ROOTPE_ICE=792
./xmlchange NTASKS_LND=792,NTHRDS_LND=3,ROOTPE_LND=0
./xmlchange NTASKS_ROF=792,NTHRDS_ROF=3,ROOTPE_ROF=0
./xmlchange NTASKS_CPL=1152,NTHRDS_CPL=3,ROOTPE_CPL=0
./xmlchange NTASKS_OCN=512,NTHRDS_OCN=3,ROOTPE_OCN=1152
./xmlchange NTASKS_GLC=1152,NTHRDS_GLC=3,ROOTPE_GLC=0
./xmlchange NTASKS_WAV=28,NTHRDS_WAV=3,ROOTPE_WAV=1664


Unfortunately, the model never ran; the job sat in the queue for a whole day.
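
As a rough check of how large this layout is, the sketch below adds up the settings listed above. It assumes that the number of MPI tasks requested is the largest ROOTPE + NTASKS over all components and that each task reserves NTHRDS cores; the real accounting is done by the case scripts and may differ. The function names are hypothetical.

```python
# (NTASKS, NTHRDS, ROOTPE) per component, copied from the xmlchange commands above.
LAYOUT = {
    "ATM": (1152, 3, 0),
    "ICE": (360, 3, 792),
    "LND": (792, 3, 0),
    "ROF": (792, 3, 0),
    "CPL": (1152, 3, 0),
    "OCN": (512, 3, 1152),
    "GLC": (1152, 3, 0),
    "WAV": (28, 3, 1664),
}


def mpi_task_span(layout):
    """Largest ROOTPE + NTASKS over all components."""
    return max(rootpe + ntasks for ntasks, _nthrds, rootpe in layout.values())


def core_estimate(layout):
    """Assumption: every MPI task reserves the largest thread count in the layout."""
    max_threads = max(nthrds for _ntasks, nthrds, _rootpe in layout.values())
    return mpi_task_span(layout) * max_threads


if __name__ == "__main__":
    print("MPI tasks:", mpi_task_span(LAYOUT))
    print("Estimated cores:", core_estimate(LAYOUT))
```

With these numbers the layout spans 1692 MPI tasks, or about 5076 cores with 3 threads per task. A request of that size may help explain a long queue wait, though that depends on the machine and its load.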

Here is the case path: /glade/u/home/taotang/cases/LUMIP_indv_soil_CO2X4_branch2

Any help is appreciated.
Tao
 

mlevy

Michael Levy
CSEG and Liaisons
Staff member
Hi Tao,

Where are the history files associated with this run? Without seeing them, this is mostly speculation... but a common issue with the 4xCO2 runs (and mentioned in the issue you linked) is that sea ice melts, and exposes open ocean in small grid cells off the coast of Greenland (POP uses a rotated lat-lon grid, so the "north pole" where the grid cells converge is in Greenland rather than the Arctic Ocean). The MARBL error you are seeing in your log file

Code:
713:(Task 137, block 1) Message from (lon, lat) ( 344.935,  83.941), which is global (i,j) (136, 376). Level: 24
713:(Task 137, block 1) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) it = 4
713:(Task 137, block 1) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) x1,f =  0.2697884E-008-0.6145388E-004
713:(Task 137, block 1) MARBL WARNING (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) x2,f =  0.4275857E-005-0.2949289E-002
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): bounding bracket for pH solution not found
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) dic =  0.8505160E+003
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) ta =  0.3356605E+004
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) pt =  0.0000000E+000
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) sit =  0.0000000E+000
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) temp =  0.7549034E+002
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:drtsafe): (marbl_co2calc_mod:drtsafe) salt =  0.5071460E+002
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:comp_htotal): Error reported from drtsafe
713:(Task 137, block 1) MARBL ERROR (marbl_co2calc_mod:marbl_co2calc_interior): Error reported from comp_htotal()
713:(Task 137, block 1) MARBL ERROR (marbl_interior_tendency_mod:compute_carbonate_chemistry): Error reported from marbl_co2calc_interior() with dic_alt_co2
713:(Task 137, block 1) MARBL ERROR (marbl_interior_tendency_mod:marbl_interior_tendency_compute): Error reported from compute_carbonate_chemistry()
713:(Task 137, block 1) MARBL ERROR (marbl_interface:interior_tendency_compute): Error reported from marbl_interior_tendency_compute()
713:(Task 137, block 1) MARBL ERROR (ecosys_driver:ecosys_driver_set_interior): Error reported from marbl_instances(1)%set_interior_forcing()
713:ERROR reported from MARBL library

is consistent with this hypothesis, in that the grid cell causing the error is off the northeast coast of Greenland. I suspect that the ice fraction there is typically very large, but has become small in your 4xCO2 run. Unfortunately, the best fix we can offer is to continue to adjust dt_count. As you've noted, this will slow the model down: POP should use a consistent amount of wallclock time for each time step, but by increasing dt_count you reduce the time step and therefore need more time steps per model year.
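
For anyone who wants to check where a failing grid cell is, here is a minimal Python sketch that pulls the location out of the "Message from (lon, lat)" line in the log excerpt above and converts the longitude from degrees east (0 to 360) to the -180 to 180 convention. The regular expression is written against that one line only, so treat the format as an assumption; the function name is hypothetical.

```python
import re

# Log line copied from the cesm.log excerpt above.
LINE = (
    "713:(Task 137, block 1) Message from (lon, lat) ( 344.935,  83.941), "
    "which is global (i,j) (136, 376). Level: 24"
)

# Assumption: other MARBL location messages follow the same layout as LINE.
PATTERN = re.compile(
    r"Message from \(lon, lat\) \(\s*([-\d.]+),\s*([-\d.]+)\), "
    r"which is global \(i,j\) \(\s*(\d+),\s*(\d+)\)\. Level:\s*(\d+)"
)


def parse_marbl_location(line):
    """Return the location fields of a MARBL message line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    lon_east = float(match.group(1))
    # Convert 0..360 degrees east to -180..180 (negative means degrees west).
    lon = lon_east - 360.0 if lon_east > 180.0 else lon_east
    return {
        "lon": lon,
        "lat": float(match.group(2)),
        "i": int(match.group(3)),
        "j": int(match.group(4)),
        "level": int(match.group(5)),
    }


if __name__ == "__main__":
    print(parse_marbl_location(LINE))
```

For the line above this gives a latitude of about 83.9°N and a longitude of about 15.1°W, at global (i,j) = (136, 376) and level 24.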
 

tangtao2009

Tao Tang
New Member
Hi Mike

Thanks for your reply. I have resolved this issue by changing the PE layout with just the following command:

./pelayout --set-ntasks 2160

The model now runs well and much faster than before, though it also costs more.

Thanks again for your input.
 