Model crashed with MARBL warning

Tao Tang
I am running an idealized CO2x4 experiment with the B1850 compset and modified soil physics (a separate soil column for each PFT instead of a shared soil column). The model stopped after 41 years and 5 months. I found no errors in the individual component log files, but cesm.log contains MARBL WARNING messages, along with the following:

-1:MPT ERROR: MPI_COMM_WORLD rank 713 has terminated without calling MPI_Finalize()
-1: aborting job

Here is the path for the scratch folder: /glade/scratch/taotang/LUMIP_indv_soil_CO2X4/run
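
For anyone looking at the run directory, a minimal sketch of how the MARBL messages could be pulled out of the log. The helper name marbl_warnings is hypothetical, and the cesm.log.* file pattern in the usage line is an assumption about how the logs are named in that directory.

```shell
# marbl_warnings FILE...
# Print every line containing "MARBL WARNING", prefixed with file name and
# line number. The extra /dev/null argument makes grep print the file name
# even when only one log file is passed.
marbl_warnings() {
    grep -n "MARBL WARNING" "$@" /dev/null
}

# Usage (assumed log name pattern; adjust to the files actually present):
#   marbl_warnings /glade/scratch/taotang/LUMIP_indv_soil_CO2X4/run/cesm.log.* | tail -n 20
```

The last few warnings before the abort are usually the most informative ones to post.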

This looks pretty similar to this post: MARBL ERROR

I changed dt_count to 48 in user_nl_pop (the default value is 24). The model ran fine for another 35 years but then became extremely slow, completing only 11 months within the 6-hour wall-clock limit. The simulation then stopped again with MARBL WARNING messages, followed by:
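
As a rough illustration of what the dt_count change means, here is a small sketch of the arithmetic. It assumes dt_option is 'steps_per_day', so that dt_count is the number of full ocean steps per model day; the function name seconds_per_step is hypothetical, and the actual POP step can differ slightly because of its time-stepping scheme.

```shell
# seconds_per_step DT_COUNT
# Nominal ocean time step in seconds, ASSUMING dt_option = 'steps_per_day'.
seconds_per_step() {
    echo $(( 86400 / $1 ))
}

# Compare the default, the value tried here, and the value being considered.
for n in 24 48 72; do
    echo "dt_count=$n -> $(seconds_per_step "$n") s per step"
done
```

Under that assumption, going from 24 to 48 halves the nominal step, so the ocean takes about twice as many steps per simulated day, which would account for some (though probably not all) of the slowdown.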

-1:MPT ERROR: MPI_COMM_WORLD rank 257 has terminated without calling MPI_Finalize()
-1: aborting job

Here is the path for the scratch folder: /glade/scratch/taotang/LUMIP_indv_soil_CO2X4_branch/run

The paths for the two cases are:


I am not sure whether I can increase dt_count further to 72, but even if I can, the model would be even slower and might stop again.

I read some posts suggesting that modifying the PE layout might be an option, so I tried changing env_mach_pes.xml with the following settings:

./xmlchange NTASKS_ICE=360,NTHRDS_ICE=3,ROOTPE_ICE=792
./xmlchange NTASKS_OCN=512,NTHRDS_OCN=3,ROOTPE_OCN=1152
./xmlchange NTASKS_WAV=28,NTHRDS_WAV=3,ROOTPE_WAV=1664
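
To make the layout above easier to check, here is a small sketch that works out which MPI ranks each component would occupy, assuming ROOTPE is counted in MPI tasks. The helper name next_free_rank is hypothetical; the numbers are taken directly from the three xmlchange lines.

```shell
# next_free_rank ROOTPE NTASKS
# First MPI rank after a component's block, ASSUMING ROOTPE counts MPI tasks.
next_free_rank() {
    echo $(( $1 + $2 ))
}

ice_end=$(next_free_rank 792 360)    # should equal ROOTPE_OCN if blocks are contiguous
ocn_end=$(next_free_rank 1152 512)   # should equal ROOTPE_WAV if blocks are contiguous
wav_end=$(next_free_rank 1664 28)    # lower bound on the total MPI task count

echo "ICE ranks: 792..$(( ice_end - 1 ))"
echo "OCN ranks: 1152..$(( ocn_end - 1 ))"
echo "WAV ranks: 1664..$(( wav_end - 1 ))"
echo "Total MPI tasks needed (at least): $wav_end"
```

The three blocks are contiguous, and the layout needs at least 1692 MPI tasks, each with 3 threads for these components. A request of that size may simply wait longer in the queue.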

Unfortunately, the job never ran; it sat in the queue for a whole day.

Here is the case path: /glade/u/home/taotang/cases/LUMIP_indv_soil_CO2X4_branch2
I really have no idea how to resolve this issue.

Any help or hint is appreciated.


Erik Kluzek
MARBL is the BGC part of the ocean model, so I'm moving this to that forum.
