Code:
	
	$ ./describe_version
            ccs_config at tag ccs_config_cesm0.0.109
M                      HEAD detached at 797acd7
                      Changes not staged for commit:
                        (use "git add <file>..." to update what will be committed)
                        (use "git restore <file>..." to discard changes in working directory)
                          modified:   machines/config_batch.xml
                          modified:   machines/config_machines.xml
                      no changes added to commit (use "git add" and/or "git commit -a")
                 share at tag share1.0.19
                  cime at tag cime6.0.246
                   mct at tag MCT_2.11.0
            mpi-serial at tag MPIserial_2.5.0
                   cam at tag cam6_3_162
M                      HEAD detached at ab476f9b
                      Changes not staged for commit:
                        (use "git add/rm <file>..." to update what will be committed)
                        (use "git restore <file>..." to discard changes in working directory)
                          deleted:    cime_config/testdefs/testmods_dirs/cam/outfrq9s_waccm_ma_mam4/shell_commands
                          deleted:    cime_config/testdefs/testmods_dirs/cam/outfrq9s_waccm_ma_mam4/user_nl_cam
                          deleted:    cime_config/testdefs/testmods_dirs/cam/outfrq9s_waccm_ma_mam4/user_nl_clm
                      no changes added to commit (use "git add" and/or "git commit -a")
                   ww3 at tag ww3i_0.0.2
                   rtm at tag rtm1_0_79
                pysect at tag 3.2.2
                mosart at tag mosart1_0_49
             mizuroute at tag cesm-coupling.n02_v2.1.2
                   fms at tag fi_240516
            parallelio at tag pio2_6_2
                 cdeps at tag cdeps1.0.37
                 cmeps at tag cmeps0.14.63
                  cice at tag cesm_cice6_5_0_9
                  cism at tag cismwrap_2_2_001
                   clm at tag ctsm5.2.007
                   mom at tag mi_240522
    testfails = 0, local mods = 2, needs updates 0
    The submodules labeled with 'M' above are not in a clean state.
    The following are options for how to proceed:
    (1) Go into each submodule which is not in a clean state and issue a 'git status'
        Either revert or commit your changes so that the submodule is in a clean state.
    (2) use the --force option to git-fleximod
    (3) you can name the particular submodules to update using the git-fleximod command line
    (4) As a last resort you can remove the submodule (via 'rm -fr [directory]')
        then rerun git-fleximod update
./create_newcase --case /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/cases/intel_cesm3_0_beta01_F2000climo_x025_O6144_01 --compiler intel --compset F2000climo --res ne120pg3_ne120pg3_mt13 --mach eiger --driver nuopc --mpilib mpich --run-unsupported
env_mach_pes is attached. 48 nodes x 128 cores per node (Eiger is almost the same machine as Derecho).
I set up the core distribution as attached.
Describe your problem or question:
The simulation always fails to execute. I've tried fewer nodes, but then I run into wallclock limits for a 20-day test. I arrived at 48 nodes using the following reasoning.
ne30pg3_ne30pg3_mg17 grid executes successfully with 3 nodes (384 cores).
ne120pg3_ne120pg3_mt13 has approximately 4x higher resolution in each horizontal direction (4x4 = 16x overall), so I multiplied the 3 nodes by 16 to get 48 nodes.
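For reference, the scaling estimate above can be written out as a short sketch. The variable names are only illustrative, and it assumes that cost scales with the number of horizontal grid columns alone, which is a rough rule of thumb rather than a CESM sizing rule:

```python
# Rough node-count estimate when going from ne30 to ne120.
# Assumption: cost scales with the number of horizontal columns only.
cores_per_node = 128          # Eiger: 128 cores per node
base_nodes = 3                # ne30pg3_ne30pg3_mg17 runs on 3 nodes
base_cores = base_nodes * cores_per_node

refinement = 120 // 30        # ne120 vs ne30: 4x finer in each direction
scale = refinement ** 2       # 4 x 4 = 16x more columns

nodes = base_nodes * scale
total_cores = nodes * cores_per_node

print(f"base cores:  {base_cores}")
print(f"nodes:       {nodes}")
print(f"total cores: {total_cores}")
```

This gives 384 cores for the ne30 case and 48 nodes (6144 cores) for the ne120 case, which is the layout I requested.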
I get the error below.
Thanks,
-Jonathan
		Code:
	
	jbuzan@eiger-ln002:/capstor/scratch/cscs/jbuzan/cesm3_0_beta01/cases/intel_cesm3_0_beta01_F2000climo_x025_O6144_01 [19:18:46] $ cat /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/output/intel_cesm3_0_beta01_F2000climo_x025_O6144_01/run/cesm.log.3303936.240902-185339
Mon Sep  2 18:56:58 2024: [PE_3896]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=94, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:56:59 2024: [PE_5872]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=107, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:56:59 2024: [PE_2169]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=104, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:56:59 2024: [PE_624]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=103, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:56:59 2024: [PE_1592]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=105, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:56:59 2024: [PE_5936]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=97, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:57:00 2024: [PE_3696]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=103, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:57:00 2024: [PE_378]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=98, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:57:00 2024: [PE_3706]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=103, pes_this_node=128, timeout=180 secs
Mon Sep  2 18:57:00 2024: [PE_3706]:_pmi_mmap_init:Failed to setup PMI mmap.Mon Sep  2 18:57:00 2024: [PE_3706]:globals_init:_pmi_mmap_init returned -1
MPICH ERROR [Rank 0] [job id unknown] [Mon Sep  2 18:57:00 2024] [nid001420] - Abort(1092879) (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(170):
MPID_Init(441).......:
MPIR_pmi_init(110)...: PMI_Init returned 1
aborting job:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(170):
MPID_Init(441).......:
MPIR_pmi_init(110)...: PMI_Init returned 1
Mon Sep  2 18:57:00 2024: [PE_4922]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=91, pes_this_node=128, timeout=180 secs
srun: error: nid001420: task 3706: Exited with exit code 255
srun: Terminating StepId=3303936.0
slurmstepd: error: *** STEP 3303936.0 ON nid001117 CANCELLED AT 2024-09-02T18:57:00 ***
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source            
libpthread-2.31.s  00001468BCEBE910  Unknown               Unknown  Unknown
libpmi.so.0.6.0    00001468B51A7755  Unknown               Unknown  Unknown
libpmi.so.0.6.0    00001468B51A7855  Unknown               Unknown  Unknown
libpmi.so.0        00001468B51A7C14  _pmi_mmap_init        Unknown  Unknown
libpmi.so.0        00001468B51A252C  _pmi_init             Unknown  Unknown
libpmi.so.0        00001468B51AF706  PMI2_Init             Unknown  Unknown
libmpi_intel.so.1  00001468B9632A11  Unknown               Unknown  Unknown
libmpi_intel.so.1  00001468B96384DD  Unknown               Unknown  Unknown
libmpi_intel.so.1  00001468B80C3D7E  Unknown               Unknown  Unknown
libmpi_intel.so.1  00001468B80C4304  PMPI_Init_thread      Unknown  Unknown
libmpifort_intel.  00001468B9FD392F  MPI_INIT_THREAD       Unknown  Unknown
cesm.exe           0000000000436E1B  MAIN__                     40  esmApp.F90
cesm.exe           0000000000425DCD  Unknown               Unknown  Unknown
libc-2.31.so       00001468B73C724D  __libc_start_main     Unknown  Unknown
cesm.exe           0000000000425CFA  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)