As an evaluation for porting CESM to macOS 15.2 on Apple Silicon (arm64 Darwin), I am following the CAM f2000 control exercise from the CESM tutorial ("1: Control case: F2000climo", CESM Tutorial). This is on a single node with 16 performance cores @ 3.7 GHz.
I've installed netcdf, hdf5, openmpi, python3, gfortran, LAPACK, openBLAS, etc. via homebrew. Versions and formulas are in brewlist.txt.
ESMF is built and installed with version: v8.8.0b00-240-g7228e87d3c. Python is version 3.13.1.
My CESM directory tree is as follows:
phansel@CRUMPET CESM_Data % pwd
/Users/phansel/Public/CESM_Data
phansel@CRUMPET CESM_Data % tree -L 1
.
├── esmf
├── f2000_control
├── hosts
├── inputdata
├── my_cesm_sandbox
└── outputdata
5 directories, 1 file
What version of the code are you using?
git-describe:
cesm3_0_beta04
./bin/git-fleximod status: (prompt text in the forum template still suggests checkout_externals!)
Result in git-fleximod-status.txt.
I selected this branch because none of the CESM2 branches I tried supported python3.
Have you made any changes to files in the source tree?
- I have created a file ~/.cime/config_machines.xml that passes XML validation. It is attached below.
- I have modified env_mach_specific.py under CIME to comment out resource.setrlimit(), since modifying RLIMIT_STACK is evidently not supported on macOS (see "python3 resource.setrlimit strange behaviour under macOS", Issue #78783 in python/cpython, and other issues):
  diff --git a/CIME/XML/env_mach_specific.py b/CIME/XML/env_mach_specific.py
  - resource.setrlimit(attr, limits)
  + #resource.setrlimit(attr, limits)
- I have modified CIME/Tools/Makefile to force the inclusion of LAPACK. Without this, case.build returns a number of errors related to LAPACK FORTRAN functions. I'm aware that this is not the right place to link LAPACK, but it's the first place I found that worked since ~/.cime/config_compilers.xml is no longer applicable:
  diff --git a/CIME/Tools/Makefile b/CIME/Tools/Makefile
  -FoX_LIBS := -L$(SHAREDLIBROOT)/$(SHAREDPATH)/CDEPS/fox/lib -lFoX_dom -lFoX_sax -lFoX_utils -lFoX_fsys -lFoX_wxml -lFoX_common -lFoX_fsys
  +FoX_LIBS := -L$(SHAREDLIBROOT)/$(SHAREDPATH)/CDEPS/fox/lib -lFoX_dom -lFoX_sax -lFoX_utils -lFoX_fsys -lFoX_wxml -lFoX_common -lFoX_fsys -llapack
Errors that appear without including lapack:
Undefined symbols for architecture arm64:
"_dgbsv_", referenced from:
___lapack_interfaces_MOD_dgbsv_wrap in libatm.a[272](lapack_interfaces.o)
[continued]
"_strmv_", referenced from:
___lapack_interfaces_MOD_strmv_wrap in libatm.a[272](lapack_interfaces.o)
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status
gmake: *** [/Users/phansel/Public/CESM_Data/example3/Tools/Makefile:935: ../../cesm.exe] Error 1
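As an aside on the resource.setrlimit() change described above: rather than commenting the call out, it could in principle be wrapped so that a failure is reported and skipped. The following is only a minimal sketch of that idea, not what CIME does; the helper name `try_setrlimit` is hypothetical, and the assumption is that the macOS failure surfaces as `ValueError` or `OSError`.

```python
import resource


def try_setrlimit(attr, limits):
    """Attempt resource.setrlimit and report a failure instead of raising.

    Hypothetical helper for illustration; not part of CIME.
    Returns True if the limit was applied, False otherwise.
    """
    try:
        resource.setrlimit(attr, limits)
        return True
    except (ValueError, OSError) as err:
        # Assumption: the macOS RLIMIT_STACK failure shows up as one of these.
        print(f"warning: could not set rlimit {attr}: {err}")
        return False


# Re-applying the current limits exercises the call path without changing them.
current = resource.getrlimit(resource.RLIMIT_STACK)
ok = try_setrlimit(resource.RLIMIT_STACK, current)
print("setrlimit applied:", ok)
```

With a wrapper like this, the same code path would still set the limit on platforms where the call succeeds.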
Describe every step you took leading up to the problem:
- Check out CESM at the specified branch using git-fleximod
- Make modifications to CIME as mentioned above
- Create ~/.cime/config_machines.xml as attached
- Confirm that MPI processes can ssh into localhost without issue:
  ssh localhost
- Set environment variables:
  export CIME_MACHINE=crumpet
  export NETCDF_PATH=/opt/homebrew/Cellar/netcdf/4.9.2_2
  export NETCDF_FORTRAN_PATH=/opt/homebrew/Cellar/netcdf-fortran/4.6.1_1
  export NETCDF_C_PATH=/opt/homebrew/Cellar/netcdf/4.9.2_2
  export ESMFMKFILE=/Users/phansel/Public/CESM_Data/esmf/lib/libO/Darwin.gfortranclang.64.mpiuni.default/esmf.mk
- Create new case:
  cd $CIMEROOT
  ./scripts/create_newcase --case $CESMDATAROOT/f2000_control --compset F2000climo --res f19_f19_mg17
- Set up the case per defaults:
  cd $CESMDATAROOT/f2000_control
  ./case.setup
- Build the case:
  ./case.build
- Download input data - this can only be done after building the case?
  ./check_input_data --download
- Build again (just to be sure):
  ./case.build
- Submit to queue (which is none):
  ./case.submit
The case stops running after ~1 second and exits with this error:
run command is mpirun /Users/phansel/Public/CESM_Data/outputdata/f2000_control/bld/cesm.exe >> cesm.log.$LID 2>&1
Exception from case_run: ERROR: RUN FAIL: Command 'mpirun /Users/phansel/Public/CESM_Data/outputdata/f2000_control/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /Users/phansel/Public/CESM_Data/outputdata/f2000_control/run/cesm.log.250112-173750
The log file specified has been moved, but is otherwise not informative:
phansel@CRUMPET f2000_control % cat /Users/phansel/Public/CESM_Data/outputdata/archive/f2000_control/logs/cesm.log.250112-173750
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
(t_initf) profile_detail_limit= 2
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [prterun-CRUMPET-23234@1,17]
Exit code: 1
--------------------------------------------------------------------------
Looking further into the outputdata/f2000_control/run directory, an ESMF logfile is present (no other log files exist in the outputdata/f2000_climo directory):
phansel@CRUMPET f2000_control % cat ../outputdata/f2000_control/run/PET0.ESMF_LogFile
20250112 173751.639 ERROR PET0 /Users/phansel/Public/CESM_Data/my_cesm_sandbox/components/cmeps/cime_config/../cesm/driver/esm.F90:950 Not valid - Invalid NTASKS value specified for component: cpl ntasks: 32 1
20250112 173751.639 ERROR PET0 /Users/phansel/Public/CESM_Data/my_cesm_sandbox/components/cmeps/cime_config/../cesm/driver/esm.F90:203 Not valid - Passing error in return code
20250112 173751.639 ERROR PET0 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:797 Not valid - Passing error in return code
20250112 173751.639 ERROR PET0 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:2918 Not valid - Phase 'IPDv02p1' Initialize for modelComp 1: ESM0001 did not return ESMF_SUCCESS
20250112 173751.639 ERROR PET0 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:1345 Not valid - Passing error in return code
20250112 173751.639 ERROR PET0 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:486 Not valid - Passing error in return code
20250112 173751.639 ERROR PET0 /Users/phansel/Public/CESM_Data/my_cesm_sandbox/components/cmeps/cime_config/../cesm/driver/esmApp.F90:134 Not valid - Passing error in return code
20250112 173751.639 INFO PET0 Finalizing ESMF with endflag==ESMF_END_ABORT
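In the PET0.ESMF_LogFile output above, only the first ERROR entry names NTASKS; the entries after it pass the return code up the call chain. A minimal sketch for pulling the first ERROR entry out of such a log follows. It assumes every entry has the "date time LEVEL PETn message" layout seen above; the function names are hypothetical, and the sample text is copied from the log in this post.

```python
import re

# Assumed layout, taken from the entries shown in this post:
#   YYYYMMDD HHMMSS.mmm LEVEL PETn message...
LINE_RE = re.compile(
    r"^(?P<date>\d{8})\s+(?P<time>\d{6}\.\d+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<pet>PET\d+)\s+(?P<message>.*)$"
)


def parse_esmf_log(text):
    """Return one dict per log line that matches the assumed layout."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def first_error(records):
    """Return the earliest ERROR record, or None if there is none."""
    for record in records:
        if record["level"] == "ERROR":
            return record
    return None


sample = (
    "20250112 173751.639 ERROR PET0 /Users/phansel/Public/CESM_Data/"
    "my_cesm_sandbox/components/cmeps/cime_config/../cesm/driver/esm.F90:950 "
    "Not valid - Invalid NTASKS value specified for component: cpl ntasks: 32 1\n"
    "20250112 173751.639 INFO PET0 Finalizing ESMF with endflag==ESMF_END_ABORT\n"
)

records = parse_esmf_log(sample)
root = first_error(records)
print(root["pet"], root["message"])
```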
I could not find any issues mentioning NTASKS in the ESMF issue tracker (Issues, esmf-org/esmf).
I looked into changing NTASKS_xyz for ATM/ICE/LND etc. to 1 (from 32) via xmlchange, but the same log file showed another error. Is this an ESMF issue?
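For anyone checking the ESMF side: the directory that holds esmf.mk in the ESMFMKFILE value set above appears to follow ESMF's `<OS>.<compiler>.<ABI>.<comm>.<site>` library-directory naming, so the communication layer the library was built with can be read off the path. This is a minimal sketch under that assumption; `decode_esmf_libdir` is a hypothetical helper, not an ESMF or CIME function.

```python
from pathlib import PurePosixPath


def decode_esmf_libdir(esmfmkfile):
    """Split the directory containing esmf.mk into its five naming fields.

    Assumes the directory is named <OS>.<compiler>.<ABI>.<comm>.<site>.
    """
    libdir = PurePosixPath(esmfmkfile).parent.name
    fields = libdir.split(".")
    if len(fields) != 5:
        raise ValueError(f"unexpected ESMF library directory name: {libdir!r}")
    keys = ("os", "compiler", "abi", "comm", "site")
    return dict(zip(keys, fields))


# Path copied from the ESMFMKFILE export earlier in this post.
info = decode_esmf_libdir(
    "/Users/phansel/Public/CESM_Data/esmf/lib/libO/"
    "Darwin.gfortranclang.64.mpiuni.default/esmf.mk"
)
print(info)
```

In ESMF's build documentation, mpiuni is the name of its single-process MPI-bypass library. Whether that is related to the NTASKS error above has not been confirmed here.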
I found this thread on a similar issue, "Question about setting up a global simulation (the same as the case in the CTSM tutorial)?" (bb.cgd.ucar.edu), but did not find any difference in behavior when I changed mpi-serial to mpt or the number of available PEs per node to 1.
If this is a port to a new machine: Please attach any files you added or changed for the machine port (e.g., config_compilers.xml, config_machines.xml, and config_batch.xml) and tell us the compiler version you are using on this machine.
Please attach any log files showing error messages or other useful information.
Build logs and PET0.ESMF_LogFile attached.
Describe your problem or question:
Case submission on a tutorial CAM test case fails within seconds due to an ESMF error.
Attachments
- CDEPS.bldlog.250112-170508.txt (166.2 KB)
- git-fleximod-status.txt (4 KB)
- brewlist.txt (1.8 KB)
- rof.bldlog.250112-170508.txt (39.8 KB)
- pio.bldlog.250112-170508.txt (95.5 KB)
- ocn.bldlog.250112-170508.txt (384 bytes)
- ice.bldlog.250112-170508.txt (284.7 KB)
- gptl.bldlog.250112-170508.txt (7.7 KB)
- csm_share.bldlog.250112-170508.txt (182.8 KB)
- cesm.bldlog.250112-170508.txt (122.7 KB)