Hi everyone,
I am trying to run a regional CLM simulation (Qinling/Shanxi area) using CTSM5.2.005 with DATM (GSWP3v1) as forcing. The case builds successfully but fails immediately at runtime with `KILLED BY SIGNAL: 9`, and no output files are produced. I have already spent a lot of time debugging and would appreciate any suggestions.
What I did:
- Created the case:
  ```text
  ./create_newcase --case spinup06112 --res CLM_USRDAT --compset 2000_DATM%GSWP3v1_CLM50%SP_SICE_SOCN_MOSART_SGLC_SWAV --machine myintel --compiler intel --run-unsupported
  ```
- Set up a custom grid and forcing (see the attached user_nl_clm, user_datm.streams.xml, user_nl_datm and env_run.xml for details). Main settings:
  - CLM uses a 0.01° unstructured mesh (186,837 elements) with a mask file.
  - The DATM streams use a `<meshfile>` pointing to a coarse `atm_mesh.nc` (2,170 elements).
  - Forcing: GSWP3 0.5° data (years 1951-1999).
  - Spinup mode: accelerated spinup, cold start, 49 years per run.
- Built and submitted. The job dies within seconds.
- Set `NTASKS=1` (single core): still killed.
- Fixed an nlevurb mismatch: my surface dataset originally had nlevurb=5, so I extended it to 10 using NCO/Python.
- Checked that all input files exist and are readable; the GSWP3 files appear normal.
- Verified that memory is not exhausted (the node has 503 GB, with ~320 GB available when the job runs).
- DATM opens the first solar forcing file successfully, but the process is killed shortly afterwards, while setting up the I/O descriptor for variable `FSDS`. No data is actually read or interpolated.
- cesm.log shows `malloc(): invalid size (unsorted)` followed by termination with mixed signals (SIGABRT on rank 1, SIGKILL on the other ranks). This indicates heap corruption, not just running out of memory.
- lnd.log contains no output beyond the initialization header.

Attached files:
- cesm.log, atm.log, lnd.log, drv.log, med.log
- My detailed setup steps (covering user_nl_clm, user_nl_datm and user_datm.streams.xml)
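
For reference, here is a minimal sketch of one way to extend an nlevurb axis and then check that every nlevurb-dimensioned field was actually extended. It uses synthetic NumPy arrays and omits netCDF I/O. The helper names (`extend_levels`, `check_dim`), the assumed dimension order (numurbl, nlevurb, lsmlat, lsmlon), and the choice to fill the new levels by repeating the deepest layer are illustrative assumptions. They are not necessarily what my NCO/Python script did, nor what CTSM expects:

```python
import numpy as np


def extend_levels(arr, axis, new_size):
    """Return a copy of `arr` padded along `axis` to `new_size` entries.

    New levels are filled by repeating the last (deepest) slice. This fill
    rule is an illustrative assumption, not a CTSM-prescribed method.
    """
    old_size = arr.shape[axis]
    if new_size < old_size:
        raise ValueError(f"new_size={new_size} is smaller than current size {old_size}")
    # np.take with a one-element index list keeps the axis (length 1).
    last = np.take(arr, [old_size - 1], axis=axis)
    # Repeat that slice once per missing level, then append it.
    padding = np.repeat(last, new_size - old_size, axis=axis)
    return np.concatenate([arr, padding], axis=axis)


def check_dim(variables, dim, expected):
    """List variables whose `dim` axis does not have length `expected`.

    `variables` maps a name to a (dimension-names tuple, ndarray) pair,
    a hypothetical stand-in for what a netCDF reader would provide.
    """
    bad = []
    for name, (dims, data) in variables.items():
        if dim in dims and data.shape[dims.index(dim)] != expected:
            bad.append(name)
    return bad


if __name__ == "__main__":
    urban_dims = ("numurbl", "nlevurb", "lsmlat", "lsmlon")  # assumed order
    tk_roof = np.ones((3, 5, 2, 2))  # synthetic field with nlevurb = 5
    fields = {
        "TK_ROOF": (urban_dims, extend_levels(tk_roof, axis=1, new_size=10)),
        "CV_ROOF": (urban_dims, tk_roof),  # deliberately left at 5 levels
    }
    print(check_dim(fields, "nlevurb", 10))  # prints ['CV_ROOF']
```

A check like `check_dim` only catches fields that were missed during the extension: any variable still at 5 levels is flagged. It says nothing about whether the filled values are physically sensible.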
What could cause the model to be killed right after opening the first forcing file, even on a single core with plenty of free memory? Is it a mesh/domain mismatch, a library issue, or something else? Any help is greatly appreciated.
Thank you!
Attachments
- atm.log.260611-104436.txt (13.5 KB)
- cesm.log.260611-104436.txt (50.8 KB)
- drv.log.260611-104436.txt (961 bytes)
- lnd.log.260611-104436.txt (4.7 KB)
- med.log.260611-104436.txt (14.9 KB)
- My detailed setup steps (covering user_nl_clm, user_nl_datm and user_datm.streams.xml).txt (8.5 KB)