
CESM 1.2.2 (F_AMIP_CAM5 compset) crashes immediately on Edison

Hi everyone,
The CASCADE project has constructed some small test problems that we are using to understand some issues that arise on Edison since its new hardware configuration. One of them is a case configured with the F_AMIP_CAM5 compset on 192 processes that dies with a segmentation fault somewhere in the domain decomposition stage for CAM, just a few minutes after starting up. I have been trying without success to get more information about what causes the crash, and am wondering if you guys can reproduce this issue using NERSC's copy of CESM 1.2.2 and the following materials.

To create the case, run the following on Edison in the /global/homes/j/johnson/CASCADE_test_cases directory:
csh CASCADE_test_case1.csh
This generates and submits the case. If you are game to try this, you might want to cancel that submission and change the .run script to use the debug queue, as Edison and Cori's queues are even more backed-up than usual these days. The run will die with a segmentation fault as mentioned.

I have successfully run the DDT debugger in offline mode by replacing the line
srun --label --ntasks=192 --cpu_bind=sockets --cpu_bind=verbose --kill-on-bad-exit $EXEROOT/cesm.exe >&! cesm.log.$LID
in the .run script with
ddt --offline=output.html -np 192 $EXEROOT/cesm.exe >&! cesm.log.$LID
which writes a file output.html to the run/ directory, which can be opened in the browser and will show a stack trace. Unfortunately I don't have an output.html example handy to show you.

Can anyone reproduce this crash, and does this ring any bells for anyone? This is holding us up, and we have engaged people at NERSC, but it's been slow going and we would greatly appreciate any insight. Please let me know if this isn't a good enough description of the case/problem, or if you need any more materials to check it out.

Best,
Jeffrey Johnson
Lawrence Berkeley Laboratory
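For anyone trying to reproduce this, the launcher swap described in the post above could be scripted roughly as follows. This is only a sketch, not part of the original test case: the function name swap_launcher and the .orig backup suffix are made up for illustration, and it assumes the .run script contains a single srun line that launches cesm.exe. Only the launcher and its flags are replaced; the redirection to cesm.log.$LID is left untouched.

```shell
#!/bin/sh
# Sketch: swap the srun launch line in a CESM .run script for the
# offline DDT line quoted in the post. swap_launcher is a hypothetical
# helper name; pass the path to the real .run script as its argument.
swap_launcher() {
    run_script="$1"
    # Keep a backup so the original srun line can be restored later.
    cp "$run_script" "$run_script.orig"
    # Replace everything from "srun" up to "cesm.exe" with the ddt
    # invocation; leading whitespace and the rest of the line are kept.
    # $EXEROOT stays literal because the sed program is single-quoted.
    sed 's|^\([[:space:]]*\)srun .*cesm\.exe|\1ddt --offline=output.html -np 192 $EXEROOT/cesm.exe|' \
        "$run_script.orig" > "$run_script"
}
```

Calling swap_launcher with the path to the case's .run script rewrites it in place; copying the .orig file back restores the srun line.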
 
I would add that test case 1 uses default-resolution finite-volume dynamics, but with the prescribed aerosol package instead of the prognostic aerosol package. It is not clear whether this configuration ever ran on Edison, but it ran very well on Hopper before its demise.
 

jedwards

CSEG and Liaisons
Staff member
Hi Jeffrey, I have not been able to reproduce your problem. I last updated cesm1_2_2 with a fix in cam/src/dynamics/fv/spmd_dyn.F90 on March 29th. Can you double-check that you have this change? It's in the source code in /project/projectdirs/ccsm1/collections/cesm1_2_2
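One way to check whether a given source tree has the same spmd_dyn.F90 as the shared copy is a byte-for-byte comparison. The helper below is a minimal sketch; the name same_file and its argument order are made up for illustration, and the relative path of the file inside each tree has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: report whether two source trees carry the same copy of a file.
# same_file is a hypothetical helper: relative path first, then the two
# tree roots to compare.
same_file() {
    rel="$1"
    tree_a="$2"
    tree_b="$3"
    # cmp -s is silent and exits 0 only when both files exist and match.
    if cmp -s "$tree_a/$rel" "$tree_b/$rel"; then
        echo "identical"
    else
        echo "differs or missing"
    fi
}
```

Here the two roots would be the tree the case was built from and /project/projectdirs/ccsm1/collections/cesm1_2_2; the file is assumed to sit under models/atm/cam/src/dynamics/fv/ in each tree, which should be confirmed against the actual layout.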
 