Dear all,
I have been running the FXHIST compset (I also see this with FX2000) at f19_f19_mg16 resolution. The model runs, and the cpl log reports 'SUCCESSFUL TERMINATION OF CPL7-cesm', but then there is a crash. From the cesm log, the problem appears to be in writing the atm restart file (I have built with debug flags, but can't see the stack properly even with -g -Og).
The issue can be bypassed by placing half as many tasks on each node and using 2 OMP threads, so although the error is reported as a segfault rather than OOM, it may be that having more memory per task helps.
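To make the memory-per-task reasoning concrete, here is a rough sketch of the arithmetic. The node figures are assumptions on my part (a standard ARCHER2 compute node with 128 cores and 256 GB of memory), and the helper name memory_per_task_gb is purely illustrative; it ignores memory used by the OS and any shared buffers.

```python
# Rough estimate of memory available to each MPI task on one node.
# ASSUMPTION: standard ARCHER2 node with 128 cores and 256 GB of memory;
# adjust both values if your nodes differ (e.g. high-memory nodes).
NODE_MEMORY_GB = 256.0
CORES_PER_NODE = 128


def memory_per_task_gb(node_memory_gb: float, tasks_per_node: int) -> float:
    """Divide node memory evenly among the MPI tasks placed on the node."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    return node_memory_gb / tasks_per_node


# Fully populated node: one MPI task per core, 1 OMP thread each.
full = memory_per_task_gb(NODE_MEMORY_GB, CORES_PER_NODE)

# Workaround: half the tasks per node, 2 OMP threads each (same core count).
half = memory_per_task_gb(NODE_MEMORY_GB, CORES_PER_NODE // 2)

print(f"fully populated: {full:.1f} GB per task")
print(f"half populated:  {half:.1f} GB per task")
```

Under these assumptions, the workaround doubles the memory available to each task while still using every core on the node.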
The compiler is GNU 10, netCDF is parallel 4.7, CESM is 2.1.3, and the machine is ARCHER2 (Cray / AMD).
If any of this is familiar, or you suspect that WACCM-X needs to be run with special settings or limits, please share the info with me. I haven't found anything so far, so thought I'd ask. I can get something to run, but it would be good to know about anything obvious that I may be missing, and others may have hit this issue too.
Thanks,
Dave