OMP error on Stampede

Hi, I've run into the same OMP error 7 times when trying to run the BC5 compset on Stampede. The error in the CESM log:

  Opened existing file  /scratch/projects/xsede/CESM/inputdata/atm/cam/topo/USGS-gtopo30_0.9x1.25_remap_c051027.nc      131072
  NetCDF: Invalid dimension ID or name
  OMP: Error #34: System unable to allocate necessary resources for OMP thread:
  OMP: System error #11: Resource temporarily unavailable

I am trying to run "out of the box" with the default PE layout for Stampede. Following a suggestion I found, I added a few lines to my runscript (some of which may be redundant), so that it now reads:

  limit coredumpsize 1000000
  limit stacksize unlimited
  setenv SLURM_NPROCS 1664
  setenv SLURM_CPUS_PER_TASK 1
  setenv OMP_STACKSIZE 1000M
  setenv MPSTKZ 40000000

I've also tried running in debug mode and with a separate config_machines file borrowed from another Stampede user, but neither prevented the error. Now I'm out of ideas! Of note, I'm also trying to run the COSP simulator, but there's no reason to believe that's the problem. One of my many runscripts is attached. Thanks!
 
Just an update on this: I'm working with the people at TACC to find the source of the problem. It's probably related to Stampede's compiler update to intel15.
-Benj

UPDATED: After struggling to fix this problem with intel15, I reverted to intel13.0.2.146, and the model now runs again. If you would like to do it this way, you can copy the attached file into your case directory as env_mach_specific. It will purge your modules and load the right ones for intel13.
 
UPDATE 2: I was wrong about the source of the error. According to TACC, the OMP error was actually related to the way MPI jobs are launched on Stampede: I'm told Stampede is not set up to handle jobs with multiple threads per task, unlike Yellowstone. To resolve this, I configured the PE layout with 1 thread per task for all model components and adjusted the root PEs accordingly. The F compset runs with nthreads=1 by default, but the B compset runs with nthreads=4 for several model components, which is why the problem only arose when I used the B compset. The compiler "error" was a red herring. I suspect other people have run into this problem, or they know something I don't about how to submit jobs with nthreads > 1 on Stampede! Hopefully this message helps.
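For anyone who wants to script the change described in Update 2, here is a minimal Python sketch, not an official CESM tool. It assumes the layout can be described as ordered groups: components in the same group share a root PE (they run one after another on the same tasks), while successive groups sit side by side, so each group's root PE is the task count of the groups before it. Every component gets nthreads=1 and, by default, its threads are converted into extra MPI tasks so it keeps about the same number of cores. The helper names and the task counts in the example are hypothetical (they are not the real default Stampede B layout), and the printed commands assume the CESM1-era `./xmlchange -file env_mach_pes.xml -id ... -val ...` syntax with NTASKS_/NTHRDS_/ROOTPE_ variables; check `./xmlchange -help` for your version.

```python
from dataclasses import dataclass


@dataclass
class CompPes:
    """PE settings for one component (hypothetical helper type)."""
    ntasks: int
    nthreads: int
    rootpe: int


def single_thread_layout(groups, keep_cores=True):
    """Force nthreads=1 everywhere and recompute root PEs.

    groups: ordered list of groups; each group is a list of
    (name, ntasks, nthreads) tuples. Components in one group share a
    root PE; each new group starts after the widest component of the
    previous groups. Returns (layout dict, total MPI tasks).
    """
    layout = {}
    next_root = 0
    for group in groups:
        widest = 0
        for name, ntasks, nthreads in group:
            # Optionally turn each OpenMP thread into an MPI task so the
            # component keeps roughly the same number of cores.
            new_tasks = ntasks * nthreads if keep_cores else ntasks
            layout[name] = CompPes(ntasks=new_tasks, nthreads=1, rootpe=next_root)
            widest = max(widest, new_tasks)
        next_root += widest
    return layout, next_root


def xmlchange_commands(layout):
    """Build CESM1-style xmlchange commands (syntax is an assumption)."""
    cmds = []
    for name, pes in layout.items():
        for var, val in (("NTASKS", pes.ntasks),
                         ("NTHRDS", pes.nthreads),
                         ("ROOTPE", pes.rootpe)):
            cmds.append(f"./xmlchange -file env_mach_pes.xml -id {var}_{name} -val {val}")
    return cmds


if __name__ == "__main__":
    # Hypothetical threaded layout: ATM, LND, ICE and CPL share tasks with
    # nthreads=4; OCN runs concurrently on its own single-threaded tasks.
    groups = [
        [("ATM", 32, 4), ("LND", 32, 4), ("ICE", 32, 4), ("CPL", 32, 4)],
        [("OCN", 64, 1)],
    ]
    layout, total_tasks = single_thread_layout(groups)
    for cmd in xmlchange_commands(layout):
        print(cmd)
    print("total MPI tasks:", total_tasks)
```

Passing keep_cores=False keeps the original task counts instead, which uses fewer cores but will likely run more slowly. Either way, you will probably need to re-run the case setup and rebuild after changing the PE layout.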
 