
Experiment hangs at atmospheric initialization on NERSC Edison

Dear NERSC users,

Can anyone run a B2000 experiment successfully on NERSC Edison with CESM1.2.2? I could run this kind of experiment before, but since Edison's update the experiment still compiles successfully, yet after submission it hangs at atmospheric initialization. One possible error message is "NetCDF: Invalid dimension ID or name". However, the same message also appears in the logs of my earlier successful experiment. Has anyone run into this problem before? Any suggestions are welcome. I have attached my log files.

I suspect the modules I loaded are not quite right. Could someone who can run CESM1.2.2 successfully please share their module list with me?

Thanks in advance.
Fuyao
 
I had the same problem, but then I updated my cesm1_2_2 directory from /project/projectdirs/ccsm1/collections/cesm1_2_2 and this problem went away.

Here is my module list:

Currently Loaded Modulefiles:
  1) modules/3.2.10.3                      19) PrgEnv-intel/5.2.56
  2) nsg/1.2.0                             20) craype-ivybridge
  3) eswrap/1.1.0-1.020200.1130.0          21) cray-shmem/7.3.1
  4) switch/1.0-1.0502.57058.1.58.ari      22) cray-mpich/7.3.1
  5) craype-network-aries                  23) slurm/edison
  6) craype/2.5.1                          24) altd/2.0
  7) intel/15.0.1.133                      25) darshan/2.3.1
  8) cray-libsci/13.3.0                    26) cray-netcdf/4.3.3.1
  9) udreg/2.3.2-1.0502.9889.2.20.ari      27) ncview/2.1.2
 10) ugni/6.0-1.0502.10245.9.9.ari         28) nco/4.5.2
 11) pmi/5.0.10-1.0000.11050.0.0.ari       29) ncl/6.1.1
 12) dmapp/7.0.1-1.0502.10246.8.47.ari     30) python_base/2.7.9
 13) gni-headers/4.0-1.0502.10317.9.2.ari  31) netcdf4-python/1.1.7.1
 14) xpmem/0.1-2.0502.57015.1.15.ari       32) numpy/1.9.2
 15) dvs/2.5_0.9.0-1.0502.1958.2.55.ari    33) scipy/0.15.1
 16) alps/5.2.3-2.0502.9295.14.14.ari      34) matplotlib/1.4.3
 17) rca/1.0.0-2.0502.57212.2.56.ari       35) basemap/1.0.7
 18) atp/1.8.3
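To check your own environment against a module list like this one, a small script can do the comparison for you. The following is only a rough sketch: the function names parse_module_list and diff_modules are made up for this example, and it assumes you have saved the text printed by "module list" to a string or file yourself (on many systems that command writes to stderr, so you may need to redirect it with 2>&1).

```python
import re


def parse_module_list(text):
    """Return {module_name: version} parsed from `module list` output text."""
    modules = {}
    # Each entry looks like "19) PrgEnv-intel/5.2.56"; the "/version" part is optional.
    for match in re.finditer(r"\b\d+\)\s+(\S+)", text):
        name, _, version = match.group(1).partition("/")
        modules[name] = version
    return modules


def diff_modules(reference, current):
    """List (name, reference_version, current_version) for every mismatch.

    A version of None means the module is not loaded on that side.
    """
    names = sorted(set(reference) | set(current))
    return [
        (name, reference.get(name), current.get(name))
        for name in names
        if reference.get(name) != current.get(name)
    ]


if __name__ == "__main__":
    # Two entries taken from the working list, plus a hypothetical environment.
    reference = parse_module_list("1) intel/15.0.1.133  2) cray-netcdf/4.3.3.1")
    current = parse_module_list("1) intel/15.0.1.133")
    for name, ref_version, cur_version in diff_modules(reference, current):
        print(name, "reference:", ref_version, "current:", cur_version)
```

Modules that differ in version, or that are loaded on only one side, are the first things to look at. This does not tell you which difference matters, only where the two environments diverge.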
 
 
Hi Fuyao,

I'm also running CESM on Edison, but with a different version, 1.1.2. It always crashes immediately after submission with the following errors in ccsm.log:

    961057 (deleted)
    2aaab7800000-2aaab7808000 rwxs 00000000 00:00 0
    2aaab7808000-2aaab7810000 rwxs 00000000 00:00 0
    2aaab7810000-2aaab7811000 rwxs 00000000 00:00 0
    2aaab7811000-2aaab7e11000 rwxs 00000000 00:00 0                          /dsl/dev/xpmem
    2aaab7e11000-2aaab8411000 rwxs 00000000 00:00 0                          /dsl/dev/xpmem
    2aaab8411000-2aaab8a11000 rwxs 00000000 00:00 0                          /dsl/dev/xpmem
    2aaab8a11000-2aaab9011000 rwxs 00000000 00:00 0                          /dsl/dev/xpmem
    2aaabc000000-2aaabc03f000 rwxp 00000000 00:00 0
    2aaabc03f000-2aaac0000000 ---p 00000000 00:00 0
    7ffffffc2000-7ffffffff000 rwxp 00000000 00:00 0                          [stack]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
    srun: error: nid02060: tasks 32-42,44-63: Aborted
    srun: Terminating job step 2961057.0
    slurmstepd: error: *** STEP 2961057.0 ON nid02059 CANCELLED AT 2016-12-08T17:32:21 ***
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
    srun: error: nid02059: tasks 0-31: Killed
    srun: error: nid02060: task 43: Aborted (core dumped)

Do you know how to solve this problem? Or could you share your machine and compiler settings? Thanks!

--Fukai
 
 