
case.submit ERROR: RUN FAIL: command

Maggie Xia
New Member
Dear all,
I am trying to submit a case created with the compset 'ISSP126Clm50BgcCrop' in the containerized Docker version of CESM. The only setting I changed was DIN_LOC_ROOT. The case creation, setup, and build steps all completed fine; I've attached the cesm.log file for reference. Thank you!
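For reference, the steps were roughly the following (the case name, paths, and resolution shown here are placeholders rather than the exact ones I used):

# run from cime/scripts inside the container
./create_newcase --case /path/to/cases/issp126_test --compset ISSP126Clm50BgcCrop --res <resolution>
cd /path/to/cases/issp126_test
./xmlchange DIN_LOC_ROOT=/path/to/inputdata    # the only setting changed
./case.setup
./case.build
./case.submit    # fails here with RUN FAIL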
 

Attachments

  • cesm.log.2220205-084045.txt (9.9 KB)

Keith Oleson
CSEG and Liaisons
Staff member
I tried that compset (at f09_g17 resolution) on our supercomputer (outside the container) with release-cesm2.2.0 and it worked fine.
What resolution are you trying to run at?
I see in the following post that @smoggy had the same error ("Program received signal SIGBUS: Access to an undefined portion of a memory object.") in the containerized version of CESM:


So you might check with @smoggy to see if the error was resolved.
Otherwise, I suggest creating a new post in the Containers & Cloud Platforms Forum to see if you can get help there.
I do see that there is a list of tested compsets for the container version, and ISSP126Clm50BgcCrop is not on it.
 

Brian Dobbins
CSEG and Liaisons
Staff member

I don't think you're running in the container -- your log files indicate a user path that's different from what you get in the container, as well as the Intel compilers, which aren't yet embedded in it:

ERROR: Command make exec_se -j 4 EXEC_SE=/Users/mark/projects/scratch/cam6_fv2deg/bld/cesm.exe
...
/usr/local/intel/bin/mpif90 ...

Can you share more details about which commands you're running and where you're running them? In the meantime, I'll test that this compset works in the container too. I've run some other I compsets, as seen here, so I would expect this one to work as well:


Thanks,
- Brian
 

Maggie Xia
New Member
Thanks for the reply, Brian. I am indeed running in the container, and I think the error you quoted didn't come from the log file I attached. However, I have solved the problem by running './xmlchange MAX_TASKS_PER_NODE=6,MAX_MPITASKS_PER_NODE=6' (the default value is 48). I tried some other values as well: 24 did not work, but 12 seems fine. I'm not sure why this happens, and I wonder if a smaller value for MAX_TASKS_PER_NODE may slow down the calculation? FYI, I am running Linux version 3.10.0-957.10.1.el7.x86_64 and Docker version 1.13.1, build 7d71120/1.13.1.
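Concretely, what I ran from the case directory (a ./case.setup --reset and rebuild may also be needed if the PE layout changes, but the key change is the xmlchange):

./xmlchange MAX_TASKS_PER_NODE=6,MAX_MPITASKS_PER_NODE=6    # default is 48
./case.submit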
 

Brian Dobbins
CSEG and Liaisons
Staff member

My apologies; I must've gotten confused with another post. My guess is that the issue you're hitting is related to the default shared-memory limit that Docker applies - typically 64MB - which is insufficient at high MPI rank counts. Try adding the flag:

--shm-size=512M

... to your 'docker run' command. Basically, every MPI process stores some information in /dev/shm (a mapped region of memory), but Docker defaults to a small amount there, which is typically fine for a 4- or 8-core laptop but not for a 48-core system like yours. For the GNU / MPICH combination in use, 512MB is likely enough for 48 cores, but worst case try 1G as well.
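For example, if you launch the container directly, the invocation would look something like this (the image name and volume mount are placeholders; keep whatever you normally pass to 'docker run'):

docker run --rm -it \
    --shm-size=512M \
    -v /path/on/host/workdir:/path/in/container \
    your-cesm-container-image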

And yes, if you have 48 cores, using only 6 or 12 will not give you the full performance. It's not always completely linear, since memory bandwidth matters a lot as well, but I'd say try the above and then set the number of tasks correctly. For the container, if you're using the Jupyter version, MAX_TASKS_PER_NODE should be set automatically, but if you're not using the Jupyter version then yes, you need to set it explicitly. Setting both of those variables, plus NTASKS, to 48 should work.
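Something along these lines from the case directory should do it (48 assumes you want the whole node; a reset and rebuild are typically needed after changing the PE layout):

./xmlchange MAX_TASKS_PER_NODE=48,MAX_MPITASKS_PER_NODE=48
./xmlchange NTASKS=48
./case.setup --reset
./case.build
./case.submit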

If not, let me know and we'll try to solve it quickly.

Cheers,
- Brian
 

Maggie Xia
New Member
Thanks, Brian! It's running fine now.
 