
Containerized CESM for laptops/workstations (Windows/Mac/Linux)

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi everyone,

One of the challenges a lot of new CESM users encounter is that of installing the software to begin with - so we've been working on a 'containerized' CESM environment to help people get up and running, and learning CESM, more easily. Containers offer portable environments - so this works on Macs, Linux and even Windows systems, and runs on anything from personal laptops to high-end servers. Best of all, it requires zero configuration. You can download the image, run it, and have a working CESM environment immediately.

In addition, we've added a Jupyter Lab environment to the image ("CESM-Lab"), giving users a choice of interfaces - shell, or Jupyter notebooks. Since everything is preconfigured, we're also able to provide tutorials on using CESM - the CESM-Lab image comes with a Quick Start notebook walking new users through the main steps of creating a case, configuring it, building it and running it. We plan on adding more tutorials in the future, covering much more, including analysis and visualization of results via Jupyter Notebooks.

A Google Doc with instructions on downloading and installing 'CESM-Lab-2.2' is available here:

And a spreadsheet of tested compsets, their RAM requirements, input data sizes and performance on both a 4-core laptop and a 36-core server is viewable here. We're adding more to this regularly too:

Note that this comes with a full release of CESM 2.2; it's not limited in any way, other than your system resources. RAM limits what compsets and grid sizes you can use, and the type and count of processors will determine the performance of those runs. Some things, like 'simpler model' atmosphere runs or single-point land runs, work fine even with limited resources, whereas fully-coupled 1-degree runs would require a workstation or server with lots of memory.

Finally, this is a technology preview of our container efforts; we're aiming for an official release in the near future. In the meantime, we wanted to get it out to the community, get feedback on how it works for you, and start working on a variety of improvements and additional tutorial material, too!

Thanks, and please ask if you have any questions!
- Brian
 

smoggy

smogger
New Member
Thanks much for doing this. I've spent the past 3 weeks trying to install CESM with no luck. I followed your installation instructions, but I'm getting this error:

Here is the command I'm running (I had to remove the '-n cesmlab' part in your instructions). What am I doing wrong?
sudo docker run -it --rm -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2


Code:
sudo docker run -it --rm -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2
cp: cannot create directory '/home/user/tutorials': Permission denied
touch: cannot touch '/home/user/.tutorial_initialized': Permission denied
[I 17:54:31.502 LabApp] [nb_conda_kernels] enabled, 1 kernels found
Traceback (most recent call last):
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 535, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/default/bin/jupyter-lab", line 10, in <module>
    sys.exit(main())
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/config/application.py", line 844, in launch_instance
    app.initialize(argv)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/config/application.py", line 87, in inner
    return method(app, *args, **kwargs)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/notebook/notebookapp.py", line 2034, in initialize
    self.init_configurables()
  File "/srv/conda/envs/default/lib/python3.7/site-packages/notebook/notebookapp.py", line 1563, in init_configurables
    connection_dir=self.runtime_dir,
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 575, in __get__
    return self.get(obj, cls)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 538, in get
    default = obj.trait_defaults(self.name)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 1578, in trait_defaults
    return self._get_trait_default_generator(names[0])(self)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/application.py", line 100, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/user/.local'
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi smoggy,

My guess is this is happening because you're running docker via 'sudo' -- is that necessary on your system? And is this a shared system, or a personal one? My thinking is that that's switching you to root, which is then trying to write that '.tutorial_initialized' file into your ~/cesmlab directory. However, 'root' might not have write access to your directory, especially via Docker given how it deals with permissions, but all that depends on your system configuration.

Can you run without sudo? What is the output of 'docker images' (without sudo)?

Also, if not, what OS are you running on and what version of Docker (docker --version)?

Cheers,
- Brian
 

smoggy

smogger
New Member
Hi smoggy,

My guess is this is happening because you're running docker via 'sudo' -- is that necessary on your system? And is this a shared system, or a personal one? My thinking is that that's switching you to root, which is then trying to write that '.tutorial_initialized' file into your ~/cesmlab directory. However, 'root' might not have write access to your directory, especially via Docker given how it deals with permissions, but all that depends on your system configuration.

Can you run without sudo?

I tried running without sudo. The same error happened.

What is the output of 'docker images' (without sudo)?

Code:
docker images

REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE

escomp/cesm-lab-2.2   latest              b05fe4db99f6        3 days ago          4GB

hello-world           latest              bf756fb1ae65        10 months ago       13.3kB

Also, if not, what OS are you running on and what version of Docker (docker --version)?


docker --version
Docker version 19.03.13, build 4484c46d9d

OS: Ubuntu 18.04
My system is a personal system.
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
OK, let's try two more things first just to try to understand why this might be happening.

1) On your personal system, do:

echo $HOME
ls ${HOME}/cesmlab


This is to check if Ubuntu isn't giving you a $HOME variable, which would mean you're trying to mount a root-level thing (/cesmlab), which could certainly be problematic.

2) If that does point to where it should (for example, $HOME is something like /home/smoggy), then do this:

docker run -it --rm --entrypoint=/bin/bash -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2

This should put you in a bash shell in the container. From there, share the output of:

ls -al /home/user
id


This is to see what's seen from inside the container, and what user ID you have in it. After running those, just type 'exit' to get out.
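The two checks above can also be scripted. Here is a minimal Python sketch that uses only the flags already shown in this thread; the helper name debug_command is made up for illustration and is not part of CESM-Lab or Docker. It refuses to build the command when HOME is empty, since the mount source would otherwise collapse to /cesmlab:

```python
import shlex

IMAGE = "escomp/cesm-lab-2.2"  # image name as used in this thread


def debug_command(home):
    """Build the 'docker run' argument list that opens bash in the container.

    debug_command is a hypothetical helper, written only for this example.
    """
    if not home:
        # With HOME unset, "${HOME}/cesmlab" would expand to "/cesmlab".
        raise ValueError("HOME is empty; the mount would point at /cesmlab")
    return [
        "docker", "run", "-it", "--rm",
        "--entrypoint=/bin/bash",
        "-v", f"{home}/cesmlab:/home/user",
        "-p", "8888:8888",
        IMAGE,
    ]


# Example with the illustrative home directory mentioned above.
print(shlex.join(debug_command("/home/smoggy")))
```

Building the command as a list rather than one string keeps each argument separate, which avoids quoting surprises if the home directory contains spaces.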

Thanks!
- Brian
 

smoggy

smogger
New Member
1) On your personal system, do:

echo $HOME
ls ${HOME}/cesmlab


This is to check if Ubuntu isn't giving you a $HOME variable, which would mean you're trying to mount a root-level thing (/cesmlab), which could certainly be problematic.
Ok, I did that and everything looks ok.
2) If that does point to where it should (for example, $HOME is something like /home/smoggy), then do this:

docker run -it --rm --entrypoint=/bin/bash -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2

This should put you in a bash shell in the container. From there, share the output of:

ls -al /home/user
id
Ok, here are the results:

Code:
docker run -it --rm --entrypoint=/bin/bash -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2
bash-4.4$ ls -al /home/user
total 8
drwxr-xr-x 2 1002 1003 4096 Nov  5 17:41 .
drwxr-xr-x 1 root root 4096 Nov  1 23:07 ..
bash-4.4$ ld
ld: no input files
bash-4.4$ exit
exit
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
OK, almost there - it looks like your 'cesmlab' folder has a different owner (1002) and group (1003). And inside the container, you should have owner 1000 and group 1000, hence why it can't write to that directory. Why this is happening isn't yet clear to me, but I'm downloading an Ubuntu image to try myself.
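To see why 1000:1000 can't write here, it may help to walk through the classic Unix permission check by hand. This is a rough Python sketch using only the numbers from the listing above (owner 1002, group 1003, mode drwxr-xr-x). The can_write helper is invented for illustration, and it ignores ACLs, capabilities and user namespaces, so treat it as a simplification rather than what the kernel actually does:

```python
import stat


def can_write(owner_uid, owner_gid, mode, uid, gids):
    """Simplified Unix write check: pick owner, group or other bits.

    Hypothetical helper; real systems also consult ACLs and capabilities.
    """
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


DIR_MODE = 0o755  # drwxr-xr-x, as in the 'ls -al /home/user' output

# Container user 1000:1000 against a directory owned by 1002:1003.
print(can_write(1002, 1003, DIR_MODE, uid=1000, gids={1000}))
# Same user against a directory owned by 1000:1000.
print(can_write(1000, 1000, DIR_MODE, uid=1000, gids={1000}))
```

With mode 755 only the owner has the write bit, so a process running as 1000 falls through to the "other" bits and is refused.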

In the meantime, here are two more things you can try:

1) If you created the ~/cesmlab directory by hand on your host system, try running the docker image with a directory that doesn't exist (~/cesmlabnew), eg:

docker run -it --rm -v ${HOME}/cesmlabnew:/home/user -p 8888:8888 escomp/cesm-lab-2.2

If the directory doesn't exist, it should create it, hopefully with the correct permissions this time. If not, we've got something interesting going on with how Docker is handling permissions from your Ubuntu system.

2) On your host system, can you send the output of:

id -u
id -g


That would tell us the user and group number (without any name information) on your host system. This command is 'id', not 'ld' - the former gives user information, and the latter is for linking object files.

I'll also update once I can set up my own Ubuntu system and duplicate this, hopefully later this morning. If we still can't figure it out and you want to hop on a video call, we can have you share your screen and try to walk through this together. Thanks for your patience!

Cheers,
- Brian
 

smoggy

smogger
New Member
OK, almost there - it looks like your 'cesmlab' folder has a different owner (1002) and group (1003). And inside the container, you should have owner 1000 and group 1000, hence why it can't write to that directory. Why this is happening isn't yet clear to me, but I'm downloading an Ubuntu image to try myself.
I followed these instructions to create a group as part of the Docker installation process: Post-installation steps for Linux
Just in case that's relevant.

In the meantime, here are two more things you can try:

1) If you created the ~/cesmlab directory by hand on your host system, try running the docker image with a directory that doesn't exist (~/cesmlabnew), eg:

docker run -it --rm -v ${HOME}/cesmlabnew:/home/user -p 8888:8888 escomp/cesm-lab-2.2

If the directory doesn't exist, it should create it, hopefully with the correct permissions this time. If not, we've got something interesting going on with how Docker is handling permissions from your Ubuntu system.
Ok, here's the output:
Code:
docker run -it --rm -v ${HOME}/cesmlabnew:/home/user -p 8888:8888 escomp/cesm-lab-2.2
cp: cannot create directory '/home/user/tutorials': Permission denied
touch: cannot touch '/home/user/.tutorial_initialized': Permission denied
[I 17:13:04.681 LabApp] [nb_conda_kernels] enabled, 1 kernels found
Traceback (most recent call last):
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 535, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/default/bin/jupyter-lab", line 10, in <module>
    sys.exit(main())
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/config/application.py", line 844, in launch_instance
    app.initialize(argv)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/config/application.py", line 87, in inner
    return method(app, *args, **kwargs)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/notebook/notebookapp.py", line 2034, in initialize
    self.init_configurables()
  File "/srv/conda/envs/default/lib/python3.7/site-packages/notebook/notebookapp.py", line 1563, in init_configurables
    connection_dir=self.runtime_dir,
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 575, in __get__
    return self.get(obj, cls)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 538, in get
    default = obj.trait_defaults(self.name)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/traitlets/traitlets.py", line 1578, in trait_defaults
    return self._get_trait_default_generator(names[0])(self)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/application.py", line 100, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/srv/conda/envs/default/lib/python3.7/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/default/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/user/.local'
2) On your host system, can you send the output of:

id -u
id -g
id -u
1002
id -g
999

Thanks much for your time and help. I appreciate it!
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
OK, let's try one simple thing before we get into complex stuff - this worked on my Ubuntu 18.04 system:

1) As you (no sudo), make a directory for cesm:

mkdir ~/cesm

(This creates the directory we're going to use with your user's permissions. You can verify this with 'ls -al ~/cesm'. If the directory already exists, give it a different name.)

2) Run docker, as you were before - sudo here is optional if you have Docker running without it, but it works with it, too:

sudo docker run -it --rm -v ${HOME}/cesm:/home/user -p 8888:8888 escomp/cesm-lab-2.2

That should work, fingers crossed! If not, I've got a few more ideas to try. :-)

- Brian
 

smoggy

smogger
New Member
OK, let's try one simple thing before we get into complex stuff - this worked on my Ubuntu 18.04 system:

1) As you (no sudo), make a directory for cesm:

mkdir ~/cesm

(This creates the directory we're going to use with your user's permissions. You can verify this with 'ls -al ~/cesm'. If the directory already exists, give it a different name.)
Ok. Here's the output of 'ls -al ~/cesm':
Code:
ls -al ~/cesm
total 8
drwxr-xr-x  2 myname docker          4096 Nov  6 14:55 .
drwxr-xr-x 35 myname myname          4096 Nov  6 14:55 ..
2) Run docker, as you were before - sudo here is optional if you have Docker running without it, but it works with it, too:

sudo docker run -it --rm -v ${HOME}/cesm:/home/user -p 8888:8888 escomp/cesm-lab-2.2

That should work, fingers crossed! If not, I've got a few more ideas to try. :-)

- Brian
Nope. I'm still getting the same errors. What else should I try?
 

smoggy

smogger
New Member
Also, I discovered that there are multiple accounts on my computer for different users. Could that be causing any problems? I've been able to install other programs (some via apt-get install, and some via build & compile) without any permissions issues until this Docker situation.
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
OK, it turns out I can duplicate this now -- this is an interesting problem, and I think I need to find a good solution to it.

In the meantime, I'll give you a less-than-ideal solution. :-)

You're right that it does have to do with accounts, and user/group ID mappings between the host and the container, but it's not something I encountered until now. The fix we're going to do is easy, just not exactly elegant:

1) Make a directory for your CESM work (if it doesn't already exist)

mkdir ~/cesmlab

2) Change ownership of that to 1000:1000 (the IDs inside the container!):

sudo chown -R 1000:1000 ~/cesmlab

3) Run Docker as normal (sudo is optional if you've configured it to not need it now):

docker run -it --rm -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2

The downside to this is that what you do with the files in that directory from your host system might be limited, but I'll work on getting a better solution done soon, too.
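Before running the chown in step 2, it can be worth checking whether the directory already has the container's IDs. Here is a small Python sketch under that assumption; chown_needed is a hypothetical helper written for this example, and 1000:1000 is simply the pair quoted above:

```python
import os
import shlex

CONTAINER_UID = 1000  # IDs inside the container, as stated in the steps above
CONTAINER_GID = 1000


def chown_needed(path, uid=CONTAINER_UID, gid=CONTAINER_GID):
    """Return the chown command to run, or None if ownership already matches.

    Hypothetical helper; it only inspects the top-level directory.
    """
    st = os.stat(path)
    if (st.st_uid, st.st_gid) == (uid, gid):
        return None
    return f"sudo chown -R {uid}:{gid} {shlex.quote(path)}"


workdir = os.path.expanduser("~/cesmlab")
if os.path.isdir(workdir):
    print(chown_needed(workdir) or "ownership already matches the container")
```

The sketch only prints the command rather than running it, so nothing is changed on the host until you decide to.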

Also, I'll send you a private message with my email address - if you want to have a video chat where you can share your screen, if you still hit issues, we can work through it together.

Cheers,
- Brian
 

smoggy

smogger
New Member
Wow thanks! That seems to have worked. I can now use the Jupyter Notebook. However, I'm getting a new problem.

In the Jupyter Notebook, the test simulation seems to have run fine. But the next step ("visualize the output") results in:

Code:
from cesm import QuickView
QuickView('quickstart_case', 'T')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-1ccc8d4d8fd4> in <module>
      1 from cesm import QuickView
----> 2 QuickView('quickstart_case', 'T')

/srv/conda/envs/default/lib/python3.7/site-packages/cesm.py in QuickView(case, variable, model)
     61   type = model.lower()
     62   if type in { 'atmosphere', 'atm', 'cam' }:
---> 63     QuickViewATM(case, variable)
     64

/srv/conda/envs/default/lib/python3.7/site-packages/cesm.py in QuickViewATM(case, variable)
     25   # Get the list of all hist files for the atmosphere:
     26   files = glob.glob('/home/user/archive/' + case + '/atm/hist/*.nc')
---> 27   latest = max(files , key = os.path.getctime)
     28   dataset = xr.open_dataset(latest)
     29

ValueError: max() arg is an empty sequence

However, this error went away if I did the 28-day continuation instead of only the 3-day simulation. Does visualization require a minimum number of days? (Or maybe I did something wrong).

Other questions:
1) On the table of benchmarks in the 1st post in this thread, what does SYPD mean? "simulated years per day"?

2) You mentioned that this solution might cause limitations on what I can do with the files in the directory. What kind of limitations do you mean? Will it affect my ability to access the output data?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
However, this error went away if I did the 28-day continuation instead of only the 3-day simulation. Does visualization require a minimum number of days? (Or maybe I did something wrong).

No, you didn't do anything wrong. By default the model writes 'history' files at 1-month intervals, which is why the additional 28 days were needed. I've been debating modifying the example so we get 'history' output after that initial 3-day run, which would make plotting a lot easier and faster, and I think your question argues for doing that in the next build. For me, it was a balance between changing as little as possible and having fast results you CAN plot. Another approach would be to just cover this sort of modification in later tutorials (as was planned). I welcome any feedback you have on either approach!

(Also, note that 'history' files and 'restart' files are different -- the latter is really just used to restart simulations. Eg, the 28 day run restarted from the 3-day run's restart outputs.)
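The 'max() arg is an empty sequence' error comes from calling max() on an empty list of history files. A minimal sketch of a guarded lookup follows, assuming the same archive layout that appears in the traceback; latest_history_file is a made-up name and this is not the actual cesm.py code:

```python
import glob
import os


def latest_history_file(case, root="/home/user/archive"):
    """Return the newest atmosphere history file for a case, or None.

    Hypothetical helper; the path layout is copied from the traceback.
    """
    pattern = os.path.join(root, case, "atm", "hist", "*.nc")
    files = glob.glob(pattern)
    if not files:
        # No history output yet, e.g. the run was shorter than one month.
        return None
    return max(files, key=os.path.getctime)


latest = latest_history_file("quickstart_case")
if latest is None:
    print("No history files found yet; try a longer run.")
else:
    print("Latest history file:", latest)
```

Returning None lets the caller print a friendly message instead of a traceback when the run was too short to produce history output.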

1) On the table of benchmarks in the 1st post in this thread, what does SYPD mean? "simulated years per day"?

Yes, I've just changed the sheet to reflect that. We work in a few too many acronyms here! It's always good to get a sanity-check when we use them without defining them.
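For anyone unfamiliar with the metric, SYPD is simulated model time in years divided by wall-clock time in days. Here is a small worked sketch in Python; the 365-day model year is an assumption (a no-leap calendar), and the numbers in the example call are invented, not taken from the benchmark sheet:

```python
SECONDS_PER_DAY = 86400
DAYS_PER_MODEL_YEAR = 365  # assumption: no-leap model calendar


def sypd(simulated_days, wallclock_seconds):
    """Simulated years per wall-clock day."""
    if wallclock_seconds <= 0:
        raise ValueError("wallclock_seconds must be positive")
    simulated_years = simulated_days / DAYS_PER_MODEL_YEAR
    wallclock_days = wallclock_seconds / SECONDS_PER_DAY
    return simulated_years / wallclock_days


# Hypothetical example: a 3-day run that took one hour of wall-clock time.
print(round(sypd(3, 3600), 3))
```

A value of 1.0 means one model year finishes in exactly one day of wall-clock time.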

2) You mentioned that this solution might cause limitations on what I can do with the files in the directory. What kind of limitations do you mean? Will it affect my ability to access the output data?

Yes and no. To clarify, everything you do via the container, whether through Jupyter or the shell, will work fine, with no limitations whatsoever. However, since that user:group ID is different on your laptop, if you try to modify things on your laptop directly (say, edit a text file outside the container), you'll hit permissions errors. You can of course 'chown' all those files to give yourself access, but you'll have to 'chown' them back again to edit them from within the container.

Long story short, as long as you do everything in the container you're fine; you'll just run into permissions errors trying to do things from your Linux system. I'm planning to find a better solution for this (it doesn't happen on Macs / Windows systems!), but in the meantime, that's just something to be aware of.

Definitely let me know if you have more questions, and I'm glad you're up and running!

- Brian
 

NeSe

NS
New Member
Just chiming in to mention that running this docker container under singularity does seem to work out of the box for the most part, with the quirk of /home/user being undefined given the differences in singularity vs docker, I suppose. Just poking around the configuration a bit, there looks to be a way to update the /srv/start and environment variables to avoid the user/group ID gymnastics with some path-smithing and removing the -l flag from the /srv/start script (userspace shell rather than login shell). Alternatively, that can more easily be set up as a different dedicated execution target.

Disclaimer: I am not specifically asking for this extra support. I'm just curious and playing around to see if I can get everything running smoothly using an alternate/unsupported program. I figure I'll just post what I figure out on the off chance it comes in handy to someone else. I am sure there is at least one other stubborn ox who'd rather use singularity over docker if they can get away with it :)
 

wwieder

Will Wieder
New Member
I ran 400 years of a single point CLMbgc simulation overnight! Not efficient for model development on your local machine but everything worked well. My only suggestion for the instructions would be to clarify that step #4 happens w/in the docker terminal window (or at least that's how I did it). @dobbins let me know what kind of support or development I can help with for this. I have a notebook that generates some simple comparison plots.


Right now it's configured for Cheyenne. What do you need to do to make this more generic? (It's also not very clean or efficient code right now.)
How does cesm-lab handle different python environments?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Just chiming in to mention that running this docker container under singularity does seem to work out of the box for the most part, with the quirk of /home/user being undefined given the differences in singularity vs docker, I suppose. Just poking around the configuration a bit, there looks to be a way to update the /srv/start and environment variables to avoid the user/group ID gymnastics with some path-smithing and removing the -l flag from the /srv/start script (userspace shell rather than login shell). Alternatively, that can more easily be set up as a different dedicated execution target.

Thank you very much for this feedback - that's great. With regards to the /home/user stuff, originally the thinking was to do that so there was a consistent path (/home/user) for every user case,... but obviously that can be done with $HOME and whatever it gets mapped to. I'll look into fixing/changing this for the actual release. And I'm already playing with some changes to the '/srv/start' script (a relic of the Jupyter build) to provide for better options - eg, without any options, it does the usual launch, but when a shell is specified, it does that instead. This would also let people launch more easily into a tcsh, for example. Again, thanks very much!

Disclaimer: I am not specifically asking for this extra support. I'm just curious and playing around to see if I can get everything running smoothly using an alternate/unsupported program. I figure I'll just post what I figure out on the off chance it comes in handy to someone else. I am sure there is at least one other stubborn ox who'd rather use singularity over docker if they can get away with it :)

Please, keep doing so - we're committed to improving this, and wanted it out there as a 'preview' for this very reason. We did originally intend to release Singularity-native versions, too. I have less experience with them so far, and still need to evaluate whether running the Docker version from Singularity will still give us things like the pass-through capability on HPC machines that Cray systems support for the network, for example.

Thanks again!
- Brian
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
I ran 400 years of a single point CLMbgc simulation overnight! Not efficient for model development on your local machine but everything worked well.

Glad to hear it's running for you, Will! Out of curiosity, when you say it's 'not efficient' for model development, do you just mean the rate is too slow, or that tying up your system while running a case is not efficient? If the former, I'd be curious to see a timing file, and know what sort of processor you have. We're looking at GNU9 vs. GNU8, which gives minor improvements, but for single-point simulations, you're not running in parallel anyway, so there's not a huge boost in running on, say, a Cheyenne node. The big downside is the lack of the Intel compiler, but that's something we might be able to improve upon too.

My only suggestion for the instructions would be to clarify that step #4 happens w/in the docker terminal window (or at least that's how I did it).

Sorry, this wasn't clear to me - step #4 in the documentation? The 'docker run' command? That's actually in a system terminal window - eg, the command prompt in Windows, or a Linux or Mac terminal on those platforms.

@dobbins let me know what kind of support or development I can help with for this. I have a notebook that generates some simple comparison plots.


Right now it's configured for Cheyenne. What do you need to do to make this more generic? (It's also not very clean or efficient code right now.)
How does cesm-lab handle different python environments?

This is already incredibly helpful - I need to add 'cf_utils' and 'udunits2' to the image (you can download them yourself via 'sudo conda install cf_utils', but we want all these prerequisites included, when possible), and then I have some questions about whether to include your ctsm_py module, and how often it changes, or whether to fold that into a more general 'cesm' module. Let's chat sometime next week? We can also talk about input data locations, and what should/shouldn't be on the public FTP servers -- then we can come up with a new version of this that'll work regardless of whether you're on Cheyenne or a laptop.

Thanks!
- Brian
 

wwieder

Will Wieder
New Member
This is already incredibly helpful - I need to add 'cf_utils' and 'udunits2' to the image (you can download them yourself via 'sudo conda install cf_utils', but we want all these prerequisites included, when possible), and then I have some questions about whether to include your ctsm_py module, and how often it changes, or whether to fold that into a more general 'cesm' module. Let's chat sometime next week? We can also talk about input data locations, and what should/shouldn't be on the public FTP servers -- then we can come up with a new version of this that'll work regardless of whether you're on Cheyenne or a laptop.

ctsm_py isn't a well curated repository at this point. A better plan may be to create a better curated repo with code that we want to contribute for tutorial, training, etc.
 

NeSe

NS
New Member
Thank you very much for this feedback - that's great. With regards to the /home/user stuff, originally the thinking was to do that so there was a consistent path (/home/user) for every user case,... but obviously that can be done with $HOME and whatever it gets mapped to. I'll look into fixing/changing this for the actual release. And I'm already playing with some changes to the '/srv/start' script (a relic of the Jupyter build) to provide for better options - eg, without any options, it does the usual launch, but when a shell is specified, it does that instead. This would also let people launch more easily into a tcsh, for example. Again, thanks very much!
Have you been doing this primarily with a Dockerfile to build the container or some other method? If the former, maybe we can get it under version control for easier collaboration? I was originally planning on doing a from-scratch Singularity recipe file to build my container, and then I just happened to stumble upon this at an opportune time :)
Please, keep doing so - we're committed to improving this, and wanted it out there as a 'preview' for this very reason. We did originally intend to release Singularity-native versions, too. I have less experience with them so far, and still need to evaluate whether running the Docker version from Singularity will still give us things like the pass-through capability on HPC machines that Cray systems support for the network, for example.

Thanks again!
- Brian
What kind of passthrough? I have experience using GPU passthrough on Singularity for Nvidia GPUs (tensorflow/deep learning), which typically works out of the box with the --nv flag (although the user might need to be in a "video" group for it to work). If memory serves, that worked on the RHEL7 kernel (3.10), but I haven't tested on an earlier kernel. I don't have experience with radeon or other ASIC passthrough, so I'd be less helpful there. The Singularity team does appear to be relatively responsive though, so if you pass along those details I don't mind looking into it.

Thanks
-Neil
 