Welcome to the new DiscussCESM forum!

Containerized CESM for laptops/workstations (Windows/Mac/Linux)

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Have you been doing this primarily with a Dockerfile to build the container or some other method? If the former, maybe we can get it under version control for easier collaboration? I was originally planning on doing a from-scratch Singularity recipe file to build my container, and then I just happened to stumble upon this at an opportune time :)

We've got a GitHub repo here, and collaboration sounds great:
ESCOMP/ESCOMP-Containers

Note that right now, you'll see the CESM directory has quite a few changes from a 'base' CESM install - this is because we wanted to get something out to experiment with, and CESM 2.2 was going through the release process. In the near future, most of those changes -generally focused on providing a 'container' machine config- will be included in CIME, and thus be unnecessary. In short, it'll get a lot less ugly soon, but still, that repo should let you play around.

The plan is to have Singularity recipes in there at some point, too.

We also have the start of a tutorial repo (for Jupyter Notebooks), but that's probably of less interest to you, I imagine.

What kind of passthrough? I have experience using GPU passthrough in Singularity for NVIDIA GPUs (TensorFlow/deep learning), which typically works out of the box with the --nv flag (although the user might need to be in a "video" group for it to work). If memory serves, that worked on the RHEL7 kernel (3.10), but I haven't tested on an earlier kernel. I don't have experience with Radeon or other ASIC passthrough, so I'd be less helpful there. The Singularity team does appear to be relatively responsive, though, so if you pass along those details I don't mind looking into it.

Ah, I meant pass-through to the underlying network - the MPICH ABI Compatibility Initiative lets you replace one compatible MPI implementation with another at runtime, and Cray machines, for example, can do this automagically with Shifter to use the native CrayMPI runtime. I think when I saw this, they were using Singularity containers, but Shifter is the key thing. So, for example, I can compile a CESM case in a container (with MPICH, no knowledge of a high-speed network like a Cray Aries or even Infiniband), and run it on that Cray, and have the host-level network used. Here's a neat paper showing some of this from Blue Waters:

Container solutions for HPC Systems: A Case Study of Using Shifter on Blue Waters

On Cheyenne (our system), we have SGI MPT, and there's a compatibility mode that also allows for this, but it required a little bit of work to get going, unlike Shifter's automatic use of it. I haven't looked into this in a while, but it's a really great thing for HPC-like use, and it's on the 'to do' list to check up on again.
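For anyone curious what that pass-through looks like in practice, here's a rough sketch for Singularity. Everything here is illustrative: the host MPI library path is site-specific (the one shown is a made-up Cray-style path), and the image and executable names (cesm.sif, cesm.exe) are hypothetical. The key idea is that MPICH ABI compatibility lets a binary built against MPICH inside the container load the host's ABI-compatible MPI library instead:

```shell
# Site-specific: location of the host's MPICH-ABI-compatible MPI library
# (check with your system administrators -- this path is made up).
HOST_MPI=/opt/cray/pe/mpt/default/gni/mpich-gnu-abi/8.2/lib

# Put the bound-in host library ahead of the container's own MPICH on the
# loader path; Singularity passes SINGULARITYENV_* variables into the container.
export SINGULARITYENV_LD_LIBRARY_PATH=/host/mpi

# Launch with the *host* MPI launcher, binding the host library directory
# into the container at /host/mpi:
mpiexec -n 36 singularity exec --bind "$HOST_MPI":/host/mpi cesm.sif ./cesm.exe
```

Shifter does essentially this substitution automatically on Cray systems; with Singularity you wire it up by hand, as above.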

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins How can I look at/edit the CESM code that's inside the Docker image that is being used in the Jupyter notebook? Are the files accessible?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Good question - if you just want to look, you can change directory to $CESMROOT ( /opt/ncar/cesm2 ) and poke around. By default, we make them owned by 'root', so you can't easily overwrite them. But you can always copy that whole tree to your home directory, too, to have an editable version:

cp -r /opt/ncar/cesm2 ~/cesm2

(And you can also just 'sudo' to root, without a password, inside the container!)

If you're looking to make source-code changes to a case you're building, you generally identify the file you want changed in the main source tree ( in /opt/ncar/cesm2/components/cam, for example ), then put that file in your case's SourceMods/src.cam directory, again using CAM as the example component. We're working on building tutorials on this sort of thing into the container, but in the meantime, here are the (non-container) tutorial slides / videos that might help, too:


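As a concrete sketch of that SourceMods workflow (the case directory name and the particular source file here are hypothetical; substitute whatever you're actually modifying):

```shell
CESMROOT=/opt/ncar/cesm2        # the container's CESM install ($CESMROOT)
CASEDIR=$HOME/cases/my_case     # hypothetical case directory

# Files placed in SourceMods take precedence over the main source tree
# at build time, so copy the original in, then edit the copy:
cp "$CESMROOT/components/cam/src/physics/cam/zm_conv_intr.F90" \
   "$CASEDIR/SourceMods/src.cam/"

# ...edit $CASEDIR/SourceMods/src.cam/zm_conv_intr.F90, then rebuild:
cd "$CASEDIR" && ./case.build
```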
Hope that helps, and let me know if you have more questions! I'm hoping we can develop more tutorials soon. :-)
 

smoggy

smogger
New Member
@dobbins Thanks. I read the User Guide and figured out how to use the "user_nl_***" files to change variables.

Regarding the output data in the Jupyter tutorial, I see that you've plotted temperature using QuickView('quickstart_case', 'T'), and I see a map of Earth with what I assume is the temperature plotted. Looking at the definition of the QuickView function, I see "supported_4D = ( 'T', 'U', 'V', )". However, 'U' and 'V' produce errors when I try them in the QuickView function.

  1. What are U and V? Why don't they work with QuickView?
  2. What other output variables exist for this QPC4 compset besides temperature? How can I find out the output variables for any compset?
  3. How can I export these output files (*.nc) out of the Docker image and onto my real computing environment?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi @smoggy,

A thousand apologies -- I didn't see this, and was off on some other projects. I imagine you've solved your problems by now? If not, the short answer is the 'QuickView' function isn't very robust yet - it was really just a preview of the sort of thing we can do. The hope is to make it more robust before an actual release, or perhaps replace it with some standardized CESM diagnostics.

I'm surprised you're having errors with U and V, though, at least with the quick start case. That's working fine for me, provided you ran the full month (which would be needed for the 'T' variable too). U and V are the zonal and meridional winds, respectively, so they should be in that file. As for other variables, there are quite a few - you can do an 'ncdump' on the output file like this and see a lot more, including description and dimensionality:

ncdump -h ~/archive/quickstart_case/atm/hist/quickstart_case.cam.h0.0001-01.nc

Finally, to 'export' a file outside of the Docker image, you've got a few options - the easiest is simply to have mounted a directory into the image when you run it (the '-v' option to 'docker run'). If you didn't do that, you can likely still 'sftp' or 'rsync' to a different Linux system, since you have networking available in the container. (You can also 'save' a modified container if you didn't mount something and don't have a place to transfer things to, which adds a layer to the container, but that's less ideal.)
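For example (the host-side directories and the container name here are placeholders; the image name matches the one used elsewhere in this thread):

```shell
# Option 1: bind-mount a host directory as the container's home at startup,
# so anything CESM writes under /home/user is visible on the host:
docker run -it --rm -v "$HOME/cesm-lab":/home/user -p 8888:8888 escomp/cesm-lab-2.2

# Option 2: copy files out of an already-running container by name:
docker cp my_container:/home/user/archive/quickstart_case/atm/hist "$HOME/cesm-output"
```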

Does that help? Apologies again. I don't know why I didn't get notified of an update to this thread!

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins Thanks, excuse my late reply - holidays and all that.

Regarding exporting, I used:
docker cp NAME:/home/user/archive/quickstart_case/atm/hist ./Documents
where NAME is the name of the currently-running Docker container.

I saw on the spreadsheet that the fully-coupled b1850c4_tutorial should be available for this Docker setup. However, it fails before the first output data is saved. I looked in the logfile and saw:

"Program received signal SIGBUS: Access to an undefined portion of a memory object."

Do you know what this means or how to solve this error?

Regarding the ncdump of the quickstart_case QPC4 compset, I saw the long descriptions of variables as you stated. Two questions:
  1. It appears that elevation is measured in pressure instead of meters?
  2. I don't see a variable for sealevel. How do I calculate this? Or is this not available because this is an atmosphere-only compset?
 

bill_paxton

William H Paxton
New Member
In addition, we've added a Jupyter Lab environment to the image ("CESM-Lab"), giving users a choice of interfaces - shell, or Jupyter notebooks. Since everything is preconfigured, we're also able to provide tutorials on using CESM - the CESM-Lab image comes with a Quick Start notebook walking new users through the main steps of creating a case, configuring it, building it and running it. We plan on adding more tutorials in the future, covering much more, including analysis and visualization of results via Jupyter Notebooks.
I'm trying to get this working in Docker on my Mac (macOS Big Sur 11.6, Apple M1 chip) but have run into a problem that is perhaps related to the new chip. Here's the terminal output - things just hang after the "No web browser found" message. The WARNING about non-matching platform is a concern.

Thanks for any help.
Cheers,
Bill

NEW MAC /Users/bpaxton: docker run -it --rm -v /Users/bpaxton/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[I 19:31:58.381 LabApp] [nb_conda_kernels] enabled, 1 kernels found

[I 19:31:58.432 LabApp] Writing notebook server cookie secret to /home/user/.local/share/jupyter/runtime/notebook_cookie_secret
[W 19:32:00.491 LabApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 19:32:07.581 LabApp] JupyterLab extension loaded from /srv/conda/envs/default/lib/python3.7/site-packages/jupyterlab
[I 19:32:07.584 LabApp] JupyterLab application directory is /srv/conda/envs/default/share/jupyter/lab
[I 19:32:07.612 LabApp] Serving notebooks from local directory: /home/user
[I 19:32:07.612 LabApp] Jupyter Notebook 6.1.4 is running at:
[I 19:32:07.612 LabApp] http://b4ed2e248006:8888/
[I 19:32:07.612 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:32:07.637 LabApp] No web browser found: could not locate runnable browser.
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Bill,

There are two things going on here:

1) Yes, the Apple M1 is a different architecture - a fully different instruction set, in fact, though Googling says it emulates x86, and that's why (to me) it makes sense that it prints out all those "LabApp" lines. It seems like it's able to run, but I haven't verified this on an M1 system yet! I'll be interested in hearing how it goes; worst case, we'll build a native M1 version.

2) The 'hanging' after the 'no web browser found' is actually fine - basically, that "no browser found" message is because we're running Jupyter inside the container, where a browser doesn't exist. But if you launch a browser locally on your system, and point it towards http://127.0.0.1:8888, you should get the Jupyter screen. In newer versions (not yet pushed to DockerHub), we override this default message with a clearer one. I'll try to push those changes this week.

Hope that helps, and let me know how it goes!
- Brian
 

bill_paxton

William H Paxton
New Member
Hi Brian,

Progress! It runs in Jupyter and gets to "Running your case". But instead of taking 1-3 minutes, it is still running after an hour with no sign of progress. The Activity Monitor reports that it is fully using 4 cores, so something is wrong.

This seems to be similar to a problem reported on Stack Overflow recently:

The "solution" suggested was to do a build for the new architecture:
docker build --platform linux/arm64 .

If you can do that, I'll be happy to give it a try. Or perhaps there is another option?

Thanks,
Bill
 