Containerized CESM for laptops/workstations (Windows/Mac/Linux)

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Have you been doing this primarily with a Dockerfile to build the container or some other method? If the former, maybe we can get it under version control for easier collaboration? I was originally planning on doing a from-scratch Singularity recipe file to build my container, and then I just happened to stumble upon this at an opportune time :)

We've got a Github repo here, and collaboration sounds great:
ESCOMP/ESCOMP-Containers

Note that right now, you'll see the CESM directory has quite a few changes from a 'base' CESM install. This is because we wanted to get something out to experiment with while CESM 2.2 was going through the release process. In the near future, most of those changes (generally focused on providing a 'container' machine config) will be included in CIME, and thus be unnecessary here. In short, it'll get a lot less ugly soon, but still, that repo should let you play around.

The plan is to have Singularity recipes in there at some point, too.

We also have the start of a tutorial repo (for Jupyter Notebooks), but that's probably of less interest to you, I imagine.

What kind of passthrough? I have experience using GPU passthrough on Singularity for Nvidia GPUs (tensorflow/deep learning), which typically works out of the box with the --nv flag (although the user might need to be in a "video" group for it to work). If memory serves, that worked on the RHEL7 kernel (3.10), but I haven't tested on an earlier kernel. I don't have experience with radeon or other ASIC passthrough, so I'd be less helpful there. The Singularity team does appear to be relatively responsive though, so if you pass along those details I don't mind looking into it.

Ah, I meant pass-through to the underlying network - the MPICH ABI Compatibility Initiative lets you replace one compatible MPI implementation with another at runtime, and Cray machines, for example, can do this automagically with Shifter to use the native CrayMPI runtime. I think when I saw this, they were using Singularity containers, but Shifter is the key thing. So, for example, I can compile a CESM case in a container (with MPICH, no knowledge of a high-speed network like a Cray Aries or even Infiniband), and run it on that Cray, and have the host-level network used. Here's a neat paper showing some of this from Blue Waters:

Container solutions for HPC Systems: A Case Study of Using Shifter on Blue Waters

On Cheyenne (our system), we have SGI MPT, and there's a compatibility mode that also allows for this, but it required a little bit of work to get going, unlike Shifter's automatic use of it. I haven't looked into this in a while, but it's a really great thing for HPC-like use, and it's on the 'to do' list to check up on again.

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins How can I look at/edit the CESM code that's inside the Docker image that is being used in the Jupyter notebook? Are the files accessible?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Good question - if you just want to look, you can change directory to $CESMROOT ( /opt/ncar/cesm2 ) and poke around. By default, we make them owned by 'root', so you can't easily overwrite them. But you can always copy that whole tree to your home directory, too, to have an editable version:

cp -r /opt/ncar/cesm2 ~/cesm2

(And you can also just 'sudo' to root, without a password, inside the container!)

If you're looking to make source-code changes to a case you're building, you generally identify the file you want changed in the main source tree ( in /opt/ncar/cesm2/components/cam, for example ), then put that file in your case's SourceMods/src.cam directory, again using CAM as the example component. We're working on having tutorials on this sort of thing in the future built into the container, but in the meantime, here's the (non-container) tutorial slides / videos that might help, too:


Hope that helps, and let me know if you have more questions! I'm hoping we can develop more tutorials soon. :-)
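
As a rough illustration of the SourceMods step described above, here is a minimal Python sketch that copies one file from a source tree into a case's SourceMods/src.cam directory. The helper name stage_source_mod, the file name example_module.F90 and the throwaway directory layout are all hypothetical; in the container the real source would live under /opt/ncar/cesm2/components/cam and the case directory would be your own.

```python
import shutil
import tempfile
from pathlib import Path


def stage_source_mod(source_file, case_dir, component="cam"):
    """Copy one source file into <case_dir>/SourceMods/src.<component>/."""
    source_file = Path(source_file)
    target_dir = Path(case_dir) / "SourceMods" / f"src.{component}"
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps timestamps; the original in the source tree is left untouched.
    return Path(shutil.copy2(source_file, target_dir / source_file.name))


# Demo on a throwaway directory tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as scratch:
    scratch = Path(scratch)
    # Hypothetical stand-in for a file under /opt/ncar/cesm2/components/cam
    fake_src = scratch / "components" / "cam" / "src" / "example_module.F90"
    fake_src.parent.mkdir(parents=True)
    fake_src.write_text("! placeholder Fortran source\n")

    fake_case = scratch / "quickstart_case"
    staged = stage_source_mod(fake_src, fake_case, component="cam")
    staged_relative = staged.relative_to(fake_case).as_posix()
    print(staged_relative)
```

The copy in SourceMods is the one you would then edit before rebuilding the case; the root-owned tree stays as shipped.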
 

smoggy

smogger
New Member
@dobbins Thanks. I read the User Guide and figured out how to use the "user_nl_***" files to change variables.

Regarding the output data in the Jupyter tutorial, I see that you've plotted temperature using QuickView('quickstart_case', 'T'), and I see a map of Earth with what I assume is the temperature plotted. Looking at the definition of the QuickView function, I see "supported_4D = ( 'T', 'U', 'V', )". However, 'U' and 'V' produce errors when I try them in the QuickView function.

  1. What are U and V? Why don't they work with QuickView?
  2. What other output variables exist for this QPC4 compset besides temperature? How can I find out the output variables for any compset?
  3. How can I export these output files (*.nc) out of the Docker image and onto my real computing environment?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi @smoggy,

A thousand apologies -- I didn't see this, and was off on some other projects. I imagine you've solved your problems by now? If not, the short answer is the 'QuickView' function isn't very robust yet - it was really just a preview of the sort of thing we can do. The hope is to make it more robust before an actual release, or perhaps replace it with some standardized CESM diagnostics.

I'm surprised you're having errors with U and V, though, at least with the quick start case. That's working fine for me, provided you ran the full month (which would be needed for the 'T' variable too). U and V are the zonal and meridional winds, respectively, so they should be in that file. As for other variables, there are quite a few - you can do an 'ncdump' on the output file like this and see a lot more, including description and dimensionality:

ncdump -h ~/archive/quickstart_case/atm/hist/quickstart_case.cam.h0.0001-01.nc
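
If you would rather get that variable list inside a notebook, the following is a small sketch that parses the text printed by 'ncdump -h'. The function names and the sample header are made up for illustration (the sample is not real CESM output), and the parser only handles the simple declaration and long_name lines shown here.

```python
import re
import subprocess

# "float T(time, lev, lat, lon) ;" -> type, name, dimension list
DECLARATION = re.compile(r"^\s*(\w+)\s+(\w+)(?:\(([^)]*)\))?\s*;")
# 'T:long_name = "Temperature" ;' -> name, description
LONG_NAME = re.compile(r'^\s*(\w+):long_name\s*=\s*"(.*)"\s*;')


def read_header(path):
    """Return the text printed by `ncdump -h <path>` (needs ncdump on PATH)."""
    result = subprocess.run(
        ["ncdump", "-h", str(path)], capture_output=True, text=True, check=True
    )
    return result.stdout


def list_variables(header_text):
    """Map variable name -> (tuple of dimension names, long_name or None)."""
    variables = {}
    in_variables = False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped == "variables:":
            in_variables = True
            continue
        if stripped.startswith("// global attributes") or stripped in ("data:", "}"):
            in_variables = False
        if not in_variables:
            continue
        declared = DECLARATION.match(line)
        if declared:
            _type, name, dims = declared.groups()
            dims = tuple(d.strip() for d in dims.split(",")) if dims else ()
            variables[name] = (dims, None)
            continue
        described = LONG_NAME.match(line)
        if described and described.group(1) in variables:
            name, text = described.groups()
            variables[name] = (variables[name][0], text)
    return variables


# Illustrative header only; run read_header() on your own history file instead.
SAMPLE_HEADER = """netcdf sample {
dimensions:
    time = UNLIMITED ; // (1 currently)
    lev = 32 ;
variables:
    double lev(lev) ;
        lev:long_name = "hybrid level at midpoints" ;
    float T(time, lev, lat, lon) ;
        T:units = "K" ;
        T:long_name = "Temperature" ;
    float U(time, lev, lat, lon) ;
        U:long_name = "Zonal wind" ;
}
"""

found = list_variables(SAMPLE_HEADER)
for name, (dims, long_name) in found.items():
    print(f"{name:4s} {len(dims)}-D  {long_name}")
```

With a real file, list_variables(read_header(path)) should give the same kind of dictionary, assuming ncdump is installed where the notebook runs.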

Finally, to 'export' a file outside of the Docker image, you've got a few options - the easiest is simply to have mounted a directory into the image when you run it (the '-v' option to 'docker run'). If you didn't do that, you can likely still do a 'sftp' or 'rsync' to a different Linux system, since you have networking available in the container. (You can also 'save' a modified container, for example with 'docker commit', if you didn't mount something and don't have a place to transfer things to. That adds a layer to the image, though, so it's less ideal.)

Does that help? Apologies again. I don't know why I didn't get notified of an update to this thread!

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins Thanks, excuse my late reply - holidays and all that.

Regarding exporting, I used:
docker cp NAME:/home/user/archive/quickstart_case/atm/hist ./Documents
where NAME is the name of the currently-running Docker container
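
For anyone who wants to script that copy, here is a minimal sketch that wraps the same 'docker cp' call from Python. The helper names are hypothetical, and by default it only prints the command (dry_run=True) so it can be tried without Docker installed.

```python
import shlex
import subprocess


def docker_cp_command(container, container_path, host_path):
    """Build the argument list for `docker cp CONTAINER:SRC DEST`."""
    return ["docker", "cp", f"{container}:{container_path}", str(host_path)]


def copy_from_container(container, container_path, host_path, dry_run=True):
    """Print the command, or run it for real when dry_run is False."""
    command = docker_cp_command(container, container_path, host_path)
    if dry_run:
        print(shlex.join(command))
    else:
        # check=True raises CalledProcessError if docker reports a failure.
        subprocess.run(command, check=True)
    return command


# NAME is a placeholder for the running container's name, as in the post above.
command = copy_from_container(
    "NAME", "/home/user/archive/quickstart_case/atm/hist", "./Documents"
)
```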

I saw on the spreadsheet that the fully-coupled b1850c4_tutorial should be available for this Docker setup. However, it fails before the first output data is saved. Looking in the log file, I see:

"Program received signal SIGBUS: Access to an undefined portion of a memory object."

Do you know what this means or how to solve this error?

Regarding the ncdump of the quickstart case (QPC4 compset), I saw the long descriptions of the variables, as you said. Two questions:
  1. It appears that elevation is measured in pressure instead of meters?
  2. I don't see a variable for sea level. How do I calculate this? Or is it not available because this is an atmosphere-only compset?
 

bill_paxton

William H Paxton
New Member
In addition, we've added a Jupyter Lab environment to the image ("CESM-Lab"), giving users a choice of interfaces - shell, or Jupyter notebooks. Since everything is preconfigured, we're also able to provide tutorials on using CESM - the CESM-Lab image comes with a Quick Start notebook walking new users through the main steps of creating a case, configuring it, building it and running it. We plan on adding more tutorials in the future, covering much more, including analysis and visualization of results via Jupyter Notebooks.
I'm trying to get this working in Docker on my Mac (macOS Big Sur 11.6, Apple M1 chip) but have run into a problem that is perhaps related to the new chip. Here's the terminal output - things just hang after the "No web browser found" message. The WARNING about non-matching platform is a concern.

Thanks for any help.
Cheers,
Bill

NEW MAC /Users/bpaxton: docker run -it --rm -v /Users/bpaxton/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[I 19:31:58.381 LabApp] [nb_conda_kernels] enabled, 1 kernels found

[I 19:31:58.432 LabApp] Writing notebook server cookie secret to /home/user/.local/share/jupyter/runtime/notebook_cookie_secret
[W 19:32:00.491 LabApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 19:32:07.581 LabApp] JupyterLab extension loaded from /srv/conda/envs/default/lib/python3.7/site-packages/jupyterlab
[I 19:32:07.584 LabApp] JupyterLab application directory is /srv/conda/envs/default/share/jupyter/lab
[I 19:32:07.612 LabApp] Serving notebooks from local directory: /home/user
[I 19:32:07.612 LabApp] Jupyter Notebook 6.1.4 is running at:
[I 19:32:07.612 LabApp] http://b4ed2e248006:8888/
[I 19:32:07.612 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:32:07.637 LabApp] No web browser found: could not locate runnable browser.
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Bill,

There are two things going on here:

1) Yes, the Apple M1 is a different architecture - a fully different instruction set, in fact, though Googling says it emulates x86, and that's why (to me) it makes sense that it prints out all those "LabApp" lines. It seems like it's able to run, but I haven't verified this on an M1 system yet! I'll be interested in hearing how it goes; worst case, we'll build a native M1 version.

2) The 'hanging' after the 'no web browser found' is actually fine - basically, that "no browser found" message is because we're running Jupyter inside the container, where a browser doesn't exist. But if you launch a browser locally on your system, and point it towards http://127.0.0.1:8888, you should get the Jupyter screen. In newer versions (not yet pushed to DockerHub), we override this default message with a clearer one. I'll try to push those changes this week.
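
To confirm from the host side that the Jupyter server in the container really is reachable before opening a browser, a quick check like the following may help. It is only a sketch: the function name is made up, and it assumes the '-p 8888:8888' port mapping used in the 'docker run' command earlier in this thread.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if port_is_open():
    print("Something is listening; try http://127.0.0.1:8888 in a browser.")
else:
    print("Nothing is listening on 127.0.0.1:8888 yet.")
```

A True result only says that the port accepts connections, not that the notebook itself is healthy.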

Hope that helps, and let me know how it goes!
- Brian
 

bill_paxton

William H Paxton
New Member
Hi Brian,

Progress! It runs in Jupyter and gets to "Running your case". But instead of taking 1-3 minutes, it is still running after an hour with no sign of progress. The Activity Monitor reports that it is fully using 4 cores, so something is wrong.

This seems to be similar to the problem reported on stackoverflow recently:

The "solution" suggested was to do a build for the new architecture:
docker build --platform linux/arm64

If you can do that, I'll be happy to give it a try. Or perhaps there is another option?
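
The mismatch in the warning above (image linux/amd64, host linux/arm64/v8) can also be checked from Python. This is a small sketch with hypothetical helper names; the mapping table covers only the common machine names and is an assumption, not an exhaustive list.

```python
import platform

# Common values of platform.machine() mapped to Docker platform names.
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine=None):
    """Guess the Docker platform string for this (or a named) machine type."""
    machine = (machine or platform.machine()).lower()
    return DOCKER_PLATFORMS.get(machine, f"linux/{machine}")


def needs_emulation(image_platform, host_platform):
    """Compare OS/architecture only, ignoring a variant suffix such as /v8."""
    def os_arch(name):
        return tuple(name.split("/")[:2])

    return os_arch(image_platform) != os_arch(host_platform)


print("This machine looks like:", docker_platform())
print("amd64 image on an M1 host needs emulation:",
      needs_emulation("linux/amd64", "linux/arm64/v8"))
```

When needs_emulation is True the image can still start, as the log above shows, but compute-heavy steps such as a CESM run may be much slower than on a matching architecture.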

Thanks,
Bill
 

Maggie Xia

Maggie Xia
New Member
OK, let's try two more things first just to try to understand why this might be happening.

1) On your personal system, do:

echo $HOME
ls ${HOME}/cesmlab


This is to check if Ubuntu isn't giving you a $HOME variable, which would mean you're trying to mount a root-level thing (/cesmlab), which could certainly be problematic.

2) If that does point to where it should (for example, $HOME is something like /home/smoggy), then do this:

docker run -it --rm --entrypoint=/bin/bash -v ${HOME}/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2

This should put you in a bash shell in the container. From there, share the output of:

ls -al /home/user
id


This is to see what's seen from inside the container, and what user ID you have in it. After running those, just type 'exit' to get out.

Thanks!
- Brian
Hi dobbins,

I am having the same permission error as smoggy. I followed your steps here, however, my outputs are:

bash-4.4$ ls -al /home/user
ls: cannot open directory '/home/user': Permission denied

bash-4.4$ id
uid=1000(user) gid=1000(escomp) groups=1000(escomp)


How can I solve the permission error?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Maggie,

My apologies for the late reply - I'm guessing you're running on Linux? This issue seems to not happen on Mac/Windows, and some Linux distros, but does occur on others like Ubuntu due to how Linux maps user IDs, since we've created one in the Docker config. Can you tell me what OS you're running? There are a few potential fixes, but I want to try it out with the same environment you're using to be sure.
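
One plausible cause, given the 'id' output quoted in this thread (uid=1000, gid=1000 inside the container), is that the mounted host directory does not grant read and execute permission to that user ID. The sketch below checks only the classic permission bits; the function name is hypothetical, and it ignores root, ACLs and any user-namespace remapping.

```python
import os
import stat
import tempfile


def can_list_directory(path, uid, gid):
    """Check the r+x permission bits that apply to uid/gid on a directory."""
    info = os.stat(path)
    if uid == info.st_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif gid == info.st_gid:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (info.st_mode & needed) == needed


# Demo on a temporary directory, using IDs that differ from its owner's.
with tempfile.TemporaryDirectory() as host_dir:
    owner = os.stat(host_dir)
    other_uid, other_gid = owner.st_uid + 1, owner.st_gid + 1

    os.chmod(host_dir, 0o700)  # owner only
    private_result = can_list_directory(host_dir, other_uid, other_gid)

    os.chmod(host_dir, 0o755)  # readable and searchable by everyone
    shared_result = can_list_directory(host_dir, other_uid, other_gid)

    print(private_result, shared_result)
```

On the host, calling can_list_directory on the directory you pass to '-v' (for example ${HOME}/cesmlab) with 1000 and 1000 gives a first hint as to whether the container user can see it.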

Thanks,
- Brian
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Brian,

Any news about building a native M1 version? Is it "work in progress" yet?

Thanks,
Bill

Hi Bill,

Yikes, this got left behind - my apologies. Let me see if I can build one via the AWS M1 instances and check. Unfortunately, I don't have a Mac M1 system myself, so testing is limited, but I'll try to run it this week and send you an update if you're still interested.

- Brian
 

Maggie Xia

Maggie Xia
New Member
Thanks for the reply! Yes, I am running on Linux. I have solved my problem by mounting into '/home/user/inputdata' in the container instead of '/home/user', and it all works fine now.
One more question here, I tried 'su root' inside the container, but it asks for a password. Do you have any idea what happened here? Thanks in advance!
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Glad you've got that going, even if that's somewhat puzzling to me -- I'll look into it. Can you share which version of Linux you're running and which version of the container?

As for 'su', just do 'sudo /bin/bash' instead - the default user in the container is already added to the 'sudoers' file, so it doesn't need a password.

- Brian
 

fostercarly

fostercarly
New Member
[W 19:32:07.637 LabApp] No web browser found: could not locate runnable browser.
webbrowser is part of the Python standard library, so you don't need to install a separate package to use it; it comes bundled with your Python installation. To see which browsers it recognizes on your system:

import webbrowser
# _browsers is a private attribute, so its contents may differ between Python versions
print(webbrowser._browsers)

If you call webbrowser.open() directly, it always opens the link in the default browser. To use a different browser, register it first and then launch a new tab. The registration function looks like this:

webbrowser.register(name, constructor, instance=None, *, preferred=False)

Once a browser type is registered, the get() function can return a controller for that browser type. You can call open, open_new and open_new_tab on the controller object, which ensures the commands go to the browser you registered.

import webbrowser

url = 'https://www.google.com'
# Path to the Chrome executable on this particular Windows machine; adjust for your system
chrome_path = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
# preferred is a keyword-only argument (Python 3.7+)
webbrowser.register('chrome', None, webbrowser.BackgroundBrowser(chrome_path), preferred=True)
webbrowser.get('chrome').open_new_tab(url)
 

taoliu_tech

Tao Liu
Member
I got the same "permission denied" error on Linux.
System: Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-130-generic x86_64)
The neon demo stores all the data under /home/user, so mounting to /home/user/inputdata does not give access to the simulation data from the local machine.
[Attached screenshot: Screen Shot 2022-11-08 at 12.08.24 PM.png]
 

feellikeclimateresearch

Felix
New Member
Hi everyone,

One of the challenges a lot of new CESM users encounter is that of installing the software to begin with - so we've been working on a 'containerized' CESM environment to help people get up and running, and learning CESM, more easily. Containers offer portable environments - so this works on Macs, Linux and even Windows systems, and runs on anything from personal laptops to high-end servers. Best of all, it requires zero configuration. You can download the image, run it, and have a working CESM environment immediately.

In addition, we've added a Jupyter Lab environment to the image ("CESM-Lab"), giving users a choice of interfaces - shell, or Jupyter notebooks. Since everything is preconfigured, we're also able to provide tutorials on using CESM - the CESM-Lab image comes with a Quick Start notebook walking new users through the main steps of creating a case, configuring it, building it and running it. We plan on adding more tutorials in the future, covering much more, including analysis and visualization of results via Jupyter Notebooks.

A Google Doc with instructions on downloading and installing 'CESM-Lab-2.2' is available here:

And a spreadsheet of tested compsets, their RAM requirements, input data sizes and performance on both a 4-core laptop and a 36-core server is viewable here. We're adding more to this regularly too:

Note that this comes with a full release of CESM 2.2; it's not limited in any way, other than your system resources. RAM limits what compsets and grid sizes you can use, and the type and count of processors will determine the performance of those runs. Some things, like 'simpler model' atmosphere runs or single-point land runs, work fine even with limited resources, whereas fully-coupled 1-degree runs would require a workstation or server with lots of memory.

Finally, this is a technology preview of our container efforts; we're aiming for an official release in the near future. In the meantime, we wanted to get it out to the community, get feedback on how it works for you, and start working on a variety of improvements and additional tutorial material, too!

Thanks, and please ask if you have any questions!
- Brian
Hello,
I was wondering if it would be possible/how to change a certain element (solar constant) in the CIME config within the containerized version. Sorry if the answer to this question is obvious, I’m new to CESM, CIME, and Linux in general, so any help is greatly appreciated.
 