Welcome to the new DiscussCESM forum!

Containerized CESM for laptops/workstations (Windows/Mac/Linux)

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Have you been doing this primarily with a Dockerfile to build the container or some other method? If the former, maybe we can get it under version control for easier collaboration? I was originally planning on doing a from-scratch Singularity recipe file to build my container, and then I just happened to stumble upon this at an opportune time :)

We've got a GitHub repo here, and collaboration sounds great:
ESCOMP/ESCOMP-Containers

Note that right now, you'll see the CESM directory has quite a few changes from a 'base' CESM install - this is because we wanted to get something out to experiment with, and CESM 2.2 was going through the release process. In the near future, most of those changes -generally focused on providing a 'container' machine config- will be included in CIME, and thus be unnecessary. In short, it'll get a lot less ugly soon, but still, that repo should let you play around.

The plan is to have Singularity recipes in there at some point, too.

We also have the start of a tutorial repo (for Jupyter Notebooks), but that's probably of less interest to you, I imagine.

What kind of passthrough? I have experience using GPU passthrough in Singularity for NVIDIA GPUs (TensorFlow/deep learning), which typically works out of the box with the --nv flag (although the user might need to be in a "video" group for it to work). If memory serves, that worked on the RHEL7 kernel (3.10), but I haven't tested on an earlier kernel. I don't have experience with Radeon or other ASIC passthrough, so I'd be less helpful there. The Singularity team does appear to be relatively responsive, though, so if you pass along those details I don't mind looking into it.

Ah, I meant pass-through to the underlying network - the MPICH ABI Compatibility Initiative lets you replace one compatible MPI implementation with another at runtime, and Cray machines, for example, can do this automagically with Shifter to use the native CrayMPI runtime. I think when I saw this, they were using Singularity containers, but Shifter is the key thing. So, for example, I can compile a CESM case in a container (with MPICH, no knowledge of a high-speed network like a Cray Aries or even Infiniband), and run it on that Cray, and have the host-level network used. Here's a neat paper showing some of this from Blue Waters:

Container solutions for HPC Systems: A Case Study of Using Shifter on Blue Waters

On Cheyenne (our system), we have SGI MPT, and there's a compatibility mode that also allows for this, but it required a little bit of work to get going, unlike Shifter's automatic use of it. I haven't looked into this in a while, but it's a really great thing for HPC-like use, and it's on the 'to do' list to check up on again.
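For anyone curious what that pass-through looks like in practice, here's a rough sketch for Singularity. Everything here is illustrative: the host MPI library path is site-specific (the one shown is a made-up Cray-style path), and the image and executable names (cesm.sif, cesm.exe) are hypothetical. The key idea is that MPICH ABI compatibility lets a binary built against MPICH inside the container load the host's ABI-compatible MPI library instead:

```shell
# Site-specific: location of the host's MPICH-ABI-compatible MPI library
# (check with your system administrators -- this path is made up).
HOST_MPI=/opt/cray/pe/mpt/default/gni/mpich-gnu-abi/8.2/lib

# Put the bound-in host library ahead of the container's own MPICH on the
# loader path; Singularity passes SINGULARITYENV_* variables into the container.
export SINGULARITYENV_LD_LIBRARY_PATH=/host/mpi

# Launch with the *host* MPI launcher, binding the host library directory
# into the container at /host/mpi:
mpiexec -n 36 singularity exec --bind "$HOST_MPI":/host/mpi cesm.sif ./cesm.exe
```

Shifter does essentially this substitution automatically on Cray systems; with Singularity you wire it up by hand, as above.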

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins How can I look at/edit the CESM code that's inside the Docker image that is being used in the Jupyter notebook? Are the files accessible?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Good question - if you just want to look, you can change directory to $CESMROOT ( /opt/ncar/cesm2 ) and poke around. By default, we make them owned by 'root', so you can't easily overwrite them. But you can always copy that whole tree to your home directory, too, to have an editable version:

cp -r /opt/ncar/cesm2 ~/cesm2

(And you can also just 'sudo' to root, without a password, inside the container!)

If you're looking to make source-code changes to a case you're building, you generally identify the file you want changed in the main source tree ( in /opt/ncar/cesm2/components/cam, for example ), then put that file in your case's SourceMods/src.cam directory, again using CAM as the example component. We're working on building tutorials on this sort of thing into the container, but in the meantime, here are the (non-container) tutorial slides / videos that might help, too:


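As a concrete sketch of that SourceMods workflow (the case directory name and the particular source file here are hypothetical; substitute whatever you're actually modifying):

```shell
CESMROOT=/opt/ncar/cesm2        # the container's CESM install ($CESMROOT)
CASEDIR=$HOME/cases/my_case     # hypothetical case directory

# Files placed in SourceMods take precedence over the main source tree
# at build time, so copy the original in, then edit the copy:
cp "$CESMROOT/components/cam/src/physics/cam/zm_conv_intr.F90" \
   "$CASEDIR/SourceMods/src.cam/"

# ...edit $CASEDIR/SourceMods/src.cam/zm_conv_intr.F90, then rebuild:
cd "$CASEDIR" && ./case.build
```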
Hope that helps, and let me know if you have more questions! I'm hoping we can develop more tutorials soon. :-)
 

smoggy

smogger
New Member
@dobbins Thanks. I read the User Guide and figured out how to use the "user_nl_***" files to change variables.

Regarding the output data in the Jupyter tutorial, I see that you've plotted temperature using QuickView('quickstart_case', 'T'), and I see a map of Earth with what I assume is the temperature plotted. Looking at the definition of the QuickView function, I see "supported_4D = ( 'T', 'U', 'V', )". However, 'U' and 'V' produce errors when I try them in the QuickView function.

  1. What are U and V? Why don't they work with QuickView?
  2. What other output variables exist for this QPC4 compset besides temperature? How can I find out the output variables for any compset?
  3. How can I export these output files (*.nc) out of the Docker image and onto my real computing environment?
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi @smoggy,

A thousand apologies -- I didn't see this, and was off on some other projects. I imagine you've solved your problems by now? If not, the short answer is the 'QuickView' function isn't very robust yet - it was really just a preview of the sort of thing we can do. The hope is to make it more robust before an actual release, or perhaps replace it with some standardized CESM diagnostics.

I'm surprised you're having errors with U and V, though, at least with the quick start case. That's working fine for me, provided you ran the full month (which would be needed for the 'T' variable too). U and V are the zonal and meridional winds, respectively, so they should be in that file. As for other variables, there are quite a few - you can do an 'ncdump' on the output file like this and see a lot more, including description and dimensionality:

ncdump -h ~/archive/quickstart_case/atm/hist/quickstart_case.cam.h0.0001-01.nc

Finally, to 'export' a file outside of the Docker image, you've got a few options - the easiest is simply to have mounted a directory into the image when you run it (the '-v' option to 'docker run'). If you didn't do that, you can likely still 'sftp' or 'rsync' to a different Linux system, since you have networking available in the container. (You can also 'save' a modified container if you didn't mount something and don't have a place to transfer things to, which adds a layer to the container, but that's less ideal.)
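For example (the host-side directories and the container name here are placeholders; the image name matches the one used elsewhere in this thread):

```shell
# Option 1: bind-mount a host directory as the container's home at startup,
# so anything CESM writes under /home/user is visible on the host:
docker run -it --rm -v "$HOME/cesm-lab":/home/user -p 8888:8888 escomp/cesm-lab-2.2

# Option 2: copy files out of an already-running container by name:
docker cp my_container:/home/user/archive/quickstart_case/atm/hist "$HOME/cesm-output"
```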

Does that help? Apologies again. I don't know why I didn't get notified of an update to this thread!

Cheers,
- Brian
 

smoggy

smogger
New Member
@dobbins Thanks, excuse my late reply - holidays and all that.

Regarding exporting, I used:
docker cp NAME:/home/user/archive/quickstart_case/atm/hist ./Documents
where NAME is the name of the currently-running Docker container.

I saw on the spreadsheet that the fully-coupled b1850c4_tutorial should be available for this Docker setup. However, it fails before the first output data is saved. I looked in the logfile and saw:

"Program received signal SIGBUS: Access to an undefined portion of a memory object."

Do you know what this means or how to solve this error?

Regarding the ncdump of the quickstart_case QPC4 compset, I saw the long descriptions of variables as you stated. Two questions:
  1. It appears that elevation is measured in pressure instead of meters?
  2. I don't see a variable for sealevel. How do I calculate this? Or is this not available because this is an atmosphere-only compset?
 

bill_paxton

William H Paxton
New Member
In addition, we've added a Jupyter Lab environment to the image ("CESM-Lab"), giving users a choice of interfaces - shell, or Jupyter notebooks. Since everything is preconfigured, we're also able to provide tutorials on using CESM - the CESM-Lab image comes with a Quick Start notebook walking new users through the main steps of creating a case, configuring it, building it and running it. We plan on adding more tutorials in the future, covering much more, including analysis and visualization of results via Jupyter Notebooks.
I'm trying to get this working in Docker on my Mac (macOS Big Sur 11.6, Apple M1 chip) but have run into a problem that is perhaps related to the new chip. Here's the terminal output - things just hang after the "No web browser found" message. The WARNING about non-matching platform is a concern.

Thanks for any help.
Cheers,
Bill

NEW MAC /Users/bpaxton: docker run -it --rm -v /Users/bpaxton/cesmlab:/home/user -p 8888:8888 escomp/cesm-lab-2.2
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[I 19:31:58.381 LabApp] [nb_conda_kernels] enabled, 1 kernels found

[I 19:31:58.432 LabApp] Writing notebook server cookie secret to /home/user/.local/share/jupyter/runtime/notebook_cookie_secret
[W 19:32:00.491 LabApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 19:32:07.581 LabApp] JupyterLab extension loaded from /srv/conda/envs/default/lib/python3.7/site-packages/jupyterlab
[I 19:32:07.584 LabApp] JupyterLab application directory is /srv/conda/envs/default/share/jupyter/lab
[I 19:32:07.612 LabApp] Serving notebooks from local directory: /home/user
[I 19:32:07.612 LabApp] Jupyter Notebook 6.1.4 is running at:
[I 19:32:07.612 LabApp] http://b4ed2e248006:8888/
[I 19:32:07.612 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:32:07.637 LabApp] No web browser found: could not locate runnable browser.
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Bill,

There are two things going on here:

1) Yes, the Apple M1 is a different architecture - a fully different instruction set, in fact, though Googling says it emulates x86, and that's why (to me) it makes sense that it prints out all those "LabApp" lines. It seems like it's able to run, but I haven't verified this on an M1 system yet! I'll be interested in hearing how it goes; worst case, we'll build a native M1 version.

2) The 'hanging' after the 'no web browser found' is actually fine - basically, that "no browser found" message is because we're running Jupyter inside the container, where a browser doesn't exist. But if you launch a browser locally on your system, and point it towards http://127.0.0.1:8888, you should get the Jupyter screen. In newer versions (not yet pushed to DockerHub), we override this default message with a clearer one. I'll try to push those changes this week.

Hope that helps, and let me know how it goes!
- Brian
 

bill_paxton

William H Paxton
New Member
Hi Brian,

Progress! It runs in Jupyter and gets to "Running your case". But instead of taking 1-3 minutes, it is still running after an hour with no sign of progress. The Activity Monitor reports that it is fully using 4 cores, so something is wrong.

This seems to be similar to a problem reported on Stack Overflow recently:

The "solution" suggested was to do a build for the new architecture:
docker build --platform linux/arm64 .

If you can do that, I'll be happy to give it a try. Or perhaps there is another option?

Thanks,
Bill
 