
cesm run

khj

Parisa
Member
Dear all;
I am having trouble running the model. I want to run it through Docker, but it does not download the input data at all. I have attached screenshots of the model error. Thank you for your help.
Yours faithfully,

Parisa
 

Attachments

  • Capture1.PNG
  • Capture2.PNG
  • Capture3.PNG
  • Capture.PNG

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Parisa,

It's unclear from these whether the issue is at the NCAR servers for data or your own system -- from in the container, can you try this command:

ping 8.8.8.8

Basically, if you get a reply (e.g., '64 bytes from 8.8.8.8...'), then it means you're at least able to reach the Google servers. If not, I'm inclined to think perhaps you have a firewall that's blocking certain IPs? If it does work, can you try:

ping 128.117.124.213

That's the server it's trying to connect to. If the first one works and the second doesn't, it's likely due to an issue where various IPs have gotten blacklisted for no known reason - it even happened to my home IP. To work around this, we've set up an additional input data server with low-res data on it, but it seems maybe you're not reaching that one either?
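
The two ping checks above can be combined into a small script. This is only a sketch, not an official CESM tool: the function names are made up for illustration, the two addresses are the ones quoted in this post, and it assumes a Linux `ping` that accepts `-c` (as is typical inside the container).

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; addresses are those quoted in the post.

GENERAL_HOST="8.8.8.8"          # Google public DNS, used as a general reachability check
DATA_HOST="128.117.124.213"     # input data server address quoted in the post

# Return 0 if the host answers at least one of three echo requests.
can_ping() {
  ping -c 3 "$1" >/dev/null 2>&1
}

# Map the two ping exit statuses (0 = reply received) to a likely cause.
interpret_pings() {
  general_status=$1
  data_status=$2
  if [ "$general_status" -ne 0 ]; then
    echo "no reply from general host: suspect a local firewall or no network in the container"
  elif [ "$data_status" -ne 0 ]; then
    echo "general host replies but data server does not: this IP may be blocked by the data server"
  else
    echo "both hosts reply: basic connectivity looks fine"
  fi
}

# Run both pings and print the interpretation.
check_inputdata_network() {
  can_ping "$GENERAL_HOST"; general_status=$?
  can_ping "$DATA_HOST"; data_status=$?
  interpret_pings "$general_status" "$data_status"
}
```

Calling `check_inputdata_network` from a shell inside the container runs both pings and prints one line of diagnosis. A successful ping only shows basic reachability; the download itself can still fail for other reasons.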

Also, can you tell me which compset and resolution you're running so I can check that the data exists on the low-res server?

Thanks,
- Brian
 

khj

Parisa
Member
Hi Brian,
thanks for your response
I checked ping 8.8.8.8 and ping 128.117.124.213, and they seem to work. But when I ran ./case.submit again, I got the same result as before. I have attached screenshots of ping 8.8.8.8, ping 128.117.124.213, and the model run.
For now I am just trying to get output from the model on a trial basis, using the QPC4 compset at f09_f09_mg17 resolution.

Thanks,
Parisa
 

Attachments

  • Capture3.PNG
  • Capture4.PNG

khj

Parisa
Member
Fortunately, the data download problem is solved, but I have run into a new problem. I have attached a screenshot of the specifications of the server I am working on and a screenshot of the model execution error.

Yours sincerely,
Parisa
 

Attachments

  • Capture7.PNG
  • Capture8.PNG

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Parisa,

Just to double check on the data issue - your ping commands actually weren't working, but I assume you found and solved some other issue and it is indeed all set now?

As for the current issue, it's a bit hard to tell from this, but what it looks like is you're running on a system with 128 cores? In that case, if you're using Docker, you likely need to add an option to your run command:

--shm-size=512M (though 256M may be sufficient)

Basically, Docker only gives the container a 64MB '/dev/shm' space by default, which is where MPI keeps track of process info. This default is big enough for laptops and desktops with 4-8 cores, but insufficient for large workstations or servers. Upping it to 256M or 512M may work.
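
To make the option concrete, here is a minimal sketch of where `--shm-size` goes on a `docker run` command line. The image name is a hypothetical placeholder (substitute the image you actually pulled), and the helper function exists only for illustration; it prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: the image name below is a hypothetical placeholder.

# Print a 'docker run' command line with an enlarged /dev/shm.
# $1 = shared memory size (default 512M), $2 = image name.
shm_run_command() {
  shm_size=${1:-512M}
  image=${2:-your-cesm-image}
  echo "docker run -it --rm --shm-size=$shm_size $image"
}

shm_run_command            # 512M, the larger of the two suggested values
shm_run_command 256M       # smaller value that may be sufficient

# After starting the container, the size of /dev/shm can be checked with:
#   df -h /dev/shm
```

If the run still fails with 512M, the printed command can be regenerated with a larger value such as `1G`.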

If not, can you share the 'cesm.log.220208-112745' file you have in your screenshot? That might tell us more.

Finally, while you can run on such a large workstation with this particular container image, it just uses the GNU compilers. These are usually fine for experimenting and learning, but a system of your size is capable of decently large runs, and if you want the best performance, you probably want a version with the Intel compilers. Depending on the component set, resolution, and processor type, you can get up to 30-50% better performance. We haven't released a public one yet, but if this is of interest to you, let me know and I'll try to make it available soon. It's much larger, and requires even more of a 'shared memory' space per rank (you may need to go to 1G!), but it is faster.

Cheers,
- Brian
 

khj

Parisa
Member
Hi Brian,
Thank you very much for your help.
I tried to run the model yesterday. I ran the command ./xmlchange NTASKS=4 before case.submit. The model ran without errors, but it did not give a message saying that the run was successful. I'm not sure whether the full model was executed, or whether the above command was the right thing to do. I will send the screenshot from the end of the model run. Thank you for your help.
(Also, I welcome your suggestion in the previous post. I am looking forward to the release of the version with the Intel compilers.)

Thanks a lot,
Yours sincerely,
-Parisa
 

Attachments

  • image.jpeg

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Parisa,

It looks like it ran successfully -- this shows it archiving files after a 5-day run, and with no signs of a failure in the run (ignore the 'nhfil' one). If you don't want it to archive data after a run, you can use './xmlchange DOUT_S=false' and it will just do the run itself, and not the 'archiver' part, which is all that extra stuff you see.
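
If you'd rather not judge success from the terminal output alone, a small check of the case directory can help. The sketch below assumes that CIME records entries such as "case.run success" or "case.run error" in the `CaseStatus` file of the case directory; the function names are hypothetical, and the exact wording of those entries should be confirmed against your own `CaseStatus` file.

```shell
#!/bin/sh
# Sketch only: assumes CaseStatus contains "case.run success" / "case.run error" entries.

# Return 0 if the most recent case.run result recorded in directory $1
# (default: current directory) is a success.
case_run_succeeded() {
  case_dir=${1:-.}
  grep -E "case\.run (success|error)" "$case_dir/CaseStatus" 2>/dev/null \
    | tail -n 1 \
    | grep -q "success"
}

# Print a one-line summary for a case directory.
report_case_status() {
  if case_run_succeeded "$1"; then
    echo "run completed successfully"
  else
    echo "no successful run recorded"
  fi
}
```

From the case directory, `report_case_status .` prints the summary. To skip the archiving step on the next run, use `./xmlchange DOUT_S=false` before `./case.submit`, as described above.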

I think at this point you're up and running, so if you want additional information on how to use CESM, in addition to these forums there are some tutorial materials here - these are from the annual tutorial, and we're aiming to start doing some simple, short online courses in the May time-frame for teaching CESM basics, too, if that's of interest.


Hope that helps, and let me know if you have other questions. I'll get back to you on the Intel container soon; it's still on hold as I get some of our cloud stuff wrapped up first.

Cheers,
- Brian
 

khj

Parisa
Member
Dear Brian,

Thank you very much for your helpful answer and guidance.

Best regards,
Parisa
 

khj

Parisa
Member
Dear Brian,

Unfortunately, I have run into trouble again. This time I ran the FWscHIST compset at f09_f09_mg17 resolution, but I got an error. I pulled the CESM-Lab container. I have attached the details of the server I am working with, along with screenshots of the model errors.
Thank you for your help.
Best regards,
Parisa
 

Attachments

  • Capture6.PNG
  • Capture7.PNG
  • Capture8.PNG