
Running CESM2.1.5 on Stampede 3

xiangli

Xiang Li
Member
Hi Jim,

As I understand from the forum and our system administrator, the Duke Compute Cluster, which is a shared cluster using virtualized, hyper-threaded CPUs, is not an ideal platform for the CESM MPI code. When I tested B1850 cases on the Duke cluster with various CPU configurations, the best throughput I could reach was 2.4 simulated years per day, which is far below our expectation.
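For context, throughput in simulated years per day (SYPD) is simulated time divided by wall-clock time, so 2.4 SYPD means each simulated year costs about 10 hours of wall clock (24 h / 2.4). A minimal sketch of that arithmetic in shell, assuming a 365-day model year (CESM's default no-leap calendar); the function name `sypd` is hypothetical. CESM itself reports this figure as "Model Throughput" in the timing files it writes to the case's timing/ directory.

```shell
#!/bin/sh
# Hypothetical helper: convert a run's simulated days and wall-clock
# seconds into throughput in simulated years per day (SYPD).
# Assumes a 365-day model year (CESM's default no-leap calendar).
sypd() {
  sim_days=$1
  wall_seconds=$2
  awk -v d="$sim_days" -v s="$wall_seconds" \
    'BEGIN { printf "%.2f\n", (d / 365) / (s / 86400) }'
}

# Example: a 1-year run that took 10 hours (36000 s) of wall clock.
sypd 365 36000   # prints 2.40
```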

Now I'm trying to run CESM2.1.5 on TACC Stampede 3. Is this version fully supported on Stampede 3? If so, does that mean the full set of prerequisite software for running the model is already available on Stampede 3?

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
I have not yet done a port to stampede3, one of the newest systems at TACC. However, I may get an opportunity to work on it later this week.
 

xiangli

Xiang Li
Member
Hi Jim,

Is there an update on the port to Stampede 3? No hurry; I'd just like to see how it goes.

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
If you want to use the old skx nodes, I have completed that port and can give you an update; if you were hoping to use the new nodes, they haven't been made available yet.
 

xiangli

Xiang Li
Member
Hi Jim,

It looks like I can use the skx nodes. Here is what I see after logging in:

[Screenshots of the login output]

Could you please provide me with some information on running CESM2.1 on the skx nodes?

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
I have completed a partial port to the stampede3 skx nodes. There is an unresolved issue in the CAM configure script that I am still working on.
You can get what I have with
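Code:
cd cesm/cime                                           # CIME directory inside your CESM checkout
git remote add jpe https://github.com/jedwards4b/cime  # add Jim's CIME fork as a remote named "jpe"
git fetch jpe                                          # download the fork's branches
git checkout port/maint-5.6/stampede3                  # switch to the Stampede3 port branch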
 

xiangli

Xiang Li
Member
Hi Jim,

I was able to fetch and check out your port successfully!

Is this the issue you mentioned?

[Screenshot of the error]

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
No, that's something different. But you must have modified DIN_LOC_ROOT; try setting that variable back to my directory in config_machines.xml.
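One quick way to confirm which input-data directory a case will actually use is to query DIN_LOC_ROOT from the case directory and check that the path exists and is readable. A minimal sketch, assuming CIME's `xmlquery` tool accepts `--value`; the helper name `check_din_loc_root` is hypothetical, and the expected path is whatever Jim's port sets in config_machines.xml.

```shell
#!/bin/sh
# Hypothetical helper: report whether an input-data root such as
# DIN_LOC_ROOT is set, exists, and is readable by the current user.
check_din_loc_root() {
  dir=$1
  if [ -z "$dir" ]; then
    echo "DIN_LOC_ROOT is empty"
    return 1
  elif [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  elif [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "not readable: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Inside a CESM case directory (assumes xmlquery supports --value):
#   check_din_loc_root "$(./xmlquery --value DIN_LOC_ROOT)"
```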
 

xiangli

Xiang Li
Member
Hi Jim,

I set this back as you can see in the screenshot, but the error was still there.

[Screenshot of the DIN_LOC_ROOT setting and the error]

Thanks,
Xiang
 

xiangli

Xiang Li
Member
Quoting jedwards: "This is what I am still waiting for TACC to fix; the ticket is still in the queue."
Hi Jim,

It looks like the issue with ./check_case has been resolved, but I got an error when running ./case.build.

[Screenshots of the ./case.build error]

Do you have any suggestions on the error?

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
The error refers you to a file, config.log, in the mct directory. Did you examine that file? I am working on stampede3 today and will let you know when a port is ready.
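Jim's pointer can be followed up from the case directory: MCT is built with an autoconf-style configure script, and its config.log records each test that configure ran, so searching that file for "error" usually shows which compiler or MPI check failed. A minimal sketch of locating the file, assuming MCT is built somewhere under the case's EXEROOT or SHAREDLIBROOT and that `xmlquery` accepts `--value`; the helper name `find_mct_config_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every config.log under a build root whose
# path contains "mct", i.e. the log written by MCT's configure step.
find_mct_config_log() {
  root=$1
  find "$root" -type f -name config.log -path '*mct*' 2>/dev/null
}

# Inside a CESM case directory (assumes xmlquery supports --value and
# that MCT is built under EXEROOT or SHAREDLIBROOT):
#   for root in "$(./xmlquery --value EXEROOT)" "$(./xmlquery --value SHAREDLIBROOT)"; do
#     find_mct_config_log "$root"
#   done
# Then inspect the failing test, e.g.: grep -n -i error <path>/config.log
```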
 