
CESM2 porting to Niagara on SciNet (Part of Digital Research Alliance of Canada), no internet on compute nodes

nstant

Noah Stanton
New Member
To whom it may concern,

When running an instance of CESM2 on Niagara, the Slurm batch job exits with the following error about 2 minutes into the run:

Traceback (most recent call last):
  File "/var/spool/slurm/slurmd/job9684628/slurm_script", line 25, in <module>
    from CIME.case import Case
  File "/gpfs/fs0/project/n/ntandon/nstant/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/__init__.py", line 1, in <module>
    from CIME.case.case import Case
  File "/gpfs/fs0/project/n/ntandon/nstant/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/case.py", line 43, in <module>
    class Case(object):
  File "/gpfs/fs0/project/n/ntandon/nstant/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/case.py", line 81, in Case
    from CIME.case.check_input_data import check_all_input_data, stage_refcase, check_input_data
  File "/gpfs/fs0/project/n/ntandon/nstant/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 7, in <module>
    import CIME.Servers
  File "/gpfs/fs0/project/n/ntandon/nstant/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/Servers/__init__.py", line 3, in <module>
    has_gftp = find_executable("globus-url-copy")
  File "/usr/lib64/python2.7/distutils/spawn.py", line 184, in find_executable
    path = os.environ['PATH']
  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'PATH'

The compute nodes of Niagara are not connected to the internet, so they cannot run "globus-url-copy". I have run ./check_input_data --download for the case, and the necessary files appear to be in the run directory. Is there something missing? Is there a workaround to prevent CESM2 from attempting to use gFTP while running on a compute node?

Sincerely,

Noah Stanton
 

fischer

CSEG and Liaisons
Staff member
Hi Noah,

The CESM2 script is only checking whether the gftp executable is available; it's not trying to use it. According to the error message, your PATH environment variable isn't being set on the compute nodes.

Chris
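
To illustrate the failure mode described in the reply above, here is a minimal sketch. It is not the CIME or distutils source; the helper names are hypothetical. The traceback ends in a direct `os.environ['PATH']` lookup, and indexing a mapping that has no `PATH` key raises `KeyError` before any executable is searched for or run.

```python
import os


def _search(name, path):
    """Return the first file called `name` found in the `path` string, else None."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def find_executable_strict(name, environ):
    """Hypothetical helper that indexes PATH directly, like the line in the traceback."""
    path = environ["PATH"]  # raises KeyError('PATH') when PATH is unset
    return _search(name, path)


def find_executable_tolerant(name, environ):
    """Hypothetical variant that falls back to os.defpath when PATH is unset."""
    path = environ.get("PATH", os.defpath)
    return _search(name, path)


if __name__ == "__main__":
    empty_env = {}  # stands in for a compute-node environment without PATH
    try:
        find_executable_strict("globus-url-copy", empty_env)
    except KeyError as err:
        print("strict lookup failed, missing variable: %s" % err.args[0])
    print("tolerant lookup returned: %r" % find_executable_tolerant("globus-url-copy", empty_env))
```

The strict version fails the same way whether or not globus-url-copy is installed, which is consistent with the error being about the environment rather than about network access.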
 

nstant

Noah Stanton
New Member
Hello Fischer,

Do you have any pointers on how I can ensure that the PATH variable is set on the compute nodes? Is that in config_machines.xml, or should I be looking elsewhere to set this up? I followed the same porting procedure as I did with Cedar (an HPC system on the Digital Research Alliance of Canada network). The port was successful there, but I am not having the same luck here. The difference between Niagara and Cedar is that Niagara's compute nodes do not have internet access. That is why I assumed the error was not directly caused by the PATH variable.

Thank you for your help.

Sincerely,

Noah Stanton
 

nstant

Noah Stanton
New Member
Additionally Chris,

I have added this to my config_machines.xml...

<environment_variables>
  <env name="OMP_STACKSIZE">256M</env>
  <env name="PATH">$ENV{PATH}</env>
</environment_variables>

The PATH variable is recognized when I run ./case.submit and ./preview_run, so I assume that it is being registered. But I am still getting the same error.

I am currently adding the following to config_batch.xml to see whether I need to pass an additional argument to the sbatch submission...

<submit_args>
  <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
  <!-- <arg flag="-p" name="$JOB_QUEUE"/> -->
  <arg flag="--account" name="rrg-ntandon"/>
  <arg flag="--export" name="$PATH"/>
</submit_args>

Thank you for your help.

Noah S
 

fischer

CSEG and Liaisons
Staff member
Hi Noah,

You might need to talk to your local sys admin about why PATH isn't being set on the compute nodes. I'm assuming ./case.submit and ./preview_run are being run on a login node.

You could also try adding something like this to config_machines.xml to see if you get past the error.

<env name="PATH">$ENV{HOME}/bin</env>

Chris
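
The `$ENV{HOME}/bin` value suggested above refers to an environment variable by name. As a rough sketch of what resolving such a reference involves, the function below substitutes `$ENV{NAME}` tokens from a given mapping. It is a hypothetical illustration only: how and when CIME itself resolves these references is not shown in this thread, and `expand_env_refs` is not a CIME function.

```python
import re

# Matches references of the form $ENV{NAME}
_ENV_REF = re.compile(r"\$ENV\{(\w+)\}")


def expand_env_refs(text, environ):
    """Replace each $ENV{NAME} in `text` with environ[NAME].

    Raises KeyError if a referenced variable is not present in `environ`.
    """
    def _lookup(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError(name)
        return environ[name]

    return _ENV_REF.sub(_lookup, text)


if __name__ == "__main__":
    # Hypothetical login-node environment, for illustration only
    login_env = {"HOME": "/home/example", "PATH": "/usr/bin:/bin"}
    print(expand_env_refs("$ENV{HOME}/bin", login_env))
    print(expand_env_refs("$ENV{PATH}", login_env))
```

The result depends entirely on the environment in which the reference is resolved, so a value that looks correct on a login node says nothing by itself about what a compute node sees.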
 

nstant

Noah Stanton
New Member
Hello Chris,

I will ask them about PATH. Yes, ./case.submit and ./preview_run are being run on a login node. For some reason, PATH isn't being passed on to the compute nodes.

While I wait for a response from them, I will go ahead and try modifying the config_machines.xml in the way you have described.

Thank you, I will get back to the thread once I have gone through these avenues.

Sincerely,

Noah Stanton
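
One way to confirm whether PATH actually reaches a compute node is to run a small check inside a batch job and compare its output with the same check on a login node. The script below is a minimal sketch under that assumption; the function name and the list of variables checked are illustrative choices, not part of CESM or Slurm.

```python
import os


def missing_variables(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # Variables to check are an assumption; extend the list as needed.
    missing = missing_variables(["PATH", "HOME"])
    if missing:
        print("Unset on this node: " + ", ".join(missing))
    else:
        print("PATH = " + os.environ["PATH"])
```

If the script reports PATH as unset only when run from the batch job, the problem lies in how the job environment is set up rather than in the model scripts.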
 

nstant

Noah Stanton
New Member
Hello Chris,

I was able to resolve this issue. There was a mismatched local library path, as well as a mismatched env name in config_machines.xml.

The solution was as follows:

<environment_variables>
  <env name="OMP_STACKSIZE">256M</env>
  <env name="PATH">$ENV{PATH}</env>
</environment_variables>

This fixed the PATH issues.

Thank you for your help.

Sincerely,

Noah Stanton
 