
Problems while submitting the case

arti-14

Arti Jadav
New Member
Hi,
I have successfully built the model in single-column mode for the MPACE (scam_mpace) configuration, but when I run ./case.submit I get the errors below. I have attached the config_inputdata file for reference.
Code:
Client protocol gftp not enabled
Using protocol wget with user anonymous and passwd user@example.edu
Could not connect to repo 'ftp://ftp.cgd.ucar.edu/cesm/inputdata'
This is most likely either a proxy, or network issue .
Trying to download file: '../inputdata_checksum.dat' to path '/home/rt/SOURCE/projects/scratch/test_scam_mpace/run/inputdata_checksum.dat.raw' using NoneType protocol.
Traceback (most recent call last):
  File "./case.submit", line 107, in <module>
    _main_func(__doc__)
  File "./case.submit", line 104, in _main_func
    mail_user=mail_user, mail_type=mail_type, batch_args=batch_args)
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 157, in submit
    custom_success_msg_functor=verbatim_success_msg)
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1683, in run_and_log_case_status
    rv = func()
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 155, in <lambda>
    batch_args=batch_args)
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 85, in _submit
    case.check_case()
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 171, in check_case
    self.check_all_input_data()
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 163, in check_all_input_data
    _download_checksum_file(self.get_value("RUNDIR"))
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 54, in _download_checksum_file
    success = server.getfile(rel_path, new_file)
AttributeError: 'NoneType' object has no attribute 'getfile'
 

Attachments

  • config_inputdata.txt
    1.4 KB

arti-14

Arti Jadav
New Member
Thank you for the reply.
I tried ./check_input_data --download. It shows the following error:
Code:
Client protocol gftp not enabled
Using protocol wget with user anonymous and passwd user@example.edu
Could not connect to repo 'ftp://ftp.cgd.ucar.edu/cesm/inputdata'
This is most likely either a proxy, or network issue .
Trying to download file: '../inputdata_checksum.dat' to path '/home/rt/SOURCE/projects/scratch/test_scam_mpace/run/inputdata_checksum.dat.raw' using NoneType protocol.
Traceback (most recent call last):
  File "./check_input_data", line 76, in <module>
    _main_func(__doc__)
  File "./check_input_data", line 71, in _main_func
    chksum=chksum) else 1)
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 163, in check_all_input_data
    _download_checksum_file(self.get_value("RUNDIR"))
  File "/home/rt/SOURCE/CESM/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 54, in _download_checksum_file
    success = server.getfile(rel_path, new_file)
AttributeError: 'NoneType' object has no attribute 'getfile'

The wget.log file contains:
Code:
failed: Connection timed out.
Retrying.
--2020-02-24 15:05:33--  ftp://ftp.cgd.ucar.edu/cesm/inputdata
  (try: 2) => ‘inputdata’
Connecting to ftp.cgd.ucar.edu (ftp.cgd.ucar.edu)|128.117.23.220|:21... failed: Connection timed out.
Retrying.
--2020-02-24 15:07:45--  ftp://ftp.cgd.ucar.edu/cesm/inputdata
  (try: 3) => ‘inputdata’
Connecting to ftp.cgd.ucar.edu (ftp.cgd.ucar.edu)|128.117.23.220|:21... failed: Connection timed out.
Retrying.
--2020-02-24 15:09:59--  ftp://ftp.cgd.ucar.edu/cesm/inputdata
  (try: 4) => ‘inputdata’
Connecting to ftp.cgd.ucar.edu (ftp.cgd.ucar.edu)|128.117.23.220|:21...
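
The wget.log above shows repeated timeouts on the FTP control port (21), which is consistent with the "proxy, or network issue" message. A minimal sketch of a reachability probe that can be run from the same machine is given here; the helper name can_connect is hypothetical and is not part of CIME.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

For example, if can_connect("ftp.cgd.ucar.edu", 21) returns False while other outbound connections work, a firewall or proxy blocking port 21 is a likely cause. A True result only shows that the port is reachable, not that the download itself will succeed.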
 
I have the same problem. What was the solution to this? I ftp'd directly in and downloaded the ftp.cgd.ucar.edu/cesm/inputdata_checksum.dat into my run directory as inputdata_checksum.dat.raw as it appeared to require, but I still get exactly the same error when I re-run ./case.submit. I did not have this problem with v2.0.1 but I just upgraded to v2.1.3. How do I fix this?
 
I just did, and hopefully they can help. I didn't e-mail them earlier because above you suggested only doing that if I could not access the ftp site, which I can. I was able to manually download and install several nc input data files, which fixed problems at the case.build stage, but doing so for the inputdata_checksum.dat file did not make the error go away, so I presume the problem is deeper than simply not being able to connect to the ftp site. Unless you know any better, I'll keep my fingers crossed that fixing the auto download solves the problem.
 

aherring

Adam
Member
Has anyone heard back from help@cgd.ucar.edu on this yet? I am having the same problem I think. I'm using a very up to date code base and it ran fine last week, but is now having trouble connecting to the license server in the build phase ... and after powering through that just like Darrel did, the submit script pitches a fit trying to download a dataset. My older code bases (tags are a few months old) have no problem building and running.
 

mmoore

New Member
Hi, Adam,

I responded in the CGD help system to those that submitted tickets.

There are at least 3 different problems going on, only one of which I have partial control over.

1) Not connecting to the license server is an NCAR/CISL problem. Please submit a ticket
at System Dashboard - servicedesk.ucar.edu.

2) The problem from last Sat was an inadvertent denial of service attack from someone
(still unknown) trying to download the entire 30 TB of inputdata, which is a) unwise,
and b) unnecessary. The origin network was blocked, after which inputdata downloads
were successful.

3) Some random problems (like yours) trying to download necessary inputdata. To track
this down I need more information: time of failures, file names, and any error messages.

I also gotta wonder, why is it breaking at ftp.cgd.ucar.edu? Jim's code is supposed to
fall back, in order, from
. anongridftp - which cheyenne.ucar.edu certainly has the code for and should have worked.
. ftp - which is where things broke
. svn - the system of last resort which is where the reference data is stored.

Jim needs to take a look at this.
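
The fallback order described above, and the 'NoneType' object has no attribute 'getfile' error earlier in the thread, can be illustrated with a short sketch. This is not CIME's actual implementation: the function, the exception class and the factory callables are all hypothetical. It only shows a fallback loop that raises a descriptive error when every protocol fails, instead of handing None back to the caller.

```python
class NoServerAvailable(RuntimeError):
    """Raised when every configured protocol fails to connect."""


def first_working_server(factories):
    """Try each (name, factory) pair in order; return the first that connects.

    Each factory is a hypothetical callable that returns a server object or
    raises OSError when the connection fails.
    """
    failures = []
    for name, factory in factories:
        try:
            return name, factory()
        except OSError as err:
            # Remember why this protocol failed, then fall through to the next.
            failures.append(f"{name}: {err}")
    # Fail loudly here rather than returning None to the caller.
    raise NoServerAvailable(
        "no input data server reachable (" + "; ".join(failures) + ")"
    )
```

Called with factories for anongridftp, ftp and svn in that order, the sketch would return the first protocol that connects. If none does, the error message lists each failure rather than surfacing later as an AttributeError.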

Mark
--0-
 

aherring

Adam
Member
That makes sense that there are multiple issues going on here.

Mark looped me in that the licensing issue in the build phase was resolved by CISL as of 9AM this morning. My source code now builds and runs to completion. Great.

But one problem remains that I really need to solve. And it does seem to mirror some of the errors reported in this thread, in that ./check_input_data --download is failing ... but for Branch runs.

I ran a simulation to completion ... but now I want to branch off that run with some small changes to the source code, and run the branch run in the same directory as the original run (this way has always worked for me in the past). This requires setting a few xml variables in env_run.xml of the original run, and then I do a clean build in the same directory. Now case.submit pitches a fit when it tries to fetch Buildconf/clm.input_data_list:

Code:
Loading input file list: 'Buildconf/mosart.input_data_list'
Loading input file list: 'Buildconf/cam.input_data_list'
Checking server ftp.cgd.ucar.edu/cesm/inputdata with protocol ftp
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Using protocol ftp with user anonymous and passwd user@example.edu
server address ftp.cgd.ucar.edu root path cesm/inputdata
Loading input file list: 'Buildconf/clm.input_data_list'
  Model clm missing file nrevsn = 'cam6_2_017_FHIST_ne0np4.ARCTIC.ne30x4_mt12_200327_mg3-resetsnow0-a2xall-1979bc-20yrs.clm2.r.1992-01-01-00000.nc'
Trying to download file: 'lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc' to path '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc' using FTP protocol.
Traceback (most recent call last):
  File "./check_input_data", line 76, in <module>
    _main_func(__doc__)
  File "./check_input_data", line 71, in _main_func
    chksum=chksum) else 1)
  File "/glade/u/home/aherring/src/cam6_2_017/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 170, in check_all_input_data
    success = _downloadfromserver(self, input_data_root, data_list_dir)
  File "/glade/u/home/aherring/src/cam6_2_017/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 192, in _downloadfromserver
    user=user, passwd=passwd, ic_filepath=ic_filepath)
  File "/glade/u/home/aherring/src/cam6_2_017/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 366, in check_input_data
    isdirectory=isdirectory, ic_filepath=ic_filepath)
  File "/glade/u/home/aherring/src/cam6_2_017/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 146, in _download_if_in_repo
    success = server.getfile(rel_path, full_path)
  File "/glade/u/home/aherring/src/cam6_2_017/cime/scripts/Tools/../../scripts/lib/CIME/Servers/ftp.py", line 66, in getfile
    os.remove(full_path)
OSError: [Errno 13] Permission denied: '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc'

Note that the typical 'startup' run that does work as of 9AM this morning has no problem grabbing all the clm files, and so it seems to me that the error may have something to do with the disruption of pointing to the restart files in a branch run.

Mark - would you recommend I put in a ticket to NCAR/CISL via system dashboard regarding this issue?
 

mmoore

New Member
....at the risk of providing misdirection....

There was an e-mail a couple of days ago to CSEG regarding file system errors on /glade
resulting in zero length files. I'm not sure if this is related or not, but it is a possibility. An e-mail
to CISL could eliminate that as a possibility. Jim might have more information on this, also.

Mark
--0-
 

jedwards

CSEG and Liaisons
Staff member
That last error is really weird: it's trying to download a clm restart file into an existing file path. Adam, it looks like the nrevsn file, which does not include a path, is trying to use the path from the previous entry in clm.input_data_list. Try listing this file in user_nl_clm with the full path to your run directory. You may need to do the same thing with the rtm file.
 

aherring

Adam
Member
It's not just clm, but all *.input_data_list are looking for a nrevsn file. This is weird to me because I set GET_REFCASE=FALSE ... which I thought flags the model to look to the rpointer.* and restart files in the run directory, requiring the user to prestage the run directory. But instead it looks for nrevsn = RUN_REFCASE + ... + RUN_REFDATE + ... So based on your comment, I set RUN_REFCASE to the full path of the restarts, and the ./check_input_data runs fine:

Code:
1010 /glade/u/home/aherring> ./check_input_data --download
Setting resource.RLIMIT_STACK to -1 from (307200000, -1)
Loading input file list: 'Buildconf/clm.input_data_list'
Loading input file list: 'Buildconf/cpl.input_data_list'
Loading input file list: 'Buildconf/cice.input_data_list'
Loading input file list: 'Buildconf/docn.input_data_list'
Loading input file list: 'Buildconf/cism.input_data_list'
Loading input file list: 'Buildconf/mosart.input_data_list'
Loading input file list: 'Buildconf/cam.input_data_list'
GET_REFCASE is false, the user is expected to stage the refcase to the run directory.

And the model runs fine. I set RUN_REFCASE to a unique directory outside the run directory with a copy of the restart files ... and I can tell from the logs that it is grabbing these restart files, rather than the restart files in the run directory.
 
Hi Adam, not sure it will help you, but my problem turned out to be a combination of automatic download not working and a missing .nc file which didn't get flagged when I ran ./check_input_data. I discovered there was still a .nc file missing when I rebuilt after doing a ./case.build --clean-all. Having identified the missing file, I downloaded it manually from the ftp site, and then the checksum error went away, so I think it was just doing a less than obvious job of trying to tell me I was still missing an input file even though ./check_input_data claimed I wasn't.
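
For anyone wanting to double-check staged input files by hand, here is a minimal sketch that scans one Buildconf/*.input_data_list file and reports entries that are missing or zero length on disk (the zero-length case was raised earlier in the thread). It assumes each entry has the form key = path, as the "missing file nrevsn = ..." log line suggests. The helper find_bad_inputs is hypothetical and not part of CIME.

```python
import os


def find_bad_inputs(list_path):
    """Return (missing, empty) lists of file paths named in an input data list.

    Assumes each entry is written as 'key = path'; lines without '=' are skipped.
    """
    missing, empty = [], []
    with open(list_path) as handle:
        for line in handle:
            if "=" not in line:
                continue
            _, path = line.split("=", 1)
            # Drop surrounding whitespace and any quotes around the path.
            path = path.strip().strip("'\"")
            if not path:
                continue
            if not os.path.exists(path):
                missing.append(path)
            elif os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return missing, empty
```

Entries given without a directory (such as a bare nrevsn file name) are resolved against the current working directory, so they will normally show up as missing unless the script is run from the directory that holds them.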
 