
DATA_ASSIMILATION_SCRIPT in CESM/DART fails with "Bad fd number" when using tcsh

jurymark

mark
New Member
Dear all,
I am trying to run DART data assimilation within CESM on Ubuntu 22.04, but I encounter a shell error when specifying a tcsh assimilation script.
When I run ./case.submit , I get:

Running /root/cases/fwscdart/assimilate.csh
/bin/sh: 1: Syntax error: Bad fd number
ERROR: /root/cases/fwscdart/assimilate.csh /root/cases/fwscdart 0 >& /root/output/fwscdart/run/da.log.251014-222608 FAILED, cat /root/output/fwscdart/run/da.log.251014-222608


But this file does not exist.

I tried setting
./xmlchange DATA_ASSIMILATION_SCRIPT="/bin/tcsh /root/cases/fwscdart/assimilate.csh"
and I also tried creating a bash wrapper script containing
#!/bin/bash
/bin/tcsh /root/cases/fwscdart/assimilate.csh "$@"

But neither of them works.
This seems to be related to my operating system, where the default shell is bash:
echo $SHELL
/bin/bash

So I tried changing SHELL to /bin/tcsh, but I still get the same error; it seems Bash is still being used.

Which part of the cime code should I modify? Thank you in advance for guidance!

Mark
 

raeder

Member
Hi Mark,
It would be helpful to have more context about what you're trying to do, and your setup.
What compset and observations are you trying to use?
Which versions of CESM and DART?
Have you successfully run your CESM without data assimilation on your machine?
How big is your machine (CPUs, memory, etc)?
Which assimilate.csh are you using? (attaching it here would help. You might need to add ".txt" to the name)

Then a few questions about the failure:
"./assimilate.csh" implies that you're running it outside of any CESM job. Is that the whole command that you used?
In which directory did you execute ./assimilate.csh?
An internet search tells me that "Bad fd number" results from using redirection symbols that the shell does not understand.
sh wants to see '>' (for example '> file 2>&1'), not the csh-style '>& file'.
Does the directory /root/output/fwscdart/run exist?
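
To make the redirection point concrete: a POSIX sh reads '>&' as "duplicate a file descriptor", so whatever follows must be a number or '-'. A log file path in that position is what produces "Bad fd number". Below is a minimal sketch of rewriting the csh-style form into the POSIX form. The helper name to_posix_redirect is hypothetical and is not part of CIME or DART; it only handles a single redirection at the end of the command string.

```python
import re


def to_posix_redirect(cmd):
    """Rewrite a trailing csh-style '>& FILE' into POSIX '> FILE 2>&1'.

    Hypothetical helper for illustration only; not part of CIME or DART.
    """
    match = re.search(r">&\s*(\S+)\s*$", cmd)
    if match is None:
        return cmd  # no '>&' at the end of the command
    target = match.group(1)
    # '>&2' and '>&-' are valid POSIX fd operations, so leave them alone.
    if target.isdigit() or target == "-":
        return cmd
    return cmd[:match.start()] + "> " + target + " 2>&1"


if __name__ == "__main__":
    print(to_posix_redirect("assimilate.csh /root/cases/fwscdart 0 >& da.log"))
```

The rewritten command sends both stdout and stderr to the log file, which is what '>&' does in csh and tcsh.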
 

jurymark

mark
New Member
Hi raeder,

CESM version is 2.1.5, DART version is 11.10.4.
I can run ./case.submit and, after that, run "/bin/tcsh ./assimilate.csh $CASEROOT" in $CASEROOT successfully.

I added a check to work around this problem.
In $CESM/cime/scripts/lib/CIME/utils.py, in the function run_cmd, next to the timeout check, I added a test for whether the command is a csh script and, if so, specify the shell executable:
if cmd.strip().endswith(".csh") or cmd.strip().endswith(".tcsh"):
    shell_exe = "/bin/tcsh"
I made this conditional because I don't know whether specifying the shell for every command would have other side effects.
The modified file is attached as utils.txt.
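
One caveat, offered as a sketch rather than a description of what CIME actually passes in: if the command string carries arguments or a redirection, as in the error message in the first post, a test on the end of the whole string would not match. A variant that inspects only the first word could look like this. The name pick_shell is hypothetical, and /bin/tcsh is assumed to be installed.

```python
import shlex


def pick_shell(cmd, default=None):
    """Return '/bin/tcsh' when the first word of cmd is a .csh/.tcsh script.

    Hypothetical helper for illustration; '/bin/tcsh' is an assumed path.
    """
    words = shlex.split(cmd)
    # str.endswith accepts a tuple of suffixes.
    if words and words[0].endswith((".csh", ".tcsh")):
        return "/bin/tcsh"
    return default


if __name__ == "__main__":
    cmd = "/root/cases/fwscdart/assimilate.csh /root/cases/fwscdart 0 >& da.log"
    print(pick_shell(cmd))
```

Returning a default of None leaves the choice of shell to the caller for every command that is not a csh script.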

To answer your other questions: the compset is FW2010climo and the observations are ERA5 data. Yes, I have successfully run CESM without data assimilation. The machine has an AMD EPYC 9654 CPU. About assimilate.csh, I only changed the compset and stopoptions.
I think the main problem still lies with the machine; it is not a system with a job scheduler like a computing cluster.

Thank you for your patience !!

Mark
 

Attachments

  • utils.txt
    64.8 KB · Views: 2

raeder

Member
I can run ./case.submit & after that run "/bin/tcsh ./assimilate.csh $CASEROOT" in $CASEROOT successfully.
It looks like you can run CESM and assimilate.csh separately.
Running the model and assimilation in CASEROOT is unconventional. That may explain why you couldn't find the da.log file, which you mentioned in your first message.

Here I added some checks to solve this problem.
In $CESM/cime/scripts/lib/CIME/utils.py, in function run_cmd: I add check in timeout check whether it is csh and try to specify the executable here.
if cmd.strip().endswith(".csh") or cmd.strip().endswith(".tcsh"):
    shell_exe = "/bin/tcsh"
To help future readers: shell_exe is passed as a new argument to subprocess.Popen. It specifies which shell should be used to run the cmd (assimilate.csh), which is also an argument to Popen.
Your comments make me think that you are able to run WACCM and DART in the same job now, so your fixes may be all that you need.
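
For reference, here is a minimal sketch of how a shell can be selected through subprocess.Popen. It is not the CIME run_cmd code. The function name run_with_shell is hypothetical, and it assumes shell_exe maps onto Popen's executable parameter, which with shell=True replaces the default /bin/sh.

```python
import subprocess


def run_with_shell(cmd, shell_exe=None):
    """Run cmd through a shell and return (returncode, combined output).

    Hypothetical sketch, not CIME's run_cmd. With shell=True and
    executable=None, Python uses /bin/sh on POSIX systems.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        executable=shell_exe,       # e.g. "/bin/tcsh" for csh scripts
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # return text rather than bytes
    )
    output, _ = proc.communicate()
    return proc.returncode, output.strip()


if __name__ == "__main__":
    print(run_with_shell("echo hello", shell_exe="/bin/sh"))
```

Passing shell_exe="/bin/tcsh" would make tcsh parse the whole command line, including a '>&' redirection, provided tcsh is installed at that path.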

Since the compset is WACCM, I'm guessing that you're using the assimilate.csh from $DART/models/cam-fv/shell_scripts/cesm2_1.
If you're not, please attach the script that you're using.
I'm surprised that this change is necessary. DART has assimilate.XXX scripts written in at least 2 common shell languages
and we have not needed to make this change when running on many other machines.
It makes me wonder whether there's something broken in the importing of CESM to your Ubuntu computer,
or in the creation and setup of the CASE.

About assimilate.csh, I only changed the compset and stopoptions.
compset and stopoptions are set in CESM commands and files. They are not controlled by, or inputs to, assimilate.XXX.

I'm also curious about how you are using the ERA5 observations. Did you convert them into DART's observation sequence files?

Kevin
 

jurymark

mark
New Member
Since the compset is WACCM, I'm guessing that you're using the assimilate.csh from $DART/models/cam-fv/shell_scripts/cesm2_1.
Yes, and I modified 'setup_advanced' in the folder with the same name.
Did you convert them into DART's observation sequence files?
Yes, but when I try to run with a 15-minute time step and hourly assimilation, I commonly get a segmentation fault when restarting from an .r. file. I still haven't been able to resolve this issue. I'm still testing locally. I'm attaching the log file without much hope; maybe someone knows why.
 

Attachments

  • cesm.log.txt
    164.4 KB · Views: 0