
WACCM-CARMA - buildlib.mct Error?

Hello,

I am trying to create, configure, and build a new WACCM-CARMA test case (5-day default) using the latest developer code, CESM1.1 Beta 16, on bluefire. However, I am getting a buildlib.mct error when building.

All I have changed is CAM_CONFIG_OPTS in env_build.xml; everything else is at its default. I created and configured the case successfully. However, when trying to build the case, I get the following error:

buildlib.mct failed

In the mct.bldlog file, it states the following:

“New build of MCT
Running configure...
For OS = AIX MACH = bluefire
gmake: command not found
cp: Makefile.conf: A file or directory in the path name does not exist.”


Has anyone run into this issue? Am I simply missing setting a path somewhere?

Any help would be much appreciated, thank you.

Patrick
 

santos

Member
I am trying to reproduce this issue on bluefire, but I cannot so far.

I attempted this by building an F_WACCM_2000 case with one modification: CAM_CONFIG_OPTS in env_build.xml was set to "-phys cam4 -chem waccm_mozart_sulfur -carma sulfate". This built with no errors.

If you have only tried this once, I would try building again just to see if there was a problem with bluefire's filesystem and/or one of the login nodes. It may have been fixed.

If the problem is still happening, it would be easiest if you could let me know where your case and run directories are, so I can look at those directly. Otherwise, I would like to know the following:

1) Did you check out the code by hand, or use the version in "collections"? If you checked it out yourself, please give the output of these two commands:

which svn
svn --version

There is a known problem with mct if it is checked out with a subversion client older than 1.6. However, this should not be an issue on bluefire, which has version 1.6.13.

2) Which WACCM compset and grid are you using?

3) What did you set CAM_CONFIG_OPTS to?

EDIT: Please also give the value of the PATH variable for you (say, with "env"). It looks like gmake is not in your path. If /usr/local/bin is not in your path, please add it.
 
Thank you for the speedy response!!

My answers follow each of your quoted questions:

If you have only tried this once, I would try building again just to see if there was a problem with bluefire's filesystem and/or one of the login nodes. It may have been fixed.

I have tried to build this case numerous times to no avail. I have also tried doing the same procedure using the Beta_15 developer code, and I get the exact same error.

If the problem is still happening, it would be easiest if you could let me know where your case and run directories are, so I can look at those directly.

My case and run directories are located at: /glade/home/pcampbe
I have configured the following two cases for the two beta versions I tried:
f.e10.FSDW.f19_f19.001 (Beta_16)
f.e10.FSDW.f19_f19.002 (Beta_15)


1) Did you check out the code by hand, or use the version in "collections"?


I did not check out the code by hand; I used the copy in "collections". Here are the directories I used: /glade/proj3/cseg/collections/cesm1_1_beta16 (and likewise for beta15)

2) Which WACCM compset and grid are you using?

I am using the specified dynamics compset, FSDW. I am using the f19_f19 grid.

3) What did you set CAM_CONFIG_OPTS to?

I changed CAM_CONFIG_OPTS using the following command:
./xmlchange -file env_build.xml -id CAM_CONFIG_OPTS -val "-phys cam4 -chem waccm_mozart_sulfur -nlev 88 -offline_dyn -carma sulfate"


EDIT: Please also give the value of the PATH variable for you (say, with "env"). It looks like gmake is not in your path. If /usr/local/bin is not in your path, please add it.
I am hoping this is what you were looking for me to do:

$ pwd
/glade/home/pcampbe
$ $PATH
ksh: /usr/local/lsf/7.0/aix5-64/etc:/usr/local/lsf/7.0/aix5-64/bin:/usr/local/krb5/current/bin:/usr/local/openssh/current/bin/:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/bin: not found.

I noticed that /usr/local/bin wasn't included, so I added it:

$ export PATH=$PATH:/usr/local/bin

And rechecked the path:

$ $PATH
ksh: /usr/local/lsf/7.0/aix5-64/etc:/usr/local/lsf/7.0/aix5-64/bin:/usr/local/krb5/current/bin:/usr/local/openssh/current/bin/:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/bin:/usr/local/bin: not found.



I believe the case is now building after I added /usr/local/bin to my PATH!

I assume I will need to add this path each time I build?

Thank you for your help!

Patrick
 

santos

Member
I'm glad this worked! This should definitely be added to your path every time you log in. I believe that for Korn shell, you can add it automatically upon login by adding the "export" line to the file ~/.profile (if this login script does not exist, create it):

export PATH=$PATH:/usr/local/bin
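If you'd rather not risk adding a duplicate line each time you edit the file, a guarded append keeps ~/.profile clean. This is just a sketch, assuming a login shell that reads ~/.profile (as ksh does):

```shell
# Append the export line to ~/.profile only if it is not already there.
# (Sketch; assumes the login shell sources ~/.profile, e.g. ksh.)
PROFILE="$HOME/.profile"
LINE='export PATH=$PATH:/usr/local/bin'
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"
```

Running it a second time is a no-op, so it is safe to keep in setup notes.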

If you want a better shell, and don't want to wait for Yellowstone to come online, look at https://www2.cisl.ucar.edu/docs/bluefire/getting-started under the "Shell & Environment" tab. I use bash, whereas most people I know use tcsh. I would recommend either of those over Korn shell.

If you change your shell, you will also need to move this line into the login script for the new shell. For bash, the export command still works, but it should probably go in ~/.bash_profile. For tcsh, the equivalent is "setenv PATH ${PATH}:/usr/local/bin" and it belongs in ~/.login. (You can also use the usual rc files, like ~/.bashrc and ~/.tcshrc, but on bluefire these may not always be sourced.)

I am rather baffled as to why /usr/local/bin is not in the default PATH for new users, since bluefire is practically unusable without it. This should not be an issue on Yellowstone, which hopefully will handle PATH in a saner way. Yellowstone will also use tcsh as the default shell.

Two more tips:

1) Instead of typing "$PATH" to display the path, type "printenv PATH" or "echo $PATH". Just typing "$PATH" makes the shell expand the variable and try to run the result as a command rather than displaying it, which is why you got a ksh error message. These commands should work in all common shells.

2) When it's your first time running the model on a new machine, you may want to try running the simplest case that you can (say, an F case in vanilla CAM). This makes it easier to tell whether your issue is related to some specific configuration (such as WACCM), or whether you have the same issue with all CESM runs.
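For tip 1, splitting PATH onto one entry per line makes a missing directory much easier to spot. A generic shell sketch, not specific to bluefire:

```shell
# Print each PATH entry on its own line; a missing /usr/local/bin stands out.
echo "$PATH" | tr ':' '\n'

# Confirm that gmake now resolves to something:
command -v gmake || echo "gmake still not on PATH"
```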
 
Thank you for all of your help! I am new to Bluefire (and soon to be Yellowstone), so your comments and tips are very much appreciated!

EDIT:
Quick question, I have successfully built my test case, but when I go to submit the job I get the following error:

> *.submit
check_case OK
Bad user group name. Job not submitted.


Patrick
 

santos

Member
The script that you want to submit will be the script [case name].run, and you want to submit it by feeding it into the batch submission system for the local machine. So for your first case on bluefire, you'd use the command:

bsub < f.e10.FSDW.f19_f19.001.run

because bluefire has an LSF system, and bsub is the LSF command. Similarly, for a test case I created called "CARMATEST", it would be:

bsub < CARMATEST.run

Most other machines use a PBS system, so the local command is "qsub":

qsub < f.e10.FSDW.f19_f19.001.run
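If you move between machines, the choice of submission command can be wrapped in a small helper. This is only a sketch; submit_case is a hypothetical name, not part of CESM, and it assumes a machine has either LSF or PBS but not both:

```shell
# Hypothetical helper: submit a case's .run script via whichever batch
# system the machine provides (LSF's bsub or PBS's qsub).
submit_case() {
    if command -v bsub >/dev/null 2>&1; then
        bsub < "$1"
    elif command -v qsub >/dev/null 2>&1; then
        qsub < "$1"
    else
        echo "no bsub or qsub found on this machine" >&2
        return 1
    fi
}

# Usage: submit_case f.e10.FSDW.f19_f19.001.run
```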

I have a tutorial on running the previous version of the model, although several things have been moved, so it is out of date for the CESM1.1 betas:

https://www.cesm.ucar.edu/models/cesm1.0/cesm/cesm1_tutorial.pdf

There will eventually be an updated version for CESM1.1.
 
Clearly my lack of experience is shining through!

Would this be why my job exited after I submitted it the way I described?

Such as:

From: LSF System [lsfadmin@bluefire.ucar.edu]
Sent: Friday, September 07, 2012 1:15 PM
To: pcampbe@bluefire.ucar.edu
Subject: Job 156561: Exited

Job was submitted from host by user in cluster .
Job was executed on host(s) , in queue , as user in cluster .



was used as the home directory.
was used as the working directory.
Started at Fri Sep 7 13:13:34 2012
Results reported at Fri Sep 7 13:15:20 2012

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#! /bin/tcsh -f
#BSUB -n 64
#BSUB -R "span[ptile=16]"
#BSUB -q regular
#BSUB -N
#BSUB -x
#BSUB -a poe
#BSUB -o poe.stdout.%J
#BSUB -e poe.stderr.%J
#BSUB -J f.e10.FSDW.f19_f19.001
#BSUB -W 4:00
#BSUB -P 35711032

setenv OMP_NUM_THREADS 4
set launchtool=/usr/local/bin/hybrid_launch
# ----------------------------------------
# PE LAYOUT:
# total number of tasks = 64
# maximum threads per task = 4
# cpl ntasks=64 nthreads=4 rootpe=0 ninst=1
# cam ntasks=64 nthreads=4 rootpe=0 ninst=1
# clm ntasks=64 nthreads=4 rootpe=0 ninst=1
# cice ntasks=64 nthreads=4 rootpe=0 ninst=1
# docn ntasks=64 nthreads=4 rootpe=0 ninst=1
# sglc ntasks=64 nthreads=4 rootpe=0 ninst=1
#
# total number of hw pes = 256
# cpl hw pe range ~ from 0 to 255
# cam hw pe range ~ from 0 to 255
# clm hw pe range ~ from 0 to 255
# cice hw pe range ~ from 0 to 255
# docn hw pe range ~ from 0 to 255
# sglc hw pe range ~ from 0 to 255
# ----------------------------------------
#----------------------------------------------------
# Determine necessary environment variables
#----------------------------------------------------

cd /glade/home/pcampbe/f.e10.FSDW.f19_f19.001

./Tools/ccsm_check_lockedfiles || exit -1
source ./Tools/ccsm_getenv || exit -2

if ($BUILD_COMPLETE != "TRUE") then
echo "BUILD_COMPLETE is not TRUE"
echo "Please rebuild the model interactively via"
echo " ./${CASE}.build"
exit -2
endif


(... more ...)
------------------------------------------------------------

Exited with exit code 134.

Resource usage summary:

CPU time : 4394.90 sec.
Max Memory : 14196 MB
Max Swap : 15072 MB

Max Processes : 134
Max Threads : 3337

Read file for stdout output of this job.
Read file for stderr output of this job.


You can see that my job didn't complete successfully. I will use the method you explained.

Thanks again!

Patrick
 
Santos,
I submitted the job as you explained, and it entered the queue successfully.

However, as shown above, the run exits prematurely. The ccsm.log.120907-131340 file shows the following:

pio_support::pio_die:: myrank= -1 : ERROR: nf_mod.F90: 666 : NetCDF: Variable not found
1:
1: Traceback:
1: Offset 0x00000bec in procedure __pio_support_NMOD_piodie, near line 101 in file pio_support.F90.in
1: Offset 0x00000568 in procedure __pio_utils_NMOD_check_netcdf, near line 68 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/utils/pio/pio_utils.F90
1: Offset 0x00000300 in procedure __nf_mod_NMOD_inq_varid_vid, near line 666 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/utils/pio/nf_mod.F90
1: Offset 0x000005d4 in procedure __mo_flbc_NMOD_flbc_get, near line 601 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/chemistry/utils/mo_flbc.F90
1: Offset 0x000014e4 in procedure __mo_flbc_NMOD_flbc_inti, near line 388 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/chemistry/utils/mo_flbc.F90
1: Offset 0x000000f4 in procedure __chem_surfvals_NMOD_chem_surfvals_init, near line 257 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/physics/cam/chem_surfvals.F90
1: Offset 0x000000b8 in procedure __inital_NMOD_cam_initial, near line 103 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/dynamics/fv/inital.F90
1: Offset 0x00000378 in procedure __cam_comp_NMOD_cam_init, near line 155 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/control/cam_comp.F90
1: Offset 0x000008a4 in procedure __atm_comp_mct_NMOD_atm_init_mct, near line 276 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/atm/cam/src/cpl_mct/atm_comp_mct.F90
1: Offset 0x00004698 in procedure __ccsm_comp_mod_NMOD_ccsm_init, near line 920 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/drv/driver/ccsm_comp_mod.F90
1: Offset 0x00000034 in procedure ccsm_driver, near line 90 in file /glade/proj3/cseg/collections/cesm1_1_beta16/models/drv/driver/ccsm_driver.F90
1: --- End of call chain ---
1:Communication statistics of task 1 is associated with task key: 2056783011_1
1:
1:Running: /ptmp/pcampbe/f.e10.FSDW.f19_f19.001/bin/ccsm.exe

It looks like a NetCDF variable cannot be found, which is confusing since I am running the CESM1_1 Beta 16 model mostly at its default settings; I have only changed CAM_CONFIG_OPTS, as discussed above.

I am sorry for all the difficulties, any help you can provide is much appreciated.

Patrick
 

santos

Member
I will run a short test to try reproducing this, but I don't recognize this bug. Can you post the output of "env"? I'd be interested to know if there's anything in your environment that could affect which version of NetCDF you are using.
 

santos

Member
I have reproduced this error, so it seems to be a genuine bug in the model. I'm trying to track down the cause now. Our regression tests did not catch any problem in FSDW f19_f19, so this is a bit of a surprise. I'll post again when I know more.
 

santos

Member
It looks like this is not a bug after all. SD WACCM uses a different lower boundary conditions file (flbc_file in the namelist), which doesn't have all the fields that waccm_mozart_sulfur needs.

In particular, I notice that there is no OCS_LBC field in the SD WACCM input file, which is probably what caused this error. I would ask Michael Mills about this when he gets back next week; I'm not familiar with all the relevant distinctions between these data sets.
 
I have looked at flbc_file in the CaseDocs/atm_in namelist file. I see:

flbc_file = '/glade/proj3/cseg/inputdata/atm/waccm/lb/LBC_1765-2500_1.9x2.5_CMIP5_RCP45_za_c091214.nc'

So, just to be clear, you're suggesting that the LBC_1765-2500_1.9x2.5_CMIP5_RCP45_za_c091214.nc SD WACCM input file is missing variables (e.g., OCS_LBC) that the waccm_mozart_sulfur configuration requires? I also notice where the waccm_mozart configuration is set: cam_chempkg = 'waccm_mozart'. Is there a place I can check what's required in the waccm_mozart config?


I will also wait to discuss this issue with Mike next week.

Thanks again!!

Patrick
 

santos

Member
The list of species fixed at the lower boundary is generated in the namelist as "flbc_list":

flbc_list = 'CCL4', 'CF2CLBR', 'CF3BR', 'CFC11', 'CFC113', 'CFC12', 'CH3BR', 'CH3CCL3', 'CH3CL', 'CH4', 'CO2', 'H2', 'HCFC22',
'N2O', 'OCS'

Note OCS in this list, which means the data has to be present in the flbc_file.

This variable is generated by CAM's build-namelist script. The simplest way to find it is to look in atm_in, as you did for flbc_file. I believe all of the waccm_mozart chemistry packages share the same list, except that waccm_mozart_sulfur adds OCS.

(Until you hit this issue, I'd forgotten that not all the WACCM cases have had their flbc_files updated with OCS. The flbc_list is set according to the chemistry package, but flbc_file is set according to the compset you are using. In fact, I think only the basic year-2000 cases work with waccm_mozart_sulfur "out of the box".)
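One way to cross-check flbc_list against flbc_file is to grep the output of "ncdump -h" for each species' _LBC variable. The snippet below simulates the header text so it is self-contained; on bluefire you would pipe in real ncdump output instead. The has_var helper and the header fragment are made up for illustration:

```shell
# Hypothetical check: does an ncdump -h header declare a given variable?
# On bluefire you would run: ncdump -h "$flbc_file" | has_var OCS_LBC
has_var() {
    # matches declaration lines like "    double OCS_LBC(time, lat) ;"
    grep -qE "^[[:space:]]*(float|double)[[:space:]]+$1\("
}

# Simulated header fragment (made up for illustration):
header='        double CH4_LBC(time, lat) ;
        double CO2_LBC(time, lat) ;'

printf '%s\n' "$header" | has_var CH4_LBC && echo "CH4_LBC present"
printf '%s\n' "$header" | has_var OCS_LBC || echo "OCS_LBC missing"
```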
 