
Debugging CESM with Linaro Forge

paulhall

Paul Hall
Member
Does any documentation exist for debugging CESM using the Linaro Forge DDT on Derecho? I have found instructions for using Linaro Forge on Derecho for a general case, but is there any documentation describing the process of configuring and submitting a CESM2 case to run with DDT?
 

paulhall

Paul Hall
Member
A little more detail on what I'm trying to do, and where I'm encountering issues. I am attempting to debug a CESM case (cesm2_3_alpha17a tag) on Derecho using Linaro Forge DDT. The steps I have taken are as follows:
1. Installed and configured the Linaro Forge Client (23.1.2) following the instructions here

2. I configure my CESM case on Derecho, including using xmlchange to set DEBUG=TRUE

3. I edit .env_mach_specific.sh to load ncarenv-basic/23.09 ncarenv/23.09 linaro-forge/23.1 (the 23.09 modules replace the 23.06 modules in the original file; the newer versions are required for linaro-forge/23.1)

4. Start the Linaro Forge client locally

Following the instructions for Derecho in the comments here, I do the following from the command line on Derecho to launch the job:
1. qsub -l select=9:ncpus=128:mem=235GB:mpiprocs=128 -A $PROJECTNUMBER -q main@desched1 -l walltime=01:00:00

2. cd $CASEROOT

3. source .env_mach_specific.sh

4. cd $RUNDIR

5. ddt --connect $EXEROOT/cesm.exe

After waiting a few minutes for the job to launch, I get a Reverse Connect Request on my local machine, which I accept. When I then try to run the case, however, I get a pop-up window saying that the CTI version cannot be detected and that I should ensure that a cray-cti module is loaded (see attached). I can't seem to find a cray-cti module on Derecho (module spider cray-cti comes up empty). Does anyone know why I'm getting this message and/or how to address it? Thanks in advance for any help with this!
 

Attachments

  • Screenshot 2024-05-13 at 2.39.29 PM.png

paulhall

Paul Hall
Member
Thanks @dbailey ! Incidentally, I just tried this again using the older version of Linaro Forge (23.0.4 for client and linaro-forge/23.0 on Derecho) and reverting to the ncarenv-basic/23.06 and ncarenv/23.06 modules, and it works without any errors. Could there be something different between the 23.0 and the 23.1 modules on Derecho, as far as how they interface with cray-mpich on Derecho?
 

jedwards

CSEG and Liaisons
Staff member
Hi Paul,

I see that you got it working. That's great. I think the problem is that linaro-forge/23.1 works with ncarenv/23.09 but not 23.06, and your step 3 (editing the .env_mach_specific.sh) isn't really doing what you think it is. When I use ddt I do not use that step; instead I module load linaro-forge/23.1 after the source step (your second step 3). The newer linaro-forge will work with newer cesm tags that use 23.09, but for 17a I would use the older linaro-forge version as you are doing.
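
The version pairing reported in this thread can be written down as a small shell helper. This is only a sketch: the function name forge_module_for is made up for illustration, and the mapping covers just the two combinations mentioned here (linaro-forge/23.0 with ncarenv/23.06, and linaro-forge/23.1 with ncarenv/23.09). Other combinations are not covered.

```shell
# Hypothetical helper: map the ncarenv version a CESM tag uses to the
# linaro-forge module that this thread reports as working with it.
forge_module_for() {
  case "$1" in
    23.06) echo "linaro-forge/23.0" ;;  # reported working with cesm2_3_alpha17a
    23.09) echo "linaro-forge/23.1" ;;  # for newer tags that use ncarenv/23.09
    *)     echo "no known pairing for ncarenv/$1" >&2; return 1 ;;
  esac
}

# Intended use inside the interactive job (shown as comments, not run here):
#   source .env_mach_specific.sh
#   module load "$(forge_module_for 23.06)"
forge_module_for 23.06
```

The module load is deliberately placed after sourcing .env_mach_specific.sh, so the case's own environment file stays unedited.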
 

paulhall

Paul Hall
Member
Great! Thanks @jedwards!
 

paulhall

Paul Hall
Member
@jedwards When I attempt to actually run the cesm.exe in Linaro Forge I get a popup window with an error saying that "A source file could not be found in its original location: /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-esmf-8.6.0b04-mkg7dasd7hipsqte2ibfflqzfe7cwgos/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C". I take it this has something to do with the spack install of the esmf module on Derecho that I'm using, and that you might have been involved (it looks like the path points to your scratch directory)? Any idea how I can resolve this? Thanks!
 

jedwards

CSEG and Liaisons
Staff member
If you need to look at the source of that file, you should check out that esmf version from git using:
git clone git@github.com:esmf-org/esmf.git -b v8.6.0b04
then point ddt to the source file in your new esmf source tree. But if you are debugging in the ESMF source, then maybe a discussion of the issue you are trying to find would be in order?
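
One way to locate the matching file in a fresh clone is to strip everything up to spack-src/ from the path in the DDT error message. A minimal sketch, assuming the clone lives in $HOME/esmf (a placeholder location; the variable names are also just for illustration):

```shell
# Path exactly as reported in the DDT pop-up.
reported_path="/glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-esmf-8.6.0b04-mkg7dasd7hipsqte2ibfflqzfe7cwgos/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C"

# Assumed location of your own ESMF clone; adjust as needed.
esmf_checkout="$HOME/esmf"

# Drop the spack staging prefix, keeping the path relative to the repo root.
rel_path="${reported_path#*/spack-src/}"

# Candidate file to point DDT at when it asks for the missing source.
local_path="$esmf_checkout/$rel_path"
echo "$local_path"
```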
 

paulhall

Paul Hall
Member
Hi @jedwards I'm not trying to debug ESMF. I have a custom CICE grid that is getting caught up in the ice_mesh_check subroutine, and I was hoping to use DDT to get a better sense of what the problem is with the grid's configuration by looking at the values of the relevant variables in the code where it throws the error. The error message about the missing source file pops up before I can get to the call to ice_mesh_check.

I wonder if there is something wrong with how I am setting up the job in Linaro Forge. For example, the Linaro Forge "Run" pop-up window shows just a single MPI process when it comes up, even though the job requests 9x128=1152 cores.
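
For reference, the process count implied by the qsub request (select=9, mpiprocs=128) works out as below. This is just the arithmetic, with variable names chosen for illustration; it does not say how DDT itself determines the number shown in the Run window.

```shell
# Values taken from the qsub resource request in this thread.
nodes=9
mpiprocs_per_node=128

# Total MPI ranks the job allocation provides.
total_ranks=$((nodes * mpiprocs_per_node))
echo "expected MPI processes: $total_ranks"
```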
 