
regional single-point test run

wvsi3w

wvs3iw
Member
Hello,
I followed the documentation for running a regional single-point configuration for Alaska with CLM5 in CESM 2.1.3. I created the case with --res f19_g16 --compset I2000Clm50BgcCruGs, and it runs with my own machine configuration (which has been tested and has worked with other compsets).

I used the information at this link to create the case, and after it downloaded all of the required input data, I submitted the case successfully.

I should mention that I had to manually download some of the domain and surface dataset files using the following command:
wget https://svn-ccsm-inputdata.cgd.ucar...clm/domain.lnd.13x12pt_f19_alaskaUSA_gx1v6.nc -P /home/XXXXX/projects/YYYYY/XXXXX/inputdata/share/domains/ --no-check-certificate

But after submission, the run failed within 1 minute with the following message, which appears in the log file multiple times:
MPI_ABORT was invoked on rank 55 in communicator MPI_COMM_WORLD with errorcode 1001. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

I searched the forum and couldn't find the reason for this error. I would appreciate your opinion.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
That's a pretty general error. Have you looked carefully at all of the log files to see if there is more information?
 

wvsi3w

wvs3iw
Member
Thanks for your response. Oh, I understand. I also see the following error in the land log file:
ERROR: ERROR in decompInitMod.F90 at line 169

I have attached all of the log files. I don't see any other error in them.
 

Attachments

  • atm log.txt (14.3 KB)
  • cesm log.txt (128.2 KB)
  • cpl log.txt (42.2 KB)
  • lnd log.txt (8.2 KB)

oleson

Keith Oleson
CSEG and Liaisons
Staff member
It looks like you have more processors assigned to the job (128) than the number of grid cells that you have (85). You can reduce the number of tasks/processors in env_mach_pes.xml.
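
As a minimal sketch of the constraint described above: the land decomposition needs at least one land grid cell per MPI task, so the task count should not exceed the number of land grid cells. The numbers 128 and 85 come from the log in this thread; the function name `max_usable_tasks` is hypothetical and is not part of CLM or CIME.

```python
def max_usable_tasks(requested_tasks: int, land_gridcells: int) -> int:
    """Return a task count that does not exceed the number of land grid cells."""
    if requested_tasks < 1 or land_gridcells < 1:
        raise ValueError("both counts must be positive")
    return min(requested_tasks, land_gridcells)


requested = 128  # tasks assigned to the job, as reported in the log
gridcells = 85   # land grid cells in the regional domain, as reported in the log

usable = max_usable_tasks(requested, gridcells)
print(f"requested={requested}, land grid cells={gridcells}, usable at most={usable}")
```

Any task count at or below the returned value satisfies this particular check; it says nothing about whether the other input files are consistent with the domain.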
 

wvsi3w

wvs3iw
Member
Thanks for your helpful answer.
I tested the model with a reduced number of tasks/processors in my env_mach_pes file and resubmitted the case. For the second run I used 40 instead of 64, and the land log file showed the following error:

Attempting to read GLACIER_REGION...
(GETFIL): attempting to find local file
surfdata_1.9x2.5_hist_16pfts_Irrig_CMIP6_simyr2000_c190304.nc
(GETFIL): using
/home/meisam/projects/def-hbeltram/meisam/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_1.9x2.5_hist_16pfts_Irrig_CMIP6_simyr2000_c190304.nc
ncd_getiodesc ERROR in vsize 13824 156
88

ERROR:
ERROR in /home/meisam/my_cesm_sandbox/components/clm/src/main/ncdio_pio.F90.in
at line 2388

For the third run I used 4 for both the tasks per node and the MPI tasks per node (instead of the 40 used in the previous run), and the land log file shows the following error:

Attempting to read GLACIER_REGION...
(GETFIL): attempting to find local file
surfdata_1.9x2.5_hist_16pfts_Irrig_CMIP6_simyr2000_c190304.nc
(GETFIL): using
/home/meisam/projects/def-hbeltram/meisam/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_1.9x2.5_hist_16pfts_Irrig_CMIP6_simyr2000_c190304.nc
ncd_getiodesc ERROR in vsize 13824 156
88

ERROR:
ERROR in /home/meisam/my_cesm_sandbox/components/clm/src/main/ncdio_pio.F90.in
at line 2388

I don't know why it shows the same error for two different numbers of tasks per node (and MPI tasks per node). I have checked that the input file named in the error message exists at that path, but I don't know whether the file itself has a problem or the model cannot process it.

The first error said that the number of processes (128) exceeds the number of land grid cells (85). How should I set the env_mach_pes variables to fit that odd number? Whatever value I use for MAX_TASKS_PER_NODE and MAX_MPITASKS_PER_NODE, the total number of processes is double that value: in the first run both variables were 64 and the number of processes was 128. For instance, here I used 4:

Code:
<group id="mach_pes_last">
    <entry id="COST_PES" value="8">
      <type>integer</type>
      <desc>pes or cores used relative to MAX_MPITASKS_PER_NODE for accounting (0 means TOTALPES is valid)</desc>
    </entry>
    <entry id="TOTALPES" value="8">
      <type>integer</type>
      <desc>total number of physical cores used (setup automatically - DO NOT EDIT)</desc>
    </entry>
    <entry id="MAX_TASKS_PER_NODE" value="4">
      <type>integer</type>
      <desc>maximum number of tasks/ threads allowed per node </desc>
    </entry>
    <entry id="MAX_MPITASKS_PER_NODE" value="4">
      <type>integer</type>
      <desc>pes or cores per node for mpitasks </desc>
    </entry>
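
The doubling described above is consistent with the job being laid out across two nodes, since 128 / 64 = 2 and 8 / 4 = 2. One possible cause, offered here as an assumption and not confirmed anywhere in this thread, is a negative NTASKS value in env_mach_pes.xml, which CIME interprets as a number of nodes instead of a number of tasks. The sketch below only reads the values posted above and computes the implied node count; the helper names `read_entries` and `implied_nodes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values copied from the posted fragment. The entries are abbreviated to
# self-closing tags and a closing </group> is added so the text parses;
# the fragment as posted was cut off before the end of the group.
FRAGMENT = """
<group id="mach_pes_last">
  <entry id="TOTALPES" value="8"/>
  <entry id="MAX_TASKS_PER_NODE" value="4"/>
  <entry id="MAX_MPITASKS_PER_NODE" value="4"/>
</group>
"""


def read_entries(xml_text: str) -> dict:
    """Map each entry id to its integer value."""
    root = ET.fromstring(xml_text)
    return {entry.get("id"): int(entry.get("value")) for entry in root.iter("entry")}


def implied_nodes(total_pes: int, mpitasks_per_node: int) -> int:
    """Number of nodes needed to hold total_pes tasks (ceiling division)."""
    return -(-total_pes // mpitasks_per_node)


entries = read_entries(FRAGMENT)
print(implied_nodes(entries["TOTALPES"], entries["MAX_MPITASKS_PER_NODE"]))  # 2
print(implied_nodes(128, 64))  # 2, the first run described above
```

If the NTASKS entries in the same file do turn out to be negative, setting them to explicit positive task counts would be the thing to try; that step is an untested suggestion.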
 

wvsi3w

wvs3iw
Member
I did all of the steps again from the beginning and it shows the same error using 4 processors. The log files are attached.
 

Attachments

  • CESM log new.txt (23.1 KB)
  • LND LOG NEW.txt (10 KB)

slevis

Moderator
This still seems like a task geometry issue to me.

What if you try creating the case with
--mpilib mpi-serial
and running on 1 cpu?

If that doesn't work, then @erik may have a suggestion and/or you might ask a system admin for the machine that you are using.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
It looks like you are using a 2deg surface dataset (144x96 = 13824) instead of the Alaska surface dataset (13x12 = 156)? Your domain file is 13x12.
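
The first two numbers in the log line `ncd_getiodesc ERROR in vsize 13824 156` match the two grid sizes mentioned above. The following sketch reproduces that arithmetic and flags the mismatch; it illustrates the comparison only and is not the check that CLM itself performs. The function names are hypothetical.

```python
def gridcell_count(nlon: int, nlat: int) -> int:
    """Total number of grid cells on an nlon x nlat grid."""
    return nlon * nlat


def same_grid(dims_a: tuple, dims_b: tuple) -> bool:
    """True when two files describe the same horizontal grid dimensions."""
    return tuple(dims_a) == tuple(dims_b)


surface_dims = (144, 96)  # 1.9x2.5 (2-degree) global surface dataset
domain_dims = (13, 12)    # 13x12 Alaska regional domain file

surface_cells = gridcell_count(*surface_dims)
domain_cells = gridcell_count(*domain_dims)
print(surface_cells, domain_cells)  # 13824 156

if not same_grid(surface_dims, domain_dims):
    print("surface dataset and domain file are on different grids")
```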
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
In looking at this in more detail, it doesn't look like this example is supported by recent versions of the model (e.g., release-cesm2.1.3), sorry. The surface dataset for Alaska doesn't come out of the box anymore and the dataset that I found doesn't have GLACIER_REGION on it which is now required.
 

wvsi3w

wvs3iw
Member
Thanks for your clarification. Yes, I am using the 2.1.3 version of the model. I used the steps in the link, and I realized there were some other input data that it couldn't download; I downloaded the following manually:

surfdata_1.9x2.5_hist_16pfts_Irrig_CMIP6_simyr2000_c190304.nc
surfdata_13x12pt_f19_alaskaUSA_simyr2000.nc
domain.lnd.13x12pt_f19_alaskaUSA_gx1v6.nc

So is there a supported list of regional single-point tests for this version?

One other question (less related to the topic of this thread): I couldn't find step-by-step videos of setting up, configuring, and running the model. I believe if we had a series of videos explaining these steps for most of the discussed topics in the forum, it could save a lot of time and effort for everyone using this model. I have read and watched all of the tutorials available but most of them include slides and none of them show the real environment and the steps in real time. This video and other videos in this channel could be examples of what I (and probably many other students) have in mind.

For instance, for CESM there could be one video for each of the following topics:
- The porting process, which is the major hurdle for people with no prior knowledge of the model (like me); a single video showing how to port to another cluster with different configurations could address it.
- Quick-start run steps and how to solve some typical errors in them.
- Changing the namelists, and the other XML files, for a desired simulation.
- Running single-point and regional studies, using user-defined input data, and using different user-defined compset and resolution combinations.
- Coupling the model.
- Etc.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
In cime/scripts, you can run ./query_config --grids to list the available grids. For release-cesm2.1.3, I see "5x5_amazon", which is a regional grid over the Amazon. There are also several supported single-point grids, e.g., 1x1_brazil.
It's a little tricky to find a compset and resolution combination that works; you may need several tries. This worked for me:

./create_newcase --compset 2000_DATM%GSWP3v1_CLM50%SP_SICE_SOCN_MOSART_SGLC_SWAV --res 5x5_amazon --case I2000Clm50Sp_5x5_amazon --run-unsupported
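
The long compset name in the command above packs one field per model component, with an optional `%` modifier on each. The sketch below splits that name into its fields, assuming the usual CESM ordering of time period, atmosphere, land, sea ice, ocean, river, land ice, and wave. It handles only this eight-field form, and `parse_compset` is a hypothetical helper, not a CIME function.

```python
COMPONENT_SLOTS = ["time", "atm", "lnd", "ice", "ocn", "rof", "glc", "wav"]


def parse_compset(longname: str) -> dict:
    """Split an eight-field compset long name into (model, modifier) pairs."""
    parts = longname.split("_")
    if len(parts) != len(COMPONENT_SLOTS):
        raise ValueError(f"expected {len(COMPONENT_SLOTS)} fields, got {len(parts)}")
    parsed = {}
    for slot, part in zip(COMPONENT_SLOTS, parts):
        model, _, modifier = part.partition("%")
        parsed[slot] = (model, modifier or None)
    return parsed


compset = "2000_DATM%GSWP3v1_CLM50%SP_SICE_SOCN_MOSART_SGLC_SWAV"
for slot, (model, modifier) in parse_compset(compset).items():
    print(f"{slot}: {model}" + (f" ({modifier})" if modifier else ""))
```

Read this way, the command requests a data atmosphere with GSWP3v1 forcing, CLM5.0 in SP mode, MOSART for river routing, and stub components for sea ice, ocean, land ice, and waves.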

I'm not sure if you've seen the most recent CESM tutorial material, which might be more useful than the earlier material that consisted of videos with slides:

 