Error when I submit in BHIST

JIAA

Xuanjia Li
New Member
Dear All,

I have some questions about a BHIST case. When I submitted the case, it gave this error (shown in the attached picture), and there is no output data in the run directory.
I only changed NTASKS=64, ROOTPE=0, JOB_WALLCLOCK_TIME=12:00:00, and DOUT_S=TRUE.
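For reference, changes like these are typically made from the case directory with CIME's ./xmlchange tool; a minimal sketch using only the values quoted above (not a recommendation for this configuration):
Code:
# run from the case directory, before ./case.setup and ./case.build
./xmlchange NTASKS=64                    # MPI tasks for every component
./xmlchange ROOTPE=0                     # every component starts on task 0
./xmlchange JOB_WALLCLOCK_TIME=12:00:00  # queue wallclock limit
./xmlchange DOUT_S=TRUE                  # short-term archiving after the run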
cesm.log.230823-204238 is as follows.
Thank you!

Best wishes,
Li
[screenshots of the error attached]
 

sacks

Bill Sacks
CSEG and Liaisons
Staff member
It is typical for there to be no output in the run directory because it is archived to the archive directory (given by the xml variable DOUT_S_ROOT) following successful completion of the run. However, it looks like there might have been a failure in your case. It sounds like you meant to include the cesm log file but I don't see it attached here. If you upload your cesm log file along with your other model log files (atm.log, lnd.log, etc.), we may be able to help you see the problem.
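As a quick way to act on this, a minimal sketch (assuming a standard CIME case directory; the actual paths are machine-dependent) of how to find the archive and run directories and check whether the run completed:
Code:
./xmlquery DOUT_S_ROOT   # archive directory for history files after a successful run
./xmlquery RUNDIR        # run directory, where the log files are written
cat CaseStatus           # shows whether case.run and st_archive succeeded or failed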

In addition, please provide the information requested here: Information to include in help requests
 

JIAA

Xuanjia Li
New Member
Thanks for your reply! The model logs are attached below. This time, I have changed GET_REFCASE=TRUE and NTASKS=40, and a new hv.nc file has appeared.
[screenshot attached]
 

Attachments

  • atm.log.230824-162422.txt (374.5 KB)
  • cesm.log.230824-162422.txt (671.4 KB)
  • cpl.log.230824-162422.txt (46.6 KB)
  • lnd.log.230824-162422.txt (197.5 KB)
  • ocn.log.230824-162422.txt (822.3 KB)

sacks

Bill Sacks
CSEG and Liaisons
Staff member
It looks like this is dying in the initialization of the ocean model, based on the last output in the cpl.log file, but it's hard to tell what's going wrong here. Have you successfully run any other CESM configurations on this machine?

One possible problem is that there isn't enough memory for this one-degree fully-coupled run when you run on just 64 processors. Are you able to run on more processors? If so, I would try that. We also typically use a processor layout where the ocean is on different processors from the other components - i.e., the ocean runs concurrently. (See 5. Controlling processors and threads — CIME master documentation for some information on this.) A reasonable starting point would be the out-of-the-box processor layout for this resolution, which has:
Code:
Pes setting: tasks       is {'NTASKS_ATM': -8, 'NTASKS_LND': -4, 'NTASKS_ROF': -4, 'NTASKS_ICE': -4, 'NTASKS_OCN': -1, 'NTASKS_GLC': -8, 'NTASKS_WAV': -8, 'NTASKS_CPL': -8}
Pes setting: threads     is {'NTHRDS_ATM': 1, 'NTHRDS_LND': 1, 'NTHRDS_ROF': 1, 'NTHRDS_ICE': 1, 'NTHRDS_OCN': 1, 'NTHRDS_GLC': 1, 'NTHRDS_WAV': 1, 'NTHRDS_CPL': 1}
Pes setting: rootpe      is {'ROOTPE_ATM': 0, 'ROOTPE_LND': 0, 'ROOTPE_ROF': 0, 'ROOTPE_ICE': -4, 'ROOTPE_OCN': -8, 'ROOTPE_GLC': 0, 'ROOTPE_WAV': 0, 'ROOTPE_CPL': 0}

(where negative numbers mean number of full nodes).
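As an illustrative sketch, a layout like that could be applied with ./xmlchange (the values below simply mirror the out-of-the-box settings quoted above; negative numbers mean whole nodes):
Code:
./xmlchange NTASKS_ATM=-8,NTASKS_CPL=-8,NTASKS_GLC=-8,NTASKS_WAV=-8
./xmlchange NTASKS_LND=-4,NTASKS_ROF=-4,NTASKS_ICE=-4
./xmlchange NTASKS_OCN=-1
./xmlchange ROOTPE_ICE=-4,ROOTPE_OCN=-8   # ocean gets its own nodes so it runs concurrently
./case.setup --reset                      # regenerate the PE layout after changing tasks/rootpe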

Another thing you could try is a simpler configuration, in terms of compset and/or resolution. For example, it would be interesting to see if an ocean-only configuration works (e.g., compset C, resolution T62_g37 for a lower-resolution ocean run, and then T62_g17 for a higher-resolution ocean run). An alternative / additional thing to try would be a lower-resolution atmosphere and land, such as BHIST with resolution f19_g17 (so roughly 2-degree atm/lnd instead of 1 degree).
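A sketch of how those test cases could be created (the case paths are placeholders, and depending on the CESM version, --run-unsupported may be required for non-default compset/resolution combinations):
Code:
# run from cime/scripts in the CESM source tree
# ocean-only test, low-resolution then higher-resolution ocean
./create_newcase --case ~/cases/c_t62_g37 --compset C --res T62_g37
./create_newcase --case ~/cases/c_t62_g17 --compset C --res T62_g17
# fully coupled historical run with roughly 2-degree atm/lnd
./create_newcase --case ~/cases/bhist_f19_g17 --compset BHIST --res f19_g17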
 

JIAA

Xuanjia Li
New Member
Thank you Bill!
I've run F2000climo and FHIST successfully before. But when I submitted FHIST, it produced warnings and errors like those in figure 1. There is output after running it for every component except ocn, but I'm not sure whether the above affects the results. Before submitting, I changed the namelists for CLM and CAM to control the output (figures 2 and 3). The other settings are shown below:
Code:
Results in group run_begin_stop_restart
RUN_TYPE: startup
CONTINUE_RUN: FALSE
RUN_REFCASE: case.std
RUN_REFDATE: 0001-01-01
RUN_STARTDATE: 1897-01-01
RUN_REFDIR: cesm2_init
GET_REFCASE: FALSE
Results in group run_begin_stop_restart
STOP_N: 5
STOP_OPTION: ndays
RESUBMIT: 0
Results in group mach_pes
NTASKS: ['CPL:64', 'ATM:64', 'LND:64', 'ICE:64', 'OCN:64', 'ROF:64', 'GLC:64', 'WAV:64', 'IAC:64', 'ESP:64']
NTHRDS: ['CPL:1', 'ATM:1', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'IAC:1', 'ESP:1']
ROOTPE: ['CPL:0', 'ATM:0', 'LND:0', 'ICE:0', 'OCN:0', 'ROF:0', 'GLC:0', 'WAV:0', 'IAC:0', 'ESP:0']
DOUT_S: TRUE
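(The listing above looks like ./xmlquery output; for reference, a sketch of how such a listing can be reproduced from the case directory:)
Code:
./xmlquery RUN_TYPE,CONTINUE_RUN,RUN_REFCASE,RUN_REFDATE,RUN_STARTDATE,RUN_REFDIR,GET_REFCASE
./xmlquery STOP_N,STOP_OPTION,RESUBMIT
./xmlquery NTASKS,NTHRDS,ROOTPE,DOUT_S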
[figure 1: submission warnings and errors; figures 2-3: CLM and CAM namelist changes]
For BHIST, unfortunately, these are the only processors I have, so maybe I will try BHIST at a lower resolution later.
Actually, I think an F compset may be more suitable for me, because I'm going to research land-atmosphere coupling in the future.
Thank you again, Bill; you have really helped me a lot!
 

Attachments

  • atm.log.230826-103309.gz (59.2 KB)
  • cesm.log.230826-103309.gz (11.2 KB)
  • cpl.log.230826-103309.gz (8.3 KB)
  • glc.log.230826-103309.gz (5.1 KB)
  • ice.log.230826-103309.gz (13.2 KB)
  • lnd.log.230826-103309.gz (15.9 KB)
  • ocn.log.230826-103309.gz (2.9 KB)

sacks

Bill Sacks
CSEG and Liaisons
Staff member
I'm not sure why you are getting this warning about ncdump, but others have suggested that it is not really a problem:

This has to do with the copying of partially-written history files when archiving and restarting the model. I think that, if your later restart runs successfully - and you find all the output you expect on your history files - then you can safely ignore these ncdump warnings.
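A minimal sketch of that kind of check, assuming the files have been archived under the directory reported by ./xmlquery DOUT_S_ROOT ($ARCHIVE and the case name/date in the filename are placeholders):
Code:
ls $ARCHIVE/lnd/hist/*.clm2.h0.*.nc                      # list the archived CLM history files
ncdump -h $ARCHIVE/lnd/hist/<case>.clm2.h0.1897-01.nc    # header prints cleanly if the file is intact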
 

JIAA

Xuanjia Li
New Member
Ok, that makes me feel better, haha. Thank you ~
 