ERROR: One or more of the CTSM cap export_1D fields are NaN

Status
Not open for further replies.

Yuan Sun
Active Member
Hi all,

I am running IHIST at 0.05° (lnd) with 0.5° forcing (datm) over a UK domain. I have repeatedly hit errors like the following:
# of NaNs = 1
Which are NaNs = F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F T F F F F F F F F F F F F F F F F F F
NaN found in field Sl_lfrin at gridcell index/lon/lat: 113 354.82499999999999 59.575000000000003
ERROR: ERROR: One or more of the CTSM cap export_1D fields are NaN

The grid cell reported as NaN differed from one failure to the next. For example:
# of NaNs = 1
Which are NaNs = F F F F F F F F F F F F F F F F F F F T F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
NaN found in field Sl_lfrin at gridcell index/lon/lat: 20 0.42500000000000010 49.974999999999994
ERROR: ERROR: One or more of the CTSM cap export_1D fields are NaN

# of NaNs = 1
Which are NaNs = F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F T F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
NaN found in field Sl_lfrin at gridcell index/lon/lat: 172 354.27499999999998 58.875000000000000
ERROR: ERROR: One or more of the CTSM cap export_1D fields are NaN
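To compare failures like the three above, it can help to pull the field name, gridcell index, and coordinates out of the log automatically. The following is a minimal sketch, not part of CTSM: the function names `parse_nan_reports` and `true_positions` are hypothetical, and the regular expression only assumes the exact line format quoted in this post. Whether the position of `T` in the flag vector matches the reported gridcell index depends on how the model numbers grid cells on each task, which this thread does not establish.

```python
import re

# Matches the report line quoted above, e.g.
# "NaN found in field Sl_lfrin at gridcell index/lon/lat: 113 354.825 59.575"
NAN_LINE = re.compile(
    r"NaN found in field\s+(\S+)\s+at gridcell index/lon/lat:\s*"
    r"(\d+)\s+([-+0-9.eE]+)\s+([-+0-9.eE]+)"
)


def parse_nan_reports(log_text):
    """Return one dict per 'NaN found in field' line in log_text."""
    reports = []
    for m in NAN_LINE.finditer(log_text):
        lon = float(m.group(3))
        reports.append({
            "field": m.group(1),
            "index": int(m.group(2)),
            "lon": lon,
            # Convert 0..360 longitude to -180..180 for easier map lookup.
            "lon_180": lon - 360.0 if lon > 180.0 else lon,
            "lat": float(m.group(4)),
        })
    return reports


def true_positions(flag_line):
    """Return 1-based positions of 'T' in a 'Which are NaNs = F F T ...' line."""
    flags = flag_line.split("=", 1)[1].split()
    return [i for i, flag in enumerate(flags, start=1) if flag == "T"]
```

Running `parse_nan_reports` over the full log and grouping by `field` shows at a glance whether every failure involves the same export field (here always `Sl_lfrin`) even though the grid cell changes.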

It seems that the mediator received NaN from the land model. The error might be related to the PE layout, with bad DATM values being sent to CLM. I tried many different PE layouts. After I added ./xmlchange NTHRDS_ATM=4, the error disappeared.

The simulation works with:
NTASKS: ['CPL:8', 'ATM:8', 'LND:8', 'ICE:8', 'OCN:8', 'ROF:8', 'GLC:8', 'WAV:8', 'ESP:8']
ROOTPE: ['CPL:0', 'ATM:0', 'LND:0', 'ICE:0', 'OCN:0', 'ROF:0', 'GLC:0', 'WAV:0', 'ESP:0']
NTHRDS: ['CPL:1', 'ATM:4', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
nodes: 2
total tasks: 8
tasks per node: 4
thread count: 4
ngpus per node: 0

But I do not understand why this works. The PE layout is still a mystery to me. Maybe DATM needs more threads to interpolate the 0.5° forcing onto the 0.05° land grid cells?

Thanks for any comments.

Best,
Yuan
 

Yuan Sun
Active Member
I checked the output and found that the spin-up output contains NaN values (I ran from a cold start for the UK region). How can I solve this?

Best,
Yuan
 

Attachments

  • 截屏2024-06-10 11.02.37.png
    212.8 KB

slevis

Moderator
Staff member
Did you generate the input files (surface and datm) for the UK domain? If so, I recommend looking for issues with your input files. It could be useful to compare your input files with the model's default input files that work. This may give you insight into the problem.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
To add to this, I see this in your output:

NaN found in field Sl_lfrin at gridcell index/lon/lat

The lfrin field is land model fraction so maybe there is a problem with how that is being computed/specified.
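One way to follow up on this is to check that every land fraction value in the custom domain or surface input is finite and lies in [0, 1]. The sketch below is only an illustration under stated assumptions: `find_bad_fractions` is a hypothetical helper, and it assumes the land fraction values have already been read from the input file into a flat list of floats (reading the NetCDF file itself is left out).

```python
import math


def find_bad_fractions(values, tol=1e-9):
    """Return (index, value, reason) for entries that are not valid fractions.

    `values` is assumed to be a flat sequence of floats already read from
    the input file. `tol` allows for small round-off outside [0, 1].
    """
    bad = []
    for i, v in enumerate(values):
        if math.isnan(v):
            bad.append((i, v, "nan"))
        elif math.isinf(v):
            bad.append((i, v, "inf"))
        elif v < -tol or v > 1.0 + tol:
            bad.append((i, v, "out of [0, 1]"))
    return bad


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    sample = [0.0, 0.5, float("nan"), 1.2, 1.0]
    for index, value, reason in find_bad_fractions(sample):
        print(index, value, reason)
```

An empty result for the input files would suggest the NaN is introduced later, for example during mapping or at run time, rather than being present in the files themselves.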
 

Yuan Sun
Active Member
Hi Keith and Sam,

Thanks for your insight. I tried several PE layouts on another machine. One PE layout works on 1 node (128 cores):


NTASKS: ['CPL:48', 'ATM:16', 'LND:48', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
ROOTPE: ['CPL:16', 'ATM:0', 'LND:16', 'ICE:0', 'OCN:0', 'ROF:0', 'GLC:0', 'WAV:0', 'ESP:0']
NTHRDS: ['CPL:1', 'ATM:2', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
nodes: 1
total tasks: 64
tasks per node: 64
thread count: 2
ngpus per node: 0
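The summary numbers printed for the two layouts in this thread can be reproduced with simple arithmetic, which may make the PE layout less mysterious. This is a simplified sketch, not the CIME implementation: the helper names `summarize_layout` and `shares_ranks` are hypothetical, and it assumes a task stride of 1, so that a component occupies MPI ranks ROOTPE to ROOTPE + NTASKS - 1.

```python
def summarize_layout(ntasks, rootpe, nthrds):
    """Reproduce 'total tasks' and 'thread count' for a layout (stride 1 assumed)."""
    # Highest MPI rank used by any component, plus one.
    total_tasks = max(rootpe[c] + ntasks[c] for c in ntasks)
    # The job is sized for the most heavily threaded component.
    thread_count = max(nthrds.values())
    return {
        "total_tasks": total_tasks,
        "thread_count": thread_count,
        "tasks_times_threads": total_tasks * thread_count,
    }


def shares_ranks(ntasks, rootpe, a, b):
    """Return True if components a and b occupy overlapping MPI ranks."""
    a_start, a_end = rootpe[a], rootpe[a] + ntasks[a]
    b_start, b_end = rootpe[b], rootpe[b] + ntasks[b]
    return a_start < b_end and b_start < a_end


COMPS = ["CPL", "ATM", "LND", "ICE", "OCN", "ROF", "GLC", "WAV", "ESP"]

# First working layout reported in this thread (NTHRDS_ATM=4).
layout_1 = {
    "ntasks": dict.fromkeys(COMPS, 8),
    "rootpe": dict.fromkeys(COMPS, 0),
    "nthrds": {**dict.fromkeys(COMPS, 1), "ATM": 4},
}

# Second working layout reported in this thread (1 node, 128 cores).
layout_2 = {
    "ntasks": {**dict.fromkeys(COMPS, 1), "CPL": 48, "ATM": 16, "LND": 48},
    "rootpe": {**dict.fromkeys(COMPS, 0), "CPL": 16, "LND": 16},
    "nthrds": {**dict.fromkeys(COMPS, 1), "ATM": 2},
}

if __name__ == "__main__":
    for name, layout in [("layout_1", layout_1), ("layout_2", layout_2)]:
        print(name, summarize_layout(**layout))
        print("  ATM and LND share ranks:",
              shares_ranks(layout["ntasks"], layout["rootpe"], "ATM", "LND"))
```

Under these assumptions the second layout gives 64 tasks with 2 threads each, which matches the 128 cores of the single node. The sketch also shows that ATM and LND share ranks in the first layout but run on separate ranks in the second; it does not explain why either change makes the NaN disappear.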

Best,
Yuan
 

Attachments

  • 截屏2024-06-13 09.32.59.png
    120.9 KB