
Write history file failure for single-point simulation on Derecho

xiaoxiaokuishu

Ru Xu
Member
Hi, all,

I am running a single-point simulation on Derecho (CLM-Palm, with our own code changes). After the model has run for 6 years,

cesm.log reports:

NetCDF: Numeric conversion not representable
pio_support::pio_die:: myrank= -1 : ERROR:
pionfwrite_mod::write_nfdarray_double: 250 :
NetCDF: Numeric conversion not representable

It seems the model is outputting some extreme values or NaN values?
Other information: when I set ./xmlchange DEBUG=True, the model cannot even run for 1 second...
So I suspect my error may be caused by a compiler issue?

My case is under /glade/derecho/scratch/ruxu/step5.20250316-0830/run. Could you have a look?

Best
Ru
 

slevis

Moderator
Staff member

@xiaoxiaokuishu
You're likely correct that after 6 years the model encounters a condition that generates a NaN or Inf, and the model aborts while writing to history. Interesting that DEBUG=True fails immediately. Do you mean that you go back and start the simulation from the beginning and it does not run for even 1 second? If so, this suggests that debug mode catches the NaN or Inf very quickly and aborts.

I'm afraid there is no easy way of debugging code with NaNs or Infs. If you know how to use a debugger, this can accelerate the troubleshooting, but we do not support such a capability. Old-fashioned troubleshooting includes adding "write" statements to the code in places that you suspect may cause the problem until you eventually discover the error. NaNs and Infs often originate in badly initialized variables, division by zero, and other bugs.
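
To illustrate the kind of check such "write" statements perform, here is a minimal Python sketch (not CLM code, and not part of any CESM tool). It flags values that are NaN, Inf, or too large for single precision, which is the sort of value that can trigger "Numeric conversion not representable" when a double is converted on output. The function name `find_unwritable` and the sample values are hypothetical, and whether your history fields are written in single precision is an assumption you would need to confirm for your case.

```python
import math

# Largest finite IEEE 754 single-precision value.
FLOAT32_MAX = 3.4028234663852886e38


def find_unwritable(values):
    """Return (index, value, reason) for entries that are NaN, Inf,
    or finite but outside the single-precision range.

    Hypothetical helper for illustration only.
    """
    problems = []
    for i, v in enumerate(values):
        if math.isnan(v):
            problems.append((i, v, "NaN"))
        elif math.isinf(v):
            problems.append((i, v, "Inf"))
        elif abs(v) > FLOAT32_MAX:
            problems.append((i, v, "exceeds float32 range"))
    return problems


# Made-up sample values standing in for one history field.
sample = [273.15, float("nan"), 1.0e39, float("-inf")]
for index, value, reason in find_unwritable(sample):
    print(f"element {index}: {value} ({reason})")
```

In the Fortran source the equivalent would be a conditional "write" placed just after the suspect calculation, moved earlier in the code each time it fires until the first bad value is located.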
 

xiaoxiaokuishu

Ru Xu
Member
Hi, Slevis,

I think that when debug is turned on, the model is more stringent and flags certain problematic values early. That may be why it runs for 6 years with debug off, but dies after only a few seconds with debug on. I will try to find out what happened...
 