
Floating point exception when running iCESM1.2

yzLiu

Yizhang Liu
New Member
Dear all:
About two minutes after I submit my <case>.run script, the job is killed. I am trying to run an oxygen-isotope-enabled preindustrial (PI) startup case on our server, following the iCESM1.2 instructions on GitHub. The compset is 'B1850C5' and the resolution is T31_g37. Confusingly, the same case ran successfully on another server (I tried it before), but it fails on ours. I don't think the problem is in CAM, POP, or the other components, because their *.log.* files contain no errors. In cesm.log, after lines like 'calcsize j,iq,jac, lsfrm,lstoo ............', many 'QNEG3 from ...... mixing ratio violated at ...... points' messages follow, and then some 'BalanceCheck: soil balance error' and 'ERROR: Isotopic deep-conv precip error' messages appear. At the end, the log reports 'BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES', 'slurmstepd: error: Detected 1 oom_kill event in StepId = ......', and 'srun: error: l06c41n2: task 0: Out Of Memory'.

Later I changed DEBUG in env_build.xml to TRUE. At the same point in cesm.log (after 'calcsize .......'), the node reports 'Caught signal 8 (Floating point exception: floating-point invalid operation)', followed by some backtrace information and 'forrtl: error: floating point exception'. Also, before all of this there are many 'NetCDF: Invalid dimension ID or name' and 'NetCDF: Variable or Attribute not found' messages. Could something be wrong with the NetCDF library? It works fine when running standard CESM1.2.
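For reference, here is a minimal sketch of how the DEBUG switch is usually toggled from a CESM1.2 case directory. It assumes the standard CESM1.2 case scripts (xmlchange, <case>.clean_build, <case>.build), and "mycase" is a hypothetical case name. Because DEBUG is a build-time setting, the model has to be cleaned and rebuilt before the change takes effect.

```shell
# Run from inside the case directory; "mycase" is a hypothetical case name.
cd mycase
# DEBUG is a build setting stored in env_build.xml.
./xmlchange -file env_build.xml -id DEBUG -val TRUE
# Build-time settings only take effect after a clean rebuild.
./mycase.clean_build
./mycase.build
# Then submit the case the same way as before.
```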

Below are some screenshots from the debug cesm.log. I have also attached both the debug and the normal cesm.log files.
I would be grateful for any suggestions or solutions to this problem.

Screenshot 2023-11-12 192907.png, Screenshot 2023-11-12 193133.png
 

Attachments

  • cesm.log(debug).231110-150631.txt (427 KB)
  • cesm.log.231112-141854.txt (626.4 KB)

slevis

Moderator
Staff member
If I understood correctly, this worked on one machine and not on another, so it may be a porting issue. I will let others confirm before we move to the porting forum.
 

yzLiu

Yizhang Liu
New Member
Yes, it may be a porting problem, but I ported it from GitHub to this machine the same way I did for our old machine.
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Yizhang,

Unfortunately, I don't think the DEBUG flag always works properly with iCESM1.2 (I believe there was a hard-to-solve bug in the isotope physics of the land model). Also, most of these errors/warnings are isotope-specific and shouldn't end the model run. The exception is the "BalanceCheck" error, which will stop the model but doesn't appear to be isotope-specific. It is, however, a CLM4-specific error, so could you provide the lnd.log.* file from your non-debug run? That might help us track down the actual issue.

Thanks, and have a great day!

Jesse
 

yzLiu

Yizhang Liu
New Member
Thanks. Attached is the lnd.log.* file from my non-debug run; it seems the model integrated for some timesteps.
 

Attachments

  • lnd.log.231112-141854.txt (48 KB)

AKAleksinski

Adam Aleksinski
New Member
Hello Yizhang,

I am encountering this error with iCESM as well. Have you resolved it?

Thank you very much,

Adam Aleksinski
 

yzLiu

Yizhang Liu
New Member
Maybe you can try the B1850C5CN compset, which turns on the carbon and nitrogen cycle in the land model, and then change and test the number of ocean tasks (NTASKS_OCN) in env_mach_pes.xml; some specific values may work. Alternatively, you can modify the POP source file overflows.F90: at line 5612, the line "HUM(:,:,:) = HU(:,:,:)" can be replaced by the following code:
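! Copy HU into HUM element by element instead of with a whole-array
! assignment, threading over the clinic blocks with OpenMP.
! i, j and iblock must be declared as integer locals in the enclosing
! routine if they are not already.
!$OMP PARALLEL DO PRIVATE(iblock,i,j)
do iblock = 1,numBlocksClinic
   do j=1,POP_nyBlock
      do i=1,POP_nxBlock
         HUM(i,j,iblock) = HU(i,j,iblock)
      enddo
   enddo
enddo
!$OMP END PARALLEL DO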
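A minimal sketch of the suggestion to try a different number of ocean tasks. It assumes the standard CESM1.2 workflow, where a PE-layout change after setup needs cesm_setup -clean, cesm_setup, and a clean rebuild. The case name "mycase" and the value 48 are hypothetical; which task counts work on a given machine has to be found by testing.

```shell
# Run from inside the case directory; "mycase" is a hypothetical case name.
cd mycase
# NTASKS_OCN sets the number of MPI tasks for POP; 48 is an arbitrary example value.
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val 48
# Changing the PE layout invalidates the existing setup and build.
./cesm_setup -clean
./cesm_setup
./mycase.clean_build
./mycase.build
```

If you use the overflows.F90 change instead, the usual CESM practice is to put the modified copy of the file in the case's SourceMods/src.pop2/ directory and rebuild, rather than editing the shared source tree.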
 