run time issue for CLM45BGC

Hi,

I am running a CLM45 BGC offline run for 20TR. The case directory is: /glade/p/uola0001/DL_AMIP_OBS/CLM45BGC_test2

The model runs fine for 5 months, then it comes out of the queue with the following message:

"Exited with signal termination: Killed."

The error message in the cesm.log file is:

"ERROR: 0031-161  EOF on socket connection with node ys3039-ib
INFO: 0031-639  Exit status from pm_respond = -1"

I am not sure about this error message. Was this job manually killed, or did something else happen?

Please note that this run is a special case using a soil moisture climatology file. I successfully ran a similar case for the SP version (/glade/p/uola0001/DL_AMIP_OBS/CLM45SP_test1).

Thanks,
Sanjiv Kumar
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Is there any information in the email you got about the job? E.g., anything about running out of memory or queue time? It could also just be a Yellowstone glitch, although I assume you've probably tried it a couple of times? You could also try compiling and running in debug mode to see if you can get more information in the log files.
 
OK, I re-ran the case. This time it ran for 2 years and 2 months, and then the job came out after 8 hours of wall time. I do not think wall time is the issue, because I ran the same standard configuration and it ran much faster (~25 years in 4 to 5 hours).

I do not see much of an error message even after running in debug mode. The last few lines in the cesm.log file are as follows:

**********
540: Done with h2osoi_liq and h2osoi_ice calculations
408: num_nolakec =          135  fc =           99
408: num_nolakec =          135  fc =          100
408: num_nolakec =          135  fc =          101
408: num_nolakec =          135  fc =          102
408: num_nolakec =          135  fc =          103
408: num_nolakec =          135  fc =          104
408: num_nolakec =
*************

I have made the last line bold, where I suspect there might be an issue, because it does not find the required numbers? Any suggestion is most welcome.

Thank you very much.
-SK
 
Finally, this issue was resolved (thanks to Erik and Keith!).

Here is the answer: I had too many debug statements in my modified code, which made the cesm.log file larger than 10 GB; that is why the model was stopping. When I commented out most of the debug statements, it worked fine.

-SK
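For anyone hitting the same symptom, here is a minimal Python sketch of how one might check for it. It is not part of CESM or CLM. The function names, the 10 GB threshold (taken from the figure above; the actual system limit may differ), and the assumption that log files have ".log" somewhere in their name are all mine. The first function lists log files over a size limit; the second masks numbers in each line so that repeated debug prints group together and the noisiest statement stands out.

```python
import os
import re
from collections import Counter

# Assumed threshold, based on the ~10 GB figure mentioned in this thread.
SIZE_LIMIT_BYTES = 10 * 1024**3


def oversized_logs(run_dir, limit=SIZE_LIMIT_BYTES):
    """Return sorted (path, size) pairs for files under run_dir whose
    name contains '.log' and whose size in bytes exceeds limit."""
    hits = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            if ".log" in name:
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits)


def noisiest_messages(log_path, top=5):
    """Count log lines after replacing digit runs with '#' and collapsing
    whitespace, so lines that differ only in numbers are counted together."""
    counts = Counter()
    with open(log_path, errors="replace") as fh:
        for line in fh:
            key = re.sub(r"\d+", "#", line.strip())
            key = re.sub(r"\s+", " ", key)
            if key:
                counts[key] += 1
    return counts.most_common(top)
```

You would call oversized_logs on your own run directory, then noisiest_messages on any file it reports, to see which write statement is producing most of the output. Reading a multi-gigabyte log this way is slow but streams line by line, so it should not need much memory.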
 