
Porting error - Fails in RUN steps

Hello, I had a working port on my local cluster, but now runs fail a few minutes into the RUN phase. I went back and ran scripts_regression_tests, and the tests fail at L_TestSaveTimings and again at T_TestRunRestart. I haven't changed my config files since the model was last running correctly, so I'm not sure what's going on. Attached are the config files, version info, and the cesm log file for L_TestSaveTimings. If it's useful, I can attach the build/run logs for a real case. Thanks for any insights.

Attachments

  • version_info.txt
  • config_batch.xml.txt
  • config_machines.xml.txt
  • config_compilers.xml.txt
  • cesm.log.16055328.220509-100238.txt

jedwards

CSEG and Liaisons
Staff member
From the CESM log: ERROR opening input data file
Is your inputdata directory mounted and available from the compute nodes?
Try moving your existing inputdata directory aside and redownloading the input files.
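If you want to check this from a compute node itself, a small script submitted through your batch system can walk the input data tree and flag files that are missing, unreadable, or empty, which is what a partial download or an unmounted filesystem often looks like. This is a minimal sketch, not a CESM tool: the function name and the placeholder path are hypothetical, and you would pass the directory your case uses as DIN_LOC_ROOT.

```python
import os
import sys


def find_bad_inputdata(root):
    """Return sorted (path, reason) pairs for files under root that look unusable.

    Hypothetical helper: a file is flagged if it cannot be opened for reading
    (including broken symlinks) or if it is zero bytes long.
    """
    if not os.path.isdir(root):
        return [(root, "directory not found or not mounted")]
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                with open(path, "rb") as fh:
                    fh.read(1)  # confirm the file is actually readable
            except OSError as err:
                bad.append((path, f"unreadable: {err.strerror}"))
                continue
            if size == 0:
                bad.append((path, "zero-length file"))
    return sorted(bad)


if __name__ == "__main__":
    # Placeholder path (assumption); pass your own DIN_LOC_ROOT as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/path/to/inputdata"
    problems = find_bad_inputdata(root)
    for path, reason in problems:
        print(f"{reason}: {path}")
    print(f"{len(problems)} problem file(s) found under {root}")
```

Running it once on a login node and once inside a batch job makes it easy to see whether the compute nodes see the same files.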
Ah, yes, updating the input data fixed the L_Test. The T_Test, however, appears to have a memory issue (see attached log). Is this related to a config setting or to an issue with the allocation on my compute nodes? I haven't had this problem before, and it's the same system and nodes, so it seems strange that it would become a problem all of a sudden.

Attachments

  • cesm.log.16057575.220509-113500.txt

jedwards

CSEG and Liaisons
Staff member
This test isn't asking for much memory. Check your ulimits and check with your system support staff.
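Limits on the compute nodes can differ from those on the login node, so it is worth checking them from inside a batch job. As a minimal sketch (an assumption about which limits matter, not a CESM utility), Python's standard `resource` module can print the soft and hard values of the memory-related limits, in KiB as bash's `ulimit` reports them. Limit names a platform does not define are skipped.

```python
import resource

# Memory-related limits to report (assumed to be the relevant ones);
# getattr below skips any name this platform does not define.
LIMIT_NAMES = ["RLIMIT_STACK", "RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS"]


def fmt(value):
    """Render a limit in KiB, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memory_limits():
    """Return {name: (soft, hard)} for the memory limits this platform supports."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is not None:
            limits[name] = resource.getrlimit(const)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in memory_limits().items():
        print(f"{name:14s} soft={fmt(soft):>12s} hard={fmt(hard):>12s}")
```

A soft limit well below the hard limit (the stack limit is a common one) can usually be raised by the job itself, while a low hard limit is something to raise with system support.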
I have no memory limits aside from the limit on the nodes (~1.8GB/node), which I believe should be enough for the test and has been in the past. I checked my build logs and they all start with 'cat: Srcfiles: No such file or directory', so I'm wondering if the real issue is elsewhere?