
Issues with CESM testing on local HPC

stevenDH

Member
I've got some issues with the following tests (which are part of the pre-alpha tests):

- SMS_Ld1.f19_f19_mg16.FXSD.hydra_gnu.cam-outfrq1d
When I submit this test everything works normally and the necessary input files are downloaded. However, during the RUN phase the test fails with the following error:

ERROR: GETFIL: FAILED to get /gpfs/projects/climate/cesm/inputdata/atm/cam/met/MERRA/2000/MERRA_19x2_20000102.nc

I find this odd, since the test downloaded other necessary files such as MERRA_19x2_20000101.nc.
I'm puzzled by this, since none of the other tests have any issues with missing files.
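
To see which daily files are actually present in the inputdata directory, the expected names can be compared against the directory contents. This is a minimal sketch, not part of CESM: the helper name `missing_daily_files` is hypothetical, the `MERRA_19x2_YYYYMMDD.nc` naming pattern is inferred from the two file names quoted above, and the number of days the test needs is an assumption.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_daily_files(directory, start, ndays,
                        prefix="MERRA_19x2_", suffix=".nc"):
    """Return the expected daily file names that are absent from `directory`.

    Hypothetical helper: the prefix/suffix pattern is an assumption taken
    from the file names in the error message, not from CESM itself.
    """
    directory = Path(directory)
    missing = []
    for offset in range(ndays):
        day = start + timedelta(days=offset)
        name = f"{prefix}{day:%Y%m%d}{suffix}"
        if not (directory / name).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory taken from the GETFIL error above; adjust ndays as needed.
    met_dir = "/gpfs/projects/climate/cesm/inputdata/atm/cam/met/MERRA/2000"
    for name in missing_daily_files(met_dir, date(2000, 1, 1), ndays=3):
        print("missing:", name)
```

A one-day test (`Ld1`) starting on 1 January may still need the file for the following day, which would be consistent with the 20000102 file being requested; that is a guess, not something the log confirms.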

- IRT_N3_PM3_Ld7.f19_g17.BHISTWs.hydra_gnu.allactive-defaultio
This test seems to run properly for the most part, but at the beginning of the cesm.log file there are some odd messages:

mca_base_component_repository_open: unable to open mca_oob_ud: libosmcomp.so.3: cannot open shared object file: No such file or directory (ignored)

This message is repeated several times for each compute node and core. Afterwards the test starts giving normal log messages, and at the end it fails with the following error:

--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 27 with PID 195130 on node node319 exited on signal 9 (Killed).
--------------------------------------------------------------------------
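
Signal 9 is SIGKILL, which a process cannot catch, so the rank was terminated from outside (for example by the kernel or the batch system; the log above does not say which). When the message appears for several ranks, it can help to collect the rank, PID, and node from the log and then check the system or scheduler logs on that node. A minimal sketch, assuming the mpiexec message format shown above; the function name `find_signal_exits` is hypothetical:

```python
import re
import signal

# Pattern for the mpiexec message quoted above (format assumed from that line).
SIGNAL_EXIT_RE = re.compile(
    r"process rank (?P<rank>\d+) with PID (?P<pid>\d+) "
    r"on node (?P<node>\S+) exited on signal (?P<signum>\d+)"
)


def find_signal_exits(log_text):
    """Return one dict per 'exited on signal' message found in log_text."""
    results = []
    for match in SIGNAL_EXIT_RE.finditer(log_text):
        signum = int(match.group("signum"))
        try:
            # Map the number to a symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "UNKNOWN"
        results.append({
            "rank": int(match.group("rank")),
            "pid": int(match.group("pid")),
            "node": match.group("node"),
            "signum": signum,
            "signal": name,
        })
    return results


if __name__ == "__main__":
    line = ("mpiexec noticed that process rank 27 with PID 195130 "
            "on node node319 exited on signal 9 (Killed).")
    for hit in find_signal_exits(line):
        print(hit)
```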

Does anyone have any idea how to solve these errors, or what they might be related to?
 

stevenDH

Member
The first one has since been fixed in another ticket. However, I still have two tests failing in mysterious ways. I initially used CESM2.0 for all testing.

- IRT_Ld7.f09_g17.BHIST: I'm fairly sure that this test fails because the test itself is buggy. The failure occurs when it tries to free memory in a region that does not belong to it, which usually points to a memory-management error in the program. I reran the same test with the latest stable release, CESM 2.1.1, to check whether it behaves any better. However, in the new version the test fails to create the case, with the error:
"Refcase not found in /projects/climate/cesm/inputdata/cesm2_init/b.e21.B1850.f09_g17.CMIP6-piControl.001_v3hist/0501-01-01"
Would it be possible to get this input data?

- IRT_N3_PM3_Ld7.f19_g17.BHISTWs: I could not get any insight into why this test fails. It just suddenly stops, with no error message. I also tried this with CESM 2.1.1, and apparently the test has been removed from CESM, so maybe it is not a relevant test.

Does anyone have any insight into these tests? I'm checking them to be sure that the port of CESM2 to our local machine was successful.
 