
Issue when CTSM attempts to read fsurdat file

Carter4444

Carter Watson
New Member
Describe your problem or question:
Good afternoon! I am having difficulty running CTSM v5.2.005. The run starts out well, but it crashes as soon as CTSM attempts to read the surface dataset. As can be seen in the attached rsl.out.0000.txt file, CTSM initializes normally until it begins to read the surface dataset (in this case, surfdata_2018_d01_360.nc). Then, as can be seen in rsl.error.0000.txt, the program crashes, and the traceback cites "PIOc_openfile+0x15" and libesmf.so.

While this error is associated with WRF-CTSM, I can vouch for the integrity of WRF (it worked with a different LSM), the integrity of ESMF (it has been reinstalled multiple times), and the integrity of LILAC (as you can read in rsl.out.0000, all land-atmosphere coupling was successful). Moreover, the wonderful Dr. Oleson helped create the fsurdat file (see the thread "Segmentation fault when running mksurfdata_esmf"), so I have difficulty imagining there is a problem with the file itself. The issue must lie with how CTSM is reading this file.

I appreciate any direction that you all can provide. Reading earlier message board posts, I found one that was similar (some question in cesm.exe), but it had no resolution and my problem is different in nature. Another post suggested that the ParallelIO installed locally within CTSM can be too old compared to the version used by ESMF. To the best of my understanding, both are running PIO v2.6.2. Perhaps, however, CTSM's internal PIO infrastructure is a bit different from the internal ESMF PIO.

This machine runs MPICH and uses no batch software. config_compilers.xml and config_machines.xml are attached, and the machine has indeed been ported to CIME.
 

Attachments

  • config_compilers.txt (1.2 KB)
  • config_machines.txt (4.8 KB)
  • rsl.error.0000.txt (3.4 KB)
  • rsl.out.0000.txt (14.2 KB)

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Unfortunately I don't see any error messages that would help me to diagnose the problem. The surface dataset you are using appears to be:

/opt/home/cwat/WRF/CTSM_inputfiles/surfdata_2018_d01_360.nc

I don't think I created a file with that name, but maybe you simply renamed the file I created (correct me if I'm wrong)? I guess I would check to make sure it is not a netCDF-4 file and make sure it is readable and not corrupted.
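For anyone following along, the format check can be sketched without any netCDF tools by looking at the first four bytes of the file, which is similar in spirit to what `ncdump -k` reports. This is a minimal sketch: the function name `netcdf_kind` and the labels it returns are made up for illustration, and it assumes the signature sits at byte 0 (true for classic-family files and for typical netCDF-4/HDF5 files). It cannot tell netCDF-4 apart from a plain HDF5 file, and a recognized signature does not prove the rest of the file is intact.

```python
# Leading bytes that identify each on-disk netCDF variant.
# The labels on the right are illustrative names chosen for this sketch.
_MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "cdf5",
    b"\x89HDF": "netCDF-4 (HDF5)",
}


def netcdf_kind(path):
    """Return a format label based on the first four bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    # Anything else (including an empty or truncated file) is 'unknown'.
    return _MAGIC.get(head, "unknown")
```

Calling `netcdf_kind` on the surface dataset path would then return one of the labels above, or "unknown" if the header is not recognized.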
 

Carter4444

Carter Watson
New Member
Thank you for the help! Yes, I changed the file name; it was originally surfdata_2018_d01.nc. I thought the problem could be the coordinate system, because the ESMF mesh file did not work: it was created on a -180° to 180° basis and LILAC only reads positive numbers, but I digress. I can view the file in ncview and Panoply, so I can only assume it is not corrupt. I also tried the other files you sent, but those gave the same errors. Running "ncdump -k surfdata_2018_d01.nc" reports that the file is cdf5. Is this an acceptable format? I am running netCDF-C 4.9.2 and netCDF-Fortran 4.6.1. (I don't know if it's relevant, but I'm running HDF5 v1.14.3.) Thanks again!
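As an aside on the longitude convention mentioned above, mapping values from a -180° to 180° basis onto 0° to 360° is a modulo operation. The sketch below only shows the arithmetic on a list of longitudes; the helper name `wrap_longitude_360` is hypothetical, and a real grid or mesh file would also need its data reordered consistently, which is not shown.

```python
def wrap_longitude_360(lon_deg):
    """Map a longitude in degrees onto the interval [0, 360)."""
    # Python's % returns a non-negative result for a positive divisor,
    # so negative (western) longitudes wrap around to 180..360.
    return lon_deg % 360.0


west_based = [-180.0, -90.0, -0.5, 0.0, 90.0, 179.5]
print([wrap_longitude_360(x) for x in west_based])
# prints [180.0, 270.0, 359.5, 0.0, 90.0, 179.5]
```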
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
cdf5 is fine. I'm not that familiar with WRF-CTSM, but are there any other log files, e.g., that are typical of CTSM runs, like lnd.log, cesm.log, etc.? If not, you may need to start printing out stuff in main/surfrdMod.F90 to find out where the problem in reading the file is, or run in DEBUG mode if that is an option in WRF-CTSM.
As you suggest, it could also be a problem with the libraries on your machine, which is outside my expertise. You could post in the infrastructure forum with the "PIOc_openfile+0x15" and libesmf.so error you are getting.
 

slevis

Moderator
Staff member
As a troubleshooting step, I agree with the "print" statements idea.
I also have a question:
Have you gotten past this point in other WRF-CTSM simulations, e.g. with other fsurdat files? If so, this will definitely help you in your troubleshooting.
 

slevis

Moderator
Staff member
You could check the fsurdat file by running a clm-only simulation. If that doesn't work due to the fsurdat file, then you may find it easier to troubleshoot in that mode.
 

Carter4444

Carter Watson
New Member
Thanks everyone so much for your help! The problem was twofold. First, ESMF and CTSM were running different versions of ParallelIO, so I reinstalled it for both to ensure they were both on v2.6.2. Despite fixing this, I kept getting the same error. Second, it turns out lnd_modelio.nml was expecting "pnetcdf" instead of "netcdf". Fixing this allowed the fsurdat file to be read. I appreciate the amazing support!
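For anyone who lands here with the same crash, a quick way to see which I/O type a modelio namelist requests is to scan it for its `key = value` lines. This is a minimal sketch, not a full Fortran namelist parser. The names `pio_inparm` and `pio_typename` in the sample text are an assumption about what lnd_modelio.nml contains (the post above only mentions the "pnetcdf" and "netcdf" values), and `read_namelist_values` is a hypothetical helper.

```python
import re

# Matches simple scalar assignments such as:  pio_typename = "netcdf"
_ASSIGN = re.compile(r"""^\s*(\w+)\s*=\s*['"]?([^'"\s,]+)['"]?\s*,?\s*$""")


def read_namelist_values(text):
    """Collect simple `key = value` pairs from Fortran namelist text."""
    values = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0]  # drop trailing Fortran comments
        match = _ASSIGN.match(line)
        if match:
            values[match.group(1).lower()] = match.group(2)
    return values


# Sample text only; the group and variable names are assumed, not taken
# from the files attached to this thread.
sample = """
&pio_inparm
  pio_numiotasks = -99
  pio_typename = "netcdf"
/
"""
settings = read_namelist_values(sample)
print(settings.get("pio_typename"))
# prints netcdf
```

Whichever value the namelist requests, the PIO library has to have been built with support for that I/O type, which is the kind of mismatch described in the post above.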
 