
Issue with setting up CESM on new system

thakur.abubakar

Abu Bakar Siddiqui Thakur
Member
Dear CESM community,

We have been trying to set up CESM v2.1.3 on the new system at our institute but we're facing a lot of problems.

As it stands now, we are able to create a case and build it successfully. When we submit the job, a segmentation fault appears in the log files (attached), but the job does not exit the queue. Also, running ./preview_run shows OMP_NUM_THREADS=1, even though we set it to 48 explicitly when loading the requisite modules (see the attached modules_file). We load the modules as the first task after logging in.

Any help in this regard would be greatly appreciated.

Best regards,
Abu Bakar
 

Attachments

  • files.zip.txt
    98.8 KB · Views: 12

sacks

Bill Sacks
CSEG and Liaisons
Staff member
Regarding the modules: CIME typically manages module loads, so you should add the appropriate module load commands in the config_machines.xml file. See some examples in cime/config_machines.xml at maint-5.6 · ESMCI/cime (e.g., look at the cheyenne or izumi machines in that file). I'm not sure if that will solve any of the problems you are having, but it might get you further. I wonder if the problem is an inconsistency in the module environment used to build the model vs. the environment used to run it; if so, letting CIME manage the module environment might solve the problem.

The job not exiting the queue is likely a system problem; I suggest asking your system administrator about this.

The number of threads is controlled via the NTHRDS XML variable for each component (see "5. Controlling processors and threads" in the CIME cime5.6 documentation). Set this via ./xmlchange NTHRDS=48 to change the threading for all components. However, it is rare to run with that many threads: a typical number is between 1 and 8. (Typically CESM is run with multiple MPI tasks on a node, even if OpenMP threading is being used.)
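To make the tasks-versus-threads trade-off concrete, here is a minimal Python sketch that lists the ways a node could be split into MPI tasks and OpenMP threads when the thread count stays in the typical 1 to 8 range. It assumes 48 cores per node, which is only inferred from the OMP_NUM_THREADS=48 setting mentioned above, and the function name node_layouts is hypothetical (it is not part of CIME).

```python
def node_layouts(cores_per_node, max_threads=8):
    """Return (mpi_tasks_per_node, threads_per_task) pairs that fill a node exactly.

    Only thread counts from 1 to max_threads that divide the core count
    evenly are kept, so every core is used and none is oversubscribed.
    """
    layouts = []
    for threads in range(1, max_threads + 1):
        if cores_per_node % threads == 0:
            layouts.append((cores_per_node // threads, threads))
    return layouts


if __name__ == "__main__":
    # Assumption: 48 cores per node (not confirmed in this thread).
    for tasks, threads in node_layouts(48):
        print(f"{tasks} MPI tasks/node x {threads} threads (NTHRDS={threads})")
```

Each printed line corresponds to one candidate NTHRDS value; which one performs best depends on the machine and the compset, so treat the list as a starting point for testing.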
 

thakur.abubakar

Abu Bakar Siddiqui Thakur
Member
Dear Bill,

Thank you for your response. I am getting back to you after trying the things you pointed out. It took some time to get things sorted with our system admin here. Now we can create, build and submit the case successfully.

As a simple test, we successfully ran an S compset (2000_SATM_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP) for five days. Log attached.

However, we are getting the following error on model execution for an aquaplanet case.

ERROR timeaddmonths(): MM out of range
set_time_float_from_date: error return from ESMF_TimeSet for set_time_float_from_date
ERROR: CHKRC


For this case, the model throws this error while trying to read the input file ape_solar_ave_tsi_1365.nc (see the attached aquaplanet logs).

We tried another case (F2000climo) and got the same error, but with a different file. Interestingly, we also got the same error when we tried to port CESM v1.2 onto this new system. We are running out of ideas and would appreciate your input here.

I have also attached the compilers, machines, and batch config XML files for your reference.


Best,

Abu Bakar
 

Attachments

  • files.zip.txt
    45.5 KB · Views: 3

sacks

Bill Sacks
CSEG and Liaisons
Staff member
I'm not sure what is causing your latest problem. From searching the forums for this error, I found some posts suggesting that this can happen if a file got corrupted when it was downloaded from inputdata. So you could try deleting this file and then resubmitting (which should result in it being downloaded again). If that doesn't fix the problem, I'd suggest opening a new issue in the Atmosphere (CAM) forums, where someone might have more experience with this issue. When you do, please be sure to include all of the details requested here: Information to include in help requests.
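Before deleting anything, a quick way to spot a badly corrupted download is to look at the first bytes of the file. The sketch below is only a cheap sanity check, not a full validation: it tests whether a file begins with one of the netCDF classic signatures (CDF followed by version byte 1, 2, or 5) or the HDF5 signature used by netCDF-4 files. A truncated file can still pass, while something like an HTML error page saved under a .nc name will fail. The function name looks_like_netcdf is hypothetical, and the file paths are whatever you pass on the command line.

```python
import sys
from pathlib import Path

# Leading bytes of netCDF classic, 64-bit offset, and CDF-5 files,
# plus the HDF5 signature that netCDF-4 files start with.
NETCDF_SIGNATURES = (b"CDF\x01", b"CDF\x02", b"CDF\x05", b"\x89HDF\r\n\x1a\n")


def looks_like_netcdf(path):
    """Return True if the file starts with a known netCDF/HDF5 signature."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as f:
        head = f.read(8)
    return head.startswith(NETCDF_SIGNATURES)


if __name__ == "__main__":
    # Usage: python check_nc.py /path/to/inputdata/file.nc [more files...]
    for name in sys.argv[1:]:
        status = "ok" if looks_like_netcdf(name) else "SUSPECT"
        print(f"{status}: {name}")
```

Any file reported as SUSPECT is a good candidate for deleting and downloading again; a file reported as ok may still be damaged further in, so a failed read in the model log remains the stronger signal.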
 