
Failed to run on Linux machine

Hi,

I need some help running CESM1.2 with CAM5 (F_AMIP_CAM5) at 0.9x1.25 resolution. I followed the instructions carefully and configured this on a Linux machine:

Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz, Red Hat Enterprise Linux Server release 6.2 (Santiago)
pgf90 11.10-0 64-bit target on x86-64 Linux -tp nehalem

 - Prestaging REFCASE (ccsm4_init/b40_20th_1d_b08c5cn_139jp/1979-01-01) to BLD/CAM5_0.9X1.25/EXE/run
CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
Wed Aug 14 12:36:51 EDT 2013 -- CSM EXECUTION BEGINS HERE
*** glibc detected *** cesm.exe: double free or corruption (out): 0x000000006e6d2ec0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3fe6475296]

I am running mpi-serial at this moment.

Just wonder if the version has been tested before releasing, looks buggy to me.
 

jedwards

CSEG and Liaisons
Staff member
"Just wonder if the version has been tested before releasing, looks buggy to me."

Always a good idea to insult your support staff when you pose a question. :-) The bug is in your version of the PGI compiler. Please get an update; the latest PGI compiler available is 13.x.
 

From my experience, CESM is not as easy to configure as WRF, though both are developed by UCAR. It could be done better.

Thanks for the reply.

I wonder whether MPI is involved in this case. The term "mpi-serial" is confusing; I suggest CESM change it to "serial". I found that pgf90/pgcc is used in the compilation. We have used this compiler for many other projects and it worked fine. So why would CESM depend on the compiler?
 

jedwards

CSEG and Liaisons
Staff member
You have an older version of the PGI compiler with a known problem that produces this error.   Update to a newer PGI compiler or use another compiler instead.  
 


santos

Member
  1. "mpi-serial" is not an arbitrary name; it's the name of the MPI library we use in serial cases, which is code distributed with MCT. CESM requires an MPI implementation even in a serial case, and mpi-serial provides that.
  2. CESM comprises a very large body of code, mostly Fortran 90, from hundreds of contributors over several decades. As a result, it often uncovers compiler bugs that are not apparent in smaller code bases written in a more homogeneous style. Furthermore, I believe that several libraries, such as netCDF, are mainly written in C, and many others are written in Fortran 77. So it's no surprise that they would work even though CESM is encountering a compiler bug.
 

Thanks for the reply.

1. I actually didn't set MPI at all: MPI_LIB_NAME:= and MPI_PATH:= are both empty. So I don't know where it can even find an MPI library in the mpi-serial case. BTW, will ESMF be faster than MCT?
2. Agreed, netCDF is a pain. Maybe this library should be compiled with the gcc or gfortran that comes with the machine; otherwise many people will suffer.
 

santos

Member
  1. mpi-serial comes from MCT, which is bundled with CESM. So you do not have to set any environment variables for mpi-serial; CESM will build mpi-serial out of models/utils/mct/mpi-serial. I personally don't know anything about whether ESMF is faster than MCT.
  2. Different Fortran compilers usually cannot understand each other's .mod files, so it's usually necessary to use the same compiler for both CESM and the Fortran bindings of netCDF. This is one reason why netCDF now distributes the C source and the Fortran bindings separately.
 

Thanks for the reply.

It took almost 23 hours to finish a 5-day run on MP_SERIAL. Is this normal for my setup?

Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz, Red Hat Enterprise Linux Server release 6.2 (Santiago), 64GB memory
 

jedwards

CSEG and Liaisons
Staff member
A similar configuration using 1024 pes of NCAR's Yellowstone supercomputer ran at about 15 simulated years per wall clock day (ypd). 15 ypd / 1024 pes ≈ 5.35 simulated days per wall clock day per pe. I'd say you are getting just about exactly the performance you might expect to get.
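
The comparison above can be checked with a few lines of Python. This is only a rough sketch: it assumes a 365-day model year and perfectly linear scaling across pes, neither of which is stated in the thread, and the function names are made up for illustration.

```python
# Assumption: a 365-day (no-leap) model year; not stated in the thread.
DAYS_PER_MODEL_YEAR = 365


def simulated_days_per_wallclock_day_per_pe(ypd, pes):
    """Convert simulated years per wall clock day on `pes` processing
    elements into simulated days per wall clock day for a single pe,
    assuming ideal linear scaling (an idealization)."""
    return ypd * DAYS_PER_MODEL_YEAR / pes


def observed_throughput(simulated_days, wallclock_hours):
    """Simulated days per wall clock day for a finished run."""
    return simulated_days * 24.0 / wallclock_hours


# Figures quoted in the thread: 15 ypd on 1024 pes, and a 5-day serial run
# that took almost 23 hours.
per_pe = simulated_days_per_wallclock_day_per_pe(15, 1024)
serial = observed_throughput(5, 23)

print(f"expected per pe: {per_pe:.2f} simulated days per wall clock day")
print(f"observed serial: {serial:.2f} simulated days per wall clock day")
```

Under those assumptions the two numbers come out within a few percent of each other, which is consistent with the serial run performing about as expected.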
 

Thanks Jedwards!

So if I want to run at 1x1 degree resolution and would like one simulated day to take ~20 minutes of wall clock time, which one should I use?

I am confused by the coupling. If I specify the SST and sea ice, which modules will be turned off? In other words, would the coupling still run?
 