CSM1.4-bgc porting on a linux cluster

thomasfroelicher@...

At the University of Bern we use the NCAR CSM1.4-bgc. We are running it successfully on an IBM SP4, and the output files appear to be correct.

We are also trying to install the model on our local Linux cluster (machine details below). The model compiles fine, but it stops after one model day without writing any error messages to the log files.
The problem is that MPI gives us only the following uninformative error message:
------------------------------------------------------------------
(msg_cplc) connecting to msg-passing - successful
p3_15513: p4_error: interrupt SIGSEGV: 11
(initial) reading parameters from input namelist.
P4 procgroup file is pgfile.
finish start mpi program
-------------------------------------------------------------------
The model writes out only history files from the atmosphere and the coupler.

Please let me know if you need more information about our problem.

Thanks for all your help.

With best regards,

Thomas

Linux cluster
-------------------------------

- AMD, 2083 MHz, 1 GB RAM
- commodity network, 100 Mbit
- SUSE Linux with Kernel 2.4.18
- Portland Group Fortran 90 compiler, version 4.0-2
- MPICH 1.2.4
- MPI 1.3.7
- NETCDF 3.5.0

I heard from a Norwegian group about their problems with CCSM on an AMD cluster: their runs always ended with SIGSEGV. The solution was to set -inherit_limit in the mpirun command; this seems to be required on AMD clusters but not on Intel clusters.

Good luck,
Klaus

Klaus Wyser
Rossby Centre
Swedish Meteorological and Hydrological Institute

thomasfroelicher@...

Dear Klaus,

Thank you for the advice.

Where can I set inherit_limit? Directly on the command line with mpirun? And what values can I set?

Best regards,

Thomas

Hej Thomas---

Check the thread "Porting CCSM3 to a Linux/AMD Opteron cluster" in "CCSM Porting to unsupported machines", in particular the last reply by Egil. They use ScaMPI and mpimon where you have the possibility to set -inherit_limits directly in the mpimon command.

When I used an AMD cluster (not for CCSM but for a different problem) I had to add "unlimit" (csh/tcsh) or "ulimit -s unlimited" (sh/bash) to my .cshrc (or the startup file of whatever shell you use at login) to make sure that the stack size on all nodes was set to unlimited. Send me an e-mail if you need more info (German is fine too): klaus.wyser@smhi.se
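One way to check whether the stack limit really is unlimited on a given node is to query it from inside a process. The following is only a minimal sketch, assuming Python is available on the compute nodes; the function names are hypothetical and not part of CSM or MPICH. It uses the standard `resource` module to read the soft and hard stack limits of the current process.

```python
import resource
import socket


def describe_limit(value):
    """Render an rlimit value (given in bytes) as a readable string."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def stack_limits():
    """Return the (soft, hard) stack size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    # Print the host name too, so output from several nodes can be told apart.
    soft, hard = stack_limits()
    print(f"{socket.gethostname()}: stack soft={soft} hard={hard}")
```

The limits seen in an interactive login shell may differ from those inherited by processes started through MPI, so a check like this is probably most informative when launched the same way as the model itself.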

Cheers,
Klaus

Klaus Wyser
Rossby Centre
Swedish Meteorological and Hydrological Institute
