Porting CSM1.4-bgc to a Linux cluster

At the University of Bern we use the NCAR CSM1.4-bgc. We are running it successfully on an IBM SP4, and the output files seem to be correct.

We are also trying to install the model on our local Linux cluster (see below for details about the machine). The model compiles fine, but it stops after one model day without writing any error messages to the log files.
The problem is that MPI gives us only the following uninformative error message:
------------------------------------------------------------------
(msg_cplc) connecting to msg-passing - successful
p3_15513: p4_error: interrupt SIGSEGV: 11
(initial) reading parameters from input namelist.
P4 procgroup file is pgfile.
finish start mpi program
-------------------------------------------------------------------
The model writes out only history files from the atmosphere and the coupler.

Please let me know if you need more information about our problem.

Thanks for all your help.

With best regards,

Thomas

Linux cluster
-------------------------------

- AMD, 2083 MHz, 1 GB RAM
- commodity network with 100 Mbit
- SUSE Linux with kernel 2.4.18
- Portland Group Fortran 90 compiler, version 4.0-2
- MPICH 1.2.4
- MPI 1.3.7
- NETCDF 3.5.0
 
I heard from a Norwegian group about their problems with CCSM on an AMD cluster: their runs always ended with SIGSEGV. The solution was to set -inherit_limit in the mpirun command. This seems to be required on AMD clusters but not on Intel clusters.

Good luck,
Klaus
 
Hi Thomas,

Check the thread "Porting CCSM3 to a Linux/AMD Opteron cluster" in "CCSM Porting to unsupported machines", in particular the last reply by Egil. They use ScaMPI and mpimon, where you can set -inherit_limits directly in the mpimon command.

When I used an AMD cluster (not for CCSM but for a different problem) I had to add "unlimit" (csh/tcsh syntax) or "ulimit -s unlimited" (sh/bash syntax) to my .cshrc (or the startup file of whatever shell you use at login) to make sure that the stack limit on all nodes was set to unlimited. Send me an e-mail if you need more info (German is fine too: klaus.wyser@smhi.se)
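
To see whether the stack limit is really the issue, it can help to print the limits a process gets on each node. Below is a minimal sketch, assuming bash is available on the nodes; the function names report_stack_limits and raise_stack_limit are made up for illustration and are not part of CCSM or MPICH.

```shell
#!/bin/bash
# Sketch: report the stack limits this shell (and its child processes) get,
# then raise the soft limit as far as the hard limit allows.
# Assumes bash; in csh/tcsh the equivalent commands are "limit" and "unlimit".

report_stack_limits() {
    # -S = soft limit (the one in effect), -H = hard limit (the ceiling).
    # bash prints the value in kB, or "unlimited".
    echo "$(uname -n): soft=$(ulimit -S -s) hard=$(ulimit -H -s)"
}

raise_stack_limit() {
    # "ulimit -s unlimited" fails when the hard limit is finite,
    # so set the soft limit to whatever the hard limit is.
    ulimit -S -s "$(ulimit -H -s)"
}

report_stack_limits   # limits before the change
raise_stack_limit
report_stack_limits   # soft should now equal hard
```

If the hard limit itself is finite, the soft limit can only go that high; raising the hard limit is up to the system administrator. Note also that a limit set in an interactive shell is not necessarily the one the MPI processes see on remote nodes, which is why the setting has to go into the shell startup file (or be passed on by the launcher).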

Cheers,
Klaus
 