
I can run CLM on 1 node, but I have a problem running it on 2 nodes

Hi guys, I am confused. I want to run CLM regionally, and I have created the required files (domain and surface data files) to do so. I can now run the model successfully on 1 node (after adding “limit coredumpsize unlimited” and “limit stacksize unlimited” to my run script), but I get the following message:
--------------------------------------------------------------------------
The usNIC BTL failed to initialize while trying to register some
memory. This typically can indicate that the "memlock" limits are set
too low. For most HPC installations, the memlock limits should be set
to "unlimited". The failure occurred here:

Local host: compute-2-8-ib
Memlock limit: 65536

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured
here:

Local host: compute-2-8-ib
OMPI source: ../../../../../openmpi-1.8.6/ompi/mca/btl/openib/btl_openib.c:794
Function: ompi_free_list_init_ex_new()
Device: mlx4_0
Memlock limit: 65536

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
[compute-2-8-ib][[54635,1],112][../../../../../openmpi-1.8.6/ompi/mca/btl/openib/btl_openib.c:873:mca_btl_openib_add_procs] could not prepare openib device for use

[compute-2-8-ib.local:46482] 127 more processes have sent help message help-mpi-btl-usnic.txt / check_reg_mem_basics fail
[compute-2-8-ib.local:46482] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[compute-2-8-ib.local:46482] 127 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem

But when I try to run the model on 2 or more nodes, I first get the following error:
Host key verification failed.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------

To solve it, I created a file named “config” in the “.ssh” folder of my home directory (/home/MyHome/.ssh) and added “StrictHostKeyChecking no” to it. After that, the “Host key verification failed” problem was solved. Now I get the following error:

/share/apps/openmpi-1.8.6-intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------

I am so confused. What is the problem?
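
For anyone hitting the same messages, here is a minimal sketch of how the two conditions reported in the logs above (a shared library that orted cannot find, and a memlock limit that is not "unlimited") could be checked on a compute node. It assumes bash and that `ldd` is available there; the helper names `missing_libs` and `memlock_ok` are made up for illustration and are not part of CLM or Open MPI. It only reports the state of the node and does not fix anything.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of CLM or Open MPI.

# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd-style output on stdin, so it can be tried without the real binary.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Succeed only if the given memlock limit is "unlimited", which is the value
# the Open MPI help message recommends for most HPC installations.
memlock_ok() {
    [ "$1" = "unlimited" ]
}

# Example use on a compute node (orted path copied from the error message):
#   ldd /share/apps/openmpi-1.8.6-intel/bin/orted | missing_libs
#   memlock_ok "$(ulimit -l)" || echo "memlock limit is $(ulimit -l)"
```

If `missing_libs` prints `libimf.so` when run on a compute node but nothing on the login node, that would suggest the library search path differs between the two, which matches the first cause listed in the ORTE message.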