Running CESM2 on multiple nodes

abdulla.sakalli@...
Dear All,

I have two nodes on my home cluster (node0 with 24 threads, node1 with 16 threads), and I would like to run CESM2 on 34 threads. I modified my config_machines.xml file as follows:

  <machine MACH="nephentes2">
    <DESC>Linux 64bit</DESC> <!-- can be anything -->
    <NODENAME_REGEX>none</NODENAME_REGEX>
    <OS>LINUX</OS> <!-- LINUX -->
    <COMPILERS>gnu</COMPILERS> <!-- gnu -->
    <MPILIBS>mpich</MPILIBS> <!-- mpich -->
    <CIME_OUTPUT_ROOT>$ENV{HOME}/projects/scratch</CIME_OUTPUT_ROOT>
    <DIN_LOC_ROOT>$ENV{HOME}/projects/cesm-inputdata</DIN_LOC_ROOT>
    <DIN_LOC_ROOT_CLMFORC>$ENV{HOME}/projects/cesm-inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>
    <DOUT_S_ROOT>$ENV{HOME}/projects/scratch/archive/$CASE</DOUT_S_ROOT>
    <BASELINE_ROOT>$ENV{HOME}/projects/baselines</BASELINE_ROOT>
    <CCSM_CPRNC>$CIMEROOT/tools/cprnc/build/cprnc</CCSM_CPRNC>
    <GMAKE_J>2</GMAKE_J>
    <BATCH_SYSTEM>none</BATCH_SYSTEM>
    <SUPPORTED_BY>abdulla.sakalli@iste.edu.tr</SUPPORTED_BY>
    <MAX_TASKS_PER_NODE>12</MAX_TASKS_PER_NODE>
    <MAX_MPITASKS_PER_NODE>12</MAX_MPITASKS_PER_NODE>
    <mpirun mpilib="default">
      <executable>mpirun</executable>
      <arguments>
        <arg name="machine_file">--hostfile $ENV{HOME}/my_hosts_ip</arg>
        <arg name="num_tasks">-np 34</arg>
      </arguments>
    </mpirun>
    <module_system type="none"/>
    <environment_variables compiler="gnu">
      <env name="NETCDF_HOME">/home/as2/local/netcdf461</env>
    </environment_variables>
  </machine>

 

I do not have a batch system on my machines. My my_hosts_ip file contains the IP addresses of my two nodes.
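For illustration only, a host file of this kind for MPICH's Hydra launcher could look like the sketch below. The addresses and per-node process counts are hypothetical placeholders, not the actual contents of my_hosts_ip. The optional `host:count` form assumes the Hydra host file syntax, where the number after the colon is how many processes may be placed on that host.

```text
# Hypothetical example: placeholder addresses and counts (24 + 10 = 34)
192.0.2.10:24
192.0.2.11:10
```

Each host listed this way generally has to be reachable from the launching node without a password prompt (for example over SSH), and the case executable and input data have to be visible at the same paths on both nodes.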

When I run case.submit, I get the following error message:

[mpiexec@nephentes2] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed

[mpiexec@nephentes2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status

[mpiexec@nephentes2] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event

[mpiexec@nephentes2] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

 

My question is: can I run the model on multiple nodes without a batch system, as in my case?

 

Thank you very much in advance for your reply.

 

Kind regards,

Abdulla
