
CESM 1.2 on Stampede problem

xinmiao@...

I am trying to run CESM 1.2 with the G compset on Stampede at TACC:

./create_newcase -case $1 -res T62_g37 -compset G -mach stampede

The model compiled and built successfully. However, when I submit it, it fails with the following errors.

What does this mean? Is it a problem with the model or with Stampede's configuration?

I successfully ran CESM 1.1.1 on our school's cluster several weeks ago. Now I'm trying CESM 1.2 on Stampede as a test drive.

Shane

-----------------------------------

login1$ vi output.933992
TACC: Starting up job 933992
TACC: Setting up parallel environment for MVAPICH2+mpispawn.
TACC: Starting parallel tasks...
rm: cannot remove `env_case': No such file or directory
env_case: No such file or directory.
BUILD_COMPLETE: Undefined variable.
[c559-302.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 13, pid: 107015) exited with status 1
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
[c559-302.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 107002) exited with status 254
ccsm_getenv error: problem removing env_case
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
....

Shane

jedwards

The Stampede machine is brand new to us and should not have been included in the release. There are a number of issues with the port that we still need to work out. In particular, the file scripts/ccsm_utils/Machines/config_machines.xml has several directories pointing at a particular user's $WORK directory, and you don't have write permission there. If you really want to run on Stampede, you can start by changing those paths to your own. The following seems to work, but then there are problems with the module load command. I apologize; Stampede should not have been included in our list of supported machines.

 

<machine MACH="stampede">
        <DESC>TACC DELL, os is Linux, 16 pes/node, batch system is SLURM</DESC>
        <OS>LINUX</OS>
        <COMPILERS>intel,intel-mic</COMPILERS>
        <MPILIBS>mvapich2,impi,mpi-serial</MPILIBS>
        <RUNDIR>$WORK/$CASE/run</RUNDIR>
        <EXEROOT>$WORK/$CASE/bld</EXEROOT>
        <DIN_LOC_ROOT>$WORK/inputdata</DIN_LOC_ROOT>
        <DIN_LOC_ROOT_CLMFORC>$WORK/lmwg</DIN_LOC_ROOT_CLMFORC>
        <DOUT_S_ROOT>$WORK/archive/$CASE</DOUT_S_ROOT>
        <DOUT_L_MSROOT>csm/$CASE</DOUT_L_MSROOT>             
        <CCSM_BASELINE>$WORK/ccsm_baselines</CCSM_BASELINE>
        <CCSM_CPRNC>$WORK/tools/cprnc/cprnc</CCSM_CPRNC>
        <BATCHQUERY>squeue</BATCHQUERY>
        <BATCHSUBMIT>sbatch</BATCHSUBMIT>
        <SUPPORTED_BY>srinathv -at- ucar.edu</SUPPORTED_BY>
        <GMAKE_J>8</GMAKE_J>
        <MAX_TASKS_PER_NODE>32</MAX_TASKS_PER_NODE>
        <PES_PER_NODE>16</PES_PER_NODE>
</machine>

 

CESM Software Engineer
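The failure described above comes from paths in config_machines.xml that you cannot write to. A quick way to check such paths before building is sketched below in POSIX sh. This is a minimal sketch under stated assumptions: `check_dir_writable` is a hypothetical helper, not part of CESM, and `CASE` is assumed to hold your case name. Because RUNDIR and EXEROOT usually do not exist until the case is built, the helper also accepts a missing directory whose nearest existing parent is writable.

```shell
#!/bin/sh
# check_dir_writable DIR  (hypothetical helper, not part of CESM)
# Succeeds if DIR is an existing writable directory, or if DIR does not exist
# yet but its nearest existing ancestor is a writable directory.
check_dir_writable() {
    d=$1
    while [ ! -e "$d" ]; do
        parent=$(dirname "$d")
        # Stop if dirname can no longer shorten the path.
        [ "$parent" = "$d" ] && return 1
        d=$parent
    done
    [ -d "$d" ] && [ -w "$d" ]
}

# Example: check the $WORK-based paths from the entry above.
# Runs only when both WORK and CASE are set.
if [ -n "${WORK:-}" ] && [ -n "${CASE:-}" ]; then
    for p in "$WORK/$CASE/run" "$WORK/$CASE/bld" "$WORK/inputdata" "$WORK/archive/$CASE"; do
        if check_dir_writable "$p"; then
            echo "ok:     $p"
        else
            echo "NOT OK: $p" >&2
        fi
    done
fi
```

The same loop works for the absolute /home1/... paths in the edited entry; just substitute them into the `for` list.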

xinmiao@...

I have already noticed the directory problem and fixed it in config_machines.xml. Here's my version.
<machine MACH="stampede">
        <DESC>TACC DELL, os is Linux, 16 pes/node, batch system is SLURM</DESC>
        <OS>LINUX</OS>
        <COMPILERS>intel,intel-mic</COMPILERS>
        <MPILIBS>mvapich2,impi,mpi-serial</MPILIBS>
        <RUNDIR>/home1/02489/xm7303/CESM/cesm1_2_0/scripts/$CASE/run</RUNDIR>
        <EXEROOT>/home1/02489/xm7303/CESM/cesm1_2_0/scripts/$CASE/bld</EXEROOT>
        <DIN_LOC_ROOT>/home1/02489/xm7303/CESM/inputdata</DIN_LOC_ROOT>
        <DIN_LOC_ROOT_CLMFORC>/home1/02489/xm7303/CESM/lmwg</DIN_LOC_ROOT_CLMFORC>
        <DOUT_S_ROOT>/home1/02489/xm7303/CESM/archive/$CASE</DOUT_S_ROOT>
        <DOUT_L_MSROOT>csm/$CASE</DOUT_L_MSROOT>            
        <CCSM_BASELINE>/home1/02489/xm7303/CESM/ccsm_baselines</CCSM_BASELINE>
        <CCSM_CPRNC>/home1/02489/xm7303/CESM/cesm1_2_0/tools/cprnc/cprnc</CCSM_CPRNC>
        <BATCHQUERY>squeue</BATCHQUERY>
        <BATCHSUBMIT>sbatch</BATCHSUBMIT>
        <SUPPORTED_BY>srinathv -at- ucar.edu</SUPPORTED_BY>
        <GMAKE_J>8</GMAKE_J>
        <MAX_TASKS_PER_NODE>32</MAX_TASKS_PER_NODE>
        <PES_PER_NODE>16</PES_PER_NODE>
</machine>
Also, the TACC team solved another build problem with loading modules by fixing env_mach_specific.stampede; see below:
-------------------------------------
From: Robert McLay
Date: Tue, 11 Jun 2013 11:16:55
Subject: Errors when Compiling CESM model
 Response:
 I did miss the top of the shell script. It is run with "-f", which means that it ignores the system cshrc as well as ~/.cshrc. In that case it must do something to define the module command, and I would recommend that they do:
    source /etc/profile.d/tacc_modules.csh
 This is much safer as this will always define the module command for the csh and is very unlikely to change as Lmod gets updated.
------------------------------------------------------
However, the symptom is the same. I still don't understand why the error below occurs (from my log):
ccsm_getenv error: problem removing env_case
rm: cannot remove `env_case': No such file or directory
env_case: No such file or directory.
I wonder what else I should try. Thanks,
Shane
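Robert McLay's advice quoted above applies to csh scripts started with `-f`. If you drive CESM from your own sh or bash wrapper script, the equivalent is to source /etc/profile.d/tacc_modules.sh, the file jedwards uses later in this thread. The sketch below assumes a POSIX sh script; `ensure_module_cmd` is a hypothetical helper name, and it sources the init file only when `module` is not already defined.

```shell
#!/bin/sh
# ensure_module_cmd [INIT_FILE]  (hypothetical helper, not part of CESM)
# Defines the module command by sourcing INIT_FILE when it is not already
# available, as happens in shells started without the usual startup files.
# The default is the sh-family TACC init file mentioned in this thread.
ensure_module_cmd() {
    init=${1:-/etc/profile.d/tacc_modules.sh}
    if command -v module >/dev/null 2>&1; then
        return 0
    fi
    if [ -r "$init" ]; then
        . "$init"
    fi
    # Report whether module is now defined.
    command -v module >/dev/null 2>&1
}

# Example:
#   ensure_module_cmd || { echo "module command unavailable" >&2; exit 1; }
#   module load perl
```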


jedwards

Hi Shane,


I think you are simply running out of disk space. The $HOME filesystem on Stampede is very small; you should use $WORK and/or $SCRATCH instead.

CESM Software Engineer
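To check whether a filesystem is actually short of space before moving a case, you can query it with `df`. The sketch below uses POSIX `df -Pk` (one line per filesystem, sizes in 1024-byte blocks); `avail_kb` is a hypothetical helper name. Note that `df` reports free space on the whole filesystem, so a per-user quota can still be exhausted while `df` shows space available.

```shell
#!/bin/sh
# avail_kb DIR  (hypothetical helper)
# Prints the free space, in KiB, on the filesystem that holds DIR.
# Column 4 of POSIX df output is "Available"; line 1 is the header.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report free space for whichever of the Stampede storage variables are set.
for dir in "${HOME:-}" "${WORK:-}" "${SCRATCH:-}"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
        printf '%s: %s KiB free\n' "$dir" "$(avail_kb "$dir")"
    fi
done
```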

xinmiao@...

I rebuilt the model in my $WORK directory, and it compiled successfully. However, the symptom is the same (see below), so disk space is not the main problem. I don't understand why the model wants to remove `env_case' at all. What is it?

-------------------------------------------

login4$ vi output.942179
TACC: Starting up job 942179
TACC: Setting up parallel environment for MVAPICH2+mpispawn.
TACC: Starting parallel tasks...
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
[c557-604.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 8, pid: 6553) exited with status 254
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
rm: cannot remove `env_case': No such file or directory
ccsm_getenv error: problem removing env_case
.....

Shane

xinmiao@...

Can anyone help me?

Shane

jedwards

Hi Shane, the Stampede port is incomplete; it should not have been in the 1.2.0 release code. I will work over the next week or so to complete the port, and I will provide you with instructions for running there as soon as they are ready.

CESM Software Engineer

wagmanbe@...

Hi, I am a first-time user trying to port CESM 1.2.1 to Stampede. I saw the comment about the Stampede port in 1.2.0. Is there any documentation on that? Thanks.

jedwards

We are preparing CESM 1.2.2 for a planned release date of June 1; it will include support for Stampede.

CESM Software Engineer

aneeshcs@...

Hi,

I just downloaded the CESM 1.2.2 svn trunk onto Stampede and tried to create a new case, starting with:

login2$ ./create_newcase -list

 

But I get a warning about XML parsing. I have loaded the Perl module and MPICH2 with the compatible Intel compilers. The warning is:

WARNING:

    The perl module XML::LibXML is needed for XML parsing in the CESM script system.

        Please contact your local systems administrators or IT staff and have them install it for
        you, or install the module locally.


If anyone has successfully compiled and run CESM 1.2.2 on Stampede, could you please share the environment variables you set before compiling?


Thanks.

jedwards

# Define the module command (sh/bash), then load Perl
. /etc/profile.d/tacc_modules.sh
module load perl
# Point Perl's module installers and search path at $CESMDATAROOT/perl5
export CESMDATAROOT=/scratch/projects/xsede/CESM/
export PERL_LOCAL_LIB_ROOT="$CESMDATAROOT/perl5"
export PERL_MB_OPT="--install_base $CESMDATAROOT/perl5"
export PERL_MM_OPT="INSTALL_BASE=$CESMDATAROOT/perl5"
export PERL5LIB="$CESMDATAROOT/perl5/lib/perl5/x86_64-linux-thread-multi:$CESMDATAROOT/perl5/lib/perl5"

CESM Software Engineer

aneeshcs@...

Thanks! It worked for me with these definitions.

sunshaobo133@...

Hi,

    I have the same problem with Perl. The error is:

        WARNING: The perl module XML::LibXML is needed for XML parsing in the CESM script system.

        Please contact your local systems administrators or IT staff and have them install it for you, or install the module locally.

    Could you help me?

Thank you very much!

jedwards

If you are on Stampede, follow the instructions in the post above. If you are on another system, get XML::LibXML from CPAN:

http://search.cpan.org/~shlomif/XML-LibXML-2.0117/LibXML.pod

CESM Software Engineer
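On a system other than Stampede, one way to install XML::LibXML without administrator rights is into a private directory, reusing the INSTALL_BASE/PERL5LIB pattern from the Stampede settings above. This is a sketch under assumptions: `use_local_perl_lib` is a hypothetical helper, the install directory is your choice, and building XML::LibXML typically needs the libxml2 development headers to be present. The architecture subdirectory is looked up from Perl's own `%Config` rather than hard-coded.

```shell
#!/bin/sh
# use_local_perl_lib BASE  (hypothetical helper, not part of CESM)
# Points Perl's module installers and search path at a private install
# directory BASE, following the same pattern as the Stampede settings above.
use_local_perl_lib() {
    base=$1
    # Architecture-specific subdirectory name, e.g. x86_64-linux-thread-multi.
    arch=$(perl -MConfig -e 'print $Config{archname}')
    export PERL_LOCAL_LIB_ROOT="$base"
    export PERL_MB_OPT="--install_base $base"
    export PERL_MM_OPT="INSTALL_BASE=$base"
    # Prepend the private directories, keeping any existing PERL5LIB entries.
    export PERL5LIB="$base/lib/perl5/$arch:$base/lib/perl5${PERL5LIB:+:$PERL5LIB}"
}

# Example (needs network access; assumes $HOME/perl5 as the install base):
#   use_local_perl_lib "$HOME/perl5"
#   cpan XML::LibXML
```

The same `use_local_perl_lib` call has to run in every later shell (for example from your startup file) so that the CESM scripts can find the installed module.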
