MPI issues with scripts_regression_tests.py script file

fmiville
New Member
Dear CESM community,
I'm a beginner with CLM5.0, and I have already spent several months trying to make it work, without success. I'm now looking for some help or advice.

I'm not able to complete a run because errors appear at the ./case.submit step.
As far as I understand, the problem is with my MPI compiler.

If I run a simple script like this:
Code:
export CIME_MACHINE=randlab-1
export CIME_MODEL=cesm
export CIMEROOT=$HOME/clm5.0/cime
export CASE=test
export CASEROOT=~/projects/cases/$CASE
export DIN_LOC_ROOT=~/projects/cesm-inputdata

./create_newcase --case $CASEROOT --res 1x1_mexicocityMEX --compset I1PtClm50SpGs --machine $CIME_MACHINE

cd $CASEROOT
./case.setup
./case.build
./case.submit

I get an error early in the ./case.submit step:
Code:
ERROR: RUN FAIL: Command 'mpiexec -np 1 /home/me/projects/scratch/test/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/me/projects/scratch/test/run/cesm.log.210511-142940

I have already looked at the log file (attached), but I cannot work out what is happening.
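As a first pass over a log like this, one option is to search it for lines that usually mark a failure. The sketch below is only an illustration: the helper name scan_log and the keyword list are assumptions, not a CESM or CIME convention, and the path is the one from the error message above.

```shell
# scan_log FILE: print lines that commonly mark a failure, with line numbers.
# The keyword list is a guess at typical runtime failure messages.
scan_log() {
    grep -n -i -E 'error|abort|segmentation|signal|backtrace' "$1"
}

# Log file named in the RUN FAIL message; only scanned if it exists.
log=/home/me/projects/scratch/test/run/cesm.log.210511-142940
if [ -f "$log" ]; then
    scan_log "$log"
    # The last lines often show where the run stopped.
    tail -n 40 "$log"
fi
```

The line numbers printed by grep make it easier to quote the relevant part of the log when asking for help.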

So now I'm following the instructions for porting and validating CIME on a new platform.
I was able to run the MPI example correctly.
I also put my config_machines.xml file in the .cime directory, and the following command succeeded:
Code:
xmllint --noout --schema $CIME/config/xml_schemas/config_machines.xsd $HOME/.cime/config_machines.xml

Now I also have issues with the last step: running the scripts_regression_tests.py script.

These tests were successful:
Code:
./scripts_regression_tests.py A_RunUnitTests
./scripts_regression_tests.py B_CheckCode
./scripts_regression_tests.py G_TestMacrosBasic
./scripts_regression_tests.py H_TestMakeMacros
./scripts_regression_tests.py I_TestCMakeMacros
./scripts_regression_tests.py J_TestCreateNewcase
./scripts_regression_tests.py M_TestWaitForTests
But when I execute:
Code:
./scripts_regression_tests.py L_TestSaveTimings
I get a similar error:
Code:
ERROR: RUN FAIL: Command 'mpiexec -np 1 /home/me/projects/scratch/scripts_regression_test.20210511_143519/SMS_Ln9_P1.f19_g16_rx1.A.randlab-1_gnu.fake_testing_only_20210511_143604/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed

Now I have no idea where to look.
Any hints or help would be really appreciated. Also, do you have a minimal example I could try, the simplest possible CLM5.0 case?
Thanks in advance for your help,
Best regards,
François
 

Attachments

  • cesm.log.210511-142940.txt
    7.5 KB

jedwards

CSEG and Liaisons
Staff member
It looks like the problem may be with memory on your system. Have you set the ulimit stack size to the maximum allowed value? Do you have a system administrator that you can consult?
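A quick way to see the current limits, as a sketch: ulimit settings apply to the current shell and the processes started from it, so they have to be in effect in the shell that launches the run. The variable names below are only illustrative.

```shell
# Report the soft and hard stack size limits of the current shell.
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# Try to raise the soft limit to the hard limit (the most a normal user can set).
if [ "$soft" != "$hard" ]; then
    ulimit -S -s "$hard" && echo "soft limit is now: $(ulimit -S -s)"
fi
```

If the hard limit itself is too low, only a system administrator can raise it.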
 

fmiville
New Member
OK, in previous trials I had set the stack size to unlimited with:
Code:
ulimit -s unlimited
But for some reason, for the example I posted, it had gone back to a lower value. So I set it to unlimited again, and the cesm log file is indeed different now, but I still cannot understand it. I have attached it.
Thanks for your help.
F.
 

Attachments

  • cesm.log.210511-153135.txt
    7.7 KB

jedwards

CSEG and Liaisons
Staff member
Rebuild with DEBUG enabled:
Code:
./xmlchange DEBUG=TRUE
./case.setup --reset
./case.build --clean-all
./case.build
 

fmiville
New Member
I followed the instructions and got a new log file (attached).
 

Attachments

  • cesm.log.210511-154348.txt
    9 KB

fmiville
New Member
I am still struggling to understand this log file.

Does CLM5.0 have minimum system requirements?

I guess I have an issue with my MPI compiler, but I have no idea where to look.

This piece of code works:
Code:
mpif90 fhello_world_mpi.F90 -o hello_world
mpirun -np 64 ./hello_world

How can I check that everything is fine with MPI?
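One possible starting point, as a sketch only: the failing command uses mpiexec with one task, while the test above uses mpirun, so it may be worth confirming that the compiler wrapper and both launchers resolve to the same MPI installation, and then launching the test program the same way. The helper name report_tool is hypothetical, and ./hello_world is the binary built above.

```shell
# report_tool NAME: show where NAME resolves on PATH, or say it is missing.
report_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 -> $tool_path"
    else
        echo "$1 -> not found"
    fi
}

# The wrapper and both launchers should normally live in the same directory.
for tool in mpif90 mpiexec mpirun; do
    report_tool "$tool"
done

# Launch the test program with the same launcher and task count as the failing run.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./hello_world ]; then
    mpiexec -np 1 ./hello_world
fi
```

If the three paths point to different installations, the executable may be built against one MPI library and launched by another.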
 