m_mineter@ed_ac_uk
Member
Hello
I'm porting CESM2 to a Linux cluster using the GNU compilers and Open MPI 1.10.1. We will be running on a 40-core node, but I hope to test on 16 cores.
I'm using the machine name eddie, and that seems to be recognised OK when I run scripts_regression_tests.py.
CIMEROOT is set correctly (from .bashrc).
I have 3 questions, please:
1. The batch system is Univa Grid Engine.
I've created .cime/*.xml files, which I attach. I've checked them against the XSD schema.
When running scripts_regression_tests.py, the new cases report:
"Batch_system_type is univa
ERROR: Did not find univa in valid values for BATCH_SYSTEM: ['nersc_slurm', 'lc_slurm', 'moab', 'pbs', 'lsf', 'slurm', 'cobalt', 'cobalt_theta', 'none']"
Have I configured univa wrongly in config_batch.xml, or have I missed something else?
(Univa Grid Engine is the same as the old Sun Grid Engine.)
Attached are the .cime/*.xml files (with .txt appended to the file names),
the terminal output from scripts_regression_tests.py > eddie_login_node_tests.txt 2>&1,
and the output from ./describe_version > eddie-version.txt.
2. We have login nodes that can run qsub, and worker nodes that can't (unless they script an ssh back into a login node to do the qsub).
Each job can last up to 2 days, so automated resubmission will be needed for CESM. Can the configuration allow resubmission/continuation jobs to work via ssh?
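To make the ssh workaround concrete, this is roughly what I have in mind, as a minimal Python sketch. The host name login01, the case directory and the script name run_job.sh are placeholders, not our real setup, and the helper build_remote_qsub is my own invention rather than anything in CIME. It only builds the command line; it assumes passwordless ssh from the worker nodes to a login node.

```python
import shlex


def build_remote_qsub(login_host, workdir, job_script, qsub_args=()):
    """Return an argv list that runs qsub on a login node via ssh.

    Hypothetical helper: every argument is quoted so that paths with
    spaces survive the remote shell.
    """
    parts = [shlex.quote(arg) for arg in list(qsub_args) + [job_script]]
    remote_command = "cd {} && qsub {}".format(
        shlex.quote(workdir), " ".join(parts)
    )
    return ["ssh", login_host, remote_command]


if __name__ == "__main__":
    # Placeholder values; h_rt is the Grid Engine run-time resource.
    argv = build_remote_qsub(
        "login01", "/scratch/case1", "run_job.sh", ["-l", "h_rt=48:00:00"]
    )
    print(argv)
```

The resulting list could be passed to subprocess.run from the running job. What I don't know is where, if anywhere, CIME would let me substitute a command like this for the plain qsub call.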
3. When running scripts_regression_tests.py, I find that the test_pylint tests fail on the login nodes because they can't create more threads (see eddie_login_node_tests.txt).
Can I easily configure scripts_regression_tests.py to do no multithreading, so that I can run the whole set of tests on the login node, even if much more slowly?
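In case it helps with diagnosis, the limit on the login node can be inspected with a small Python check like the one below. I am assuming, without having confirmed it, that the failure comes from the per-user process/thread limit (what ulimit -u reports on Linux); describe_thread_limits is just an illustrative name of mine.

```python
import resource
import threading


def describe_thread_limits():
    """Report the per-user process/thread limit and current thread count.

    On Linux, RLIMIT_NPROC counts threads as well as processes, so a low
    soft limit can stop a program from starting new threads.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "soft": soft,  # resource.RLIM_INFINITY means unlimited
        "hard": hard,
        "active_threads": threading.active_count(),
    }


if __name__ == "__main__":
    for name, value in describe_thread_limits().items():
        print(name, value)
```

If the soft limit turns out to be small, that would explain the failure, and I would still need a way to make the tests run serially.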
Thanks for your attention!
Mike