
How to change the number of processors in CESM2

huazhen

Member
I am trying to run CESM2 on our supercomputer with the intel/2017.u2 compiler. I get the following job-queue messages when trying to validate a CESM port with prognostic components (following http://esmci.github.io/cime/users_guide/porting-cime.html) by running:

./create_test --xml-category prealpha --xml-machine cheyenne --xml-compiler intel --machine spartan --compiler intel

I contacted the system administrators about the (MaxCpuPerAccount) problem. They told me that my jobs are requesting 272 cores, which is more than the limit (200); I have to reduce my request to 200 cores or fewer before the jobs will run. But I don't know how to change the number of processors per job the way I could in CESM1.2 (http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/x715.html#case_conf_setting_pes).

I found the env_mach_pes.xml file in the output root /data/cephfs/punim0769/cesm/scratch/NCK.f19_g17.B1850.spartan_intel.allactive-defaultiomi.20190705_222158_70busq, but every time I run my script a new case directory is created under /data/cephfs/punim0769/cesm/scratch/, so editing that file does not fix the problem. I think some other file must control this setting. Can I change the default settings in /home/huazhenl/my_cesm_sandbox/cime_config/config_pes.xml to reduce the core request? Do I need to change the settings in /home/huazhenl/.cime/config_machines.xml?

I will attach the following three files:
/data/cephfs/punim0769/cesm/scratch/NCK.f19_g17.B1850.spartan_intel.allactive-defaultiomi.20190705_222158_70busq/env_mach_pes.xml
/home/huazhenl/my_cesm_sandbox/cime_config/config_pes.xml
/home/huazhenl/.cime/config_machines.xml

Any help is much appreciated. Thanks a lot.
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
9929460  physical NCK.f19_ huazhenl PD       0:00     13 (PartitionTimeLimit)
9929608  physical NCK.f19_ huazhenl PD       0:00     13 (PartitionTimeLimit)
9930100  physical NCK.f19_ huazhenl PD       0:00     13 (MaxCpuPerAccount)
9931908  physical PET_PM.f huazhenl PD       0:00     26 (MaxCpuPerAccount)
9929181  physical ERI.f09_ huazhenl PD       0:00      9 (MaxCpuPerAccount)
9930627  physical ERS_D.f0 huazhenl PD       0:00      9 (MaxCpuPerAccount)
9930640  physical IRT_Ld7. huazhenl PD       0:00      9 (MaxCpuPerAccount)
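For context, the per-case PE layout stored in env_mach_pes.xml can normally be changed from inside the case directory with CIME's xmlchange tool before case.setup runs. A minimal sketch, assuming standard CIME case tools; the task counts below are hypothetical placeholders, not values from this thread, and for cases generated by create_test the change has to be repeated per case (which is why a config_pes.xml default is the durable fix):

```shell
# Run from inside a case directory, e.g. the NCK.f19_g17... directory above.
# Hypothetical task counts: choose values that keep the case total under 200.
./xmlchange NTASKS_ATM=64,NTASKS_LND=64,NTASKS_ICE=64,NTASKS_OCN=64
./xmlchange NTASKS_CPL=64,NTASKS_GLC=64,NTASKS_ROF=64,NTASKS_WAV=64

# Regenerate the case setup so the new PE layout takes effect.
./case.setup --reset
```

Components sharing the same PEs (as above, with all root PEs at 0) keep the total core request at the largest single component count rather than the sum.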
 

jedwards

CSEG and Liaisons
Staff member
The parameters of the batch system are set in cime/config/cesm/machines/config_batch.xml. For example, https://github.com/ESMCI/cime/blob/master/config/cesm/machines/config_batch.xml#L227 shows the queue settings for a particular system. In that case, if the job runs on 18 nodes or fewer it will default to the share queue with a walltime of 6 hours. If you use more than 18 nodes it will go to the regular queue (because it is listed first) with a walltime of 12 hours. The premium and economy queues are defined but will not be used by default.
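The queue choice described above keys off the node count, which the batch system derives from the total task count and the machine's tasks per node. A minimal sketch of that arithmetic: 272 tasks is the request from the first post, but 36 tasks per node is an assumption for illustration, not a value taken from the thread:

```shell
# nodes = ceil(total_tasks / tasks_per_node), via integer arithmetic
total_tasks=272       # total MPI tasks requested by a test (from the first post)
tasks_per_node=36     # hypothetical cores per node on the target machine
nodes=$(( (total_tasks + tasks_per_node - 1) / tasks_per_node ))
echo "nodes=$nodes"   # 272 tasks at 36 per node round up to 8 nodes
```

The same task count therefore maps to different node counts (and possibly different queues) on machines with different core counts per node.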
 

huazhen

Member
Hi jedwards, thanks a lot for your reply. Do you mean that I can adjust jobmax from 18 to 8, or nodemax from 4032 to 8, in cime/config/cesm/machines/config_batch.xml? Will this reduce the core request to below 200? This will only affect the speed of the jobs, and not the running of the model, right?
 

huazhen

Member
Hi jedwards, I have adjusted jobmax from 18 to 8 and nodemax from 4032 to 8 in cime/config/cesm/machines/config_batch.xml as you suggested, but I still get the same issues I mentioned before. Each job still requests 273 cores or more, which exceeds the limit (200). It seems the core request is not controlled by this file (cime/config/cesm/machines/config_batch.xml). Do you have any suggestions for reducing the core request of each dependent job? Thanks a lot.

   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
10081983  physical NCK.f19_ huazhenl PD       0:00     19 (MaxCpuPerAccount)
10081034  physical ERS_Ld5. huazhenl PD       0:00      9 (MaxCpuPerAccount)
10081051  physical ERI.f09_ huazhenl PD       0:00      9 (MaxCpuPerAccount)
10082120  physical ERS_D.f0 huazhenl PD       0:00      9 (MaxCpuPerAccount)
10082307  physical IRT_Ld7. huazhenl PD       0:00      9 (MaxCpuPerAccount)
10083041  physical ERS_Ld5. huazhenl PD       0:00      9 (Resources)
10082971  physical ERS_Ld7. huazhenl PD       0:00      6 (Priority)
10084057  physical NCK_Ld5. huazhenl PD       0:00      6 (Priority)
10085645  physical PFS.f09_ huazhenl PD       0:00      9 (Priority)
 

jedwards

CSEG and Liaisons
Staff member
Using the full cheyenne test list to test your port may not be the best strategy. Have you run the ECT test? You will need to individually tune compsets for your system, and then you can set a default PE configuration using config_pes.xml.
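For what it's worth, a default PE layout for a grid on a given machine is expressed as an entry in config_pes.xml along the following lines. The nesting follows the usual CIME convention, but the grid matcher and task counts here are hypothetical examples, not a tuned layout for spartan:

```xml
<grid name="a%1.9x2.5">
  <mach name="spartan">
    <pes compset="any" pesize="any">
      <comment>hypothetical small layout to stay under a 200-core account limit</comment>
      <ntasks>
        <ntasks_atm>144</ntasks_atm>
        <ntasks_lnd>144</ntasks_lnd>
        <ntasks_rof>144</ntasks_rof>
        <ntasks_ice>144</ntasks_ice>
        <ntasks_ocn>144</ntasks_ocn>
        <ntasks_glc>144</ntasks_glc>
        <ntasks_wav>144</ntasks_wav>
        <ntasks_cpl>144</ntasks_cpl>
      </ntasks>
      <rootpe>
        <rootpe_atm>0</rootpe_atm>
        <rootpe_lnd>0</rootpe_lnd>
        <rootpe_rof>0</rootpe_rof>
        <rootpe_ice>0</rootpe_ice>
        <rootpe_ocn>0</rootpe_ocn>
        <rootpe_glc>0</rootpe_glc>
        <rootpe_wav>0</rootpe_wav>
        <rootpe_cpl>0</rootpe_cpl>
      </rootpe>
    </pes>
  </mach>
</grid>
```

With all components on the same 144 tasks and all root PEs at 0, the total request stays at 144 cores, below the 200-core limit mentioned earlier in the thread.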
 

huazhen

Member
Hi jedwards, I am following this link (http://www.cesm.ucar.edu/models/cesm2/python-tools/) to run the ECT test, and it includes the following note:

Note: These simulation runs must be created via the script ensemble.sh and then be verified by ECT. The ensemble.sh script is located in cime/tools/statistical_ensemble_test.

But I can't find the ensemble.sh script in cime/tools/statistical_ensemble_test (https://github.com/ESMCI/cime/find/master). Do you have any suggestions for finding or creating the ensemble.sh script? Thanks a lot.
 