
How to change the number of processors in CESM2

huazhen

Member
I am trying to run CESM2 on our supercomputer with the intel/2017.u2 compiler. I get the following job-queue messages when trying to validate a CESM port with prognostic components (following http://esmci.github.io/cime/users_guide/porting-cime.html) by running:

./create_test --xml-category prealpha --xml-machine cheyenne --xml-compiler intel --machine spartan --compiler intel

I contacted the system administrators about the (MaxCpuPerAccount) problem. They told me that my jobs are requesting 272 cores, which is more than the limit (200); I have to reduce my request to 200 cores or fewer before the jobs will run. But I don't know how to change the number of processors per job the way I could in CESM1.2 (http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/x715.html#case_conf_setting_pes).

I found the env_mach_pes.xml file in the output root /data/cephfs/punim0769/cesm/scratch/NCK.f19_g17.B1850.spartan_intel.allactive-defaultiomi.20190705_222158_70busq, but every time I run my script a new case directory is created under /data/cephfs/punim0769/cesm/scratch/, so editing that file does not fix the problem. I think some other file must control this setting. Can I change the default settings in /home/huazhenl/my_cesm_sandbox/cime_config/config_pes.xml to reduce the core request? Do I need to change the settings in /home/huazhenl/.cime/config_machines.xml?

I will attach the following three files:
/data/cephfs/punim0769/cesm/scratch/NCK.f19_g17.B1850.spartan_intel.allactive-defaultiomi.20190705_222158_70busq/env_mach_pes.xml
/home/huazhenl/my_cesm_sandbox/cime_config/config_pes.xml
/home/huazhenl/.cime/config_machines.xml

Any help is much appreciated. Thanks a lot.
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
9929460  physical NCK.f19_ huazhenl PD       0:00     13 (PartitionTimeLimit)
9929608  physical NCK.f19_ huazhenl PD       0:00     13 (PartitionTimeLimit)
9930100  physical NCK.f19_ huazhenl PD       0:00     13 (MaxCpuPerAccount)
9931908  physical PET_PM.f huazhenl PD       0:00     26 (MaxCpuPerAccount)
9929181  physical ERI.f09_ huazhenl PD       0:00      9 (MaxCpuPerAccount)
9930627  physical ERS_D.f0 huazhenl PD       0:00      9 (MaxCpuPerAccount)
9930640  physical IRT_Ld7. huazhenl PD       0:00      9 (MaxCpuPerAccount)
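For context, the per-case PE layout stored in env_mach_pes.xml can normally be changed from inside the case directory with CIME's xmlchange tool before case.setup runs. A minimal sketch, assuming standard CIME case tools; the task counts below are hypothetical placeholders, not values from this thread, and for cases generated by create_test the change has to be repeated per case (which is why a config_pes.xml default is the durable fix):

```shell
# Run from inside a case directory, e.g. the NCK.f19_g17... directory above.
# Hypothetical task counts: choose values that keep the case total under 200.
./xmlchange NTASKS_ATM=64,NTASKS_LND=64,NTASKS_ICE=64,NTASKS_OCN=64
./xmlchange NTASKS_CPL=64,NTASKS_GLC=64,NTASKS_ROF=64,NTASKS_WAV=64

# Regenerate the case setup so the new PE layout takes effect.
./case.setup --reset
```

Components sharing the same PEs (as above, with all root PEs at 0) keep the total core request at the largest single component count rather than the sum.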
 

jedwards

CSEG and Liaisons
Staff member
The parameters of the batch system are set in cime/config/cesm/machines/config_batch.xml. For example, https://github.com/ESMCI/cime/blob/master/config/cesm/machines/config_batch.xml#L227 shows the queue settings for a particular system. In that case, if the job runs on 18 nodes or fewer it will default to the share queue with a walltime of 6 hours. If you use more than 18 nodes it will go to the regular queue (because it is listed first) with a walltime of 12 hours. The premium and economy queues are defined but will not be used by default.
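The queue choice described above keys off the node count, which the batch system derives from the total task count and the machine's tasks per node. A minimal sketch of that arithmetic: 272 tasks is the request from the first post, but 36 tasks per node is an assumption for illustration, not a value taken from the thread:

```shell
# nodes = ceil(total_tasks / tasks_per_node), via integer arithmetic
total_tasks=272       # total MPI tasks requested by a test (from the first post)
tasks_per_node=36     # hypothetical cores per node on the target machine
nodes=$(( (total_tasks + tasks_per_node - 1) / tasks_per_node ))
echo "nodes=$nodes"   # 272 tasks at 36 per node round up to 8 nodes
```

The same task count therefore maps to different node counts (and possibly different queues) on machines with different core counts per node.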
 

huazhen

Member
Hi jedwards, thanks a lot for your reply. Do you mean that I can adjust jobmax from 18 to 8, or nodemax from 4032 to 8, in cime/config/cesm/machines/config_batch.xml? Will this reduce the core request to below 200? This will only affect the speed of the jobs, and not the running of the model, right?
 

huazhen

Member
Hi jedwards, I have adjusted jobmax from 18 to 8 and nodemax from 4032 to 8 in cime/config/cesm/machines/config_batch.xml as you suggested, but I still get the same issues I mentioned before. Each job still requests 273 cores or more, which exceeds the limit (200). It seems the core request is not controlled by this file (cime/config/cesm/machines/config_batch.xml). Do you have any suggestions for reducing the core request of each dependent job? Thanks a lot.

   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
10081983  physical NCK.f19_ huazhenl PD       0:00     19 (MaxCpuPerAccount)
10081034  physical ERS_Ld5. huazhenl PD       0:00      9 (MaxCpuPerAccount)
10081051  physical ERI.f09_ huazhenl PD       0:00      9 (MaxCpuPerAccount)
10082120  physical ERS_D.f0 huazhenl PD       0:00      9 (MaxCpuPerAccount)
10082307  physical IRT_Ld7. huazhenl PD       0:00      9 (MaxCpuPerAccount)
10083041  physical ERS_Ld5. huazhenl PD       0:00      9 (Resources)
10082971  physical ERS_Ld7. huazhenl PD       0:00      6 (Priority)
10084057  physical NCK_Ld5. huazhenl PD       0:00      6 (Priority)
10085645  physical PFS.f09_ huazhenl PD       0:00      9 (Priority)
 

jedwards

CSEG and Liaisons
Staff member
Using the full cheyenne test list to test your port may not be the best strategy. Have you run the ECT test? You will need to individually tune compsets for your system, and then you can set a default PE configuration using config_pes.xml.
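For what it's worth, a default PE layout for a grid on a given machine is expressed as an entry in config_pes.xml along the following lines. The nesting follows the usual CIME convention, but the grid matcher and task counts here are hypothetical examples, not a tuned layout for spartan:

```xml
<grid name="a%1.9x2.5">
  <mach name="spartan">
    <pes compset="any" pesize="any">
      <comment>hypothetical small layout to stay under a 200-core account limit</comment>
      <ntasks>
        <ntasks_atm>144</ntasks_atm>
        <ntasks_lnd>144</ntasks_lnd>
        <ntasks_rof>144</ntasks_rof>
        <ntasks_ice>144</ntasks_ice>
        <ntasks_ocn>144</ntasks_ocn>
        <ntasks_glc>144</ntasks_glc>
        <ntasks_wav>144</ntasks_wav>
        <ntasks_cpl>144</ntasks_cpl>
      </ntasks>
      <rootpe>
        <rootpe_atm>0</rootpe_atm>
        <rootpe_lnd>0</rootpe_lnd>
        <rootpe_rof>0</rootpe_rof>
        <rootpe_ice>0</rootpe_ice>
        <rootpe_ocn>0</rootpe_ocn>
        <rootpe_glc>0</rootpe_glc>
        <rootpe_wav>0</rootpe_wav>
        <rootpe_cpl>0</rootpe_cpl>
      </rootpe>
    </pes>
  </mach>
</grid>
```

With all components on the same 144 tasks and all root PEs at 0, the total request stays at 144 cores, below the 200-core limit mentioned earlier in the thread.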
 

huazhen

Member
Hi jedwards, I am following this link (http://www.cesm.ucar.edu/models/cesm2/python-tools/) to run the ECT test, and it includes the following note:

Note: These simulation runs must be created via the script ensemble.sh and then be verified by ECT. The ensemble.sh script is located in cime/tools/statistical_ensemble_test.

But I can't find the ensemble.sh script in cime/tools/statistical_ensemble_test (https://github.com/ESMCI/cime/find/master). Do you have any suggestions for finding or creating the ensemble.sh script? Thanks a lot.
 