Problem increasing the processing speed of CESM2.1.3

ykp990521

Member
While running a B1850 case on the f19_g17 grid, my processing speed stays slow no matter how I change the PE layout. I ran all components in sequence (ROOTPE=0 and NTHRDS=1 for all), increased NTASKS from 168 (28 PEs/node) to 420, and found that the speed stayed below 5 simulated years per wallclock day.
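A rough way to see why a fully sequential layout plateaus: with every component on ROOTPE=0, the coupled step costs roughly the sum of the component times, and throughput in simulated years per wallclock day (SYPD) follows from the wallclock seconds per model day. This is a minimal sketch assuming a 365-day (noleap) model year; the function names and per-component timings are hypothetical placeholders, not measurements from this case.

```python
SECONDS_PER_WALLCLOCK_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365  # assumes a noleap calendar


def sequential_seconds_per_model_day(component_seconds):
    """Approximate wallclock seconds per model day when all components share
    the same PEs (ROOTPE=0 everywhere) and therefore run one after another."""
    return sum(component_seconds.values())


def simulated_years_per_day(seconds_per_model_day):
    """Convert wallclock seconds per model day into simulated years per wallclock day."""
    return SECONDS_PER_WALLCLOCK_DAY / (seconds_per_model_day * DAYS_PER_MODEL_YEAR)


# Hypothetical per-model-day timings in seconds, for illustration only.
timings = {"atm": 30.0, "ocn": 10.0, "ice": 4.0, "lnd": 3.0, "rof": 1.0, "cpl": 2.0}
total = sequential_seconds_per_model_day(timings)
print(f"{total:.1f} s/model day -> {simulated_years_per_day(total):.2f} SYPD")
```

With these made-up numbers the sum is 50 s per model day, about 4.7 SYPD. In a sequential layout, adding PEs only helps to the extent that it shrinks the largest terms in that sum.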
Then I ran OCN concurrently with the other components. The best performance came from NTASKS_OCN=168 with NTASKS=336 for the other components (NTHRDS=1), giving almost 9 years/wallclock day. However, taking NTASKS_OCN=168 and other NTASKS=336 as a reference, no matter how I increase the NTASKS of OCN or of the other components, the speed stays below 9.5 years/day. I checked the timing tables on the official CESM2 website and found that B1850 f19_g17 runs there reach more than 20 yrs/day on more than 1000 or even 2000 PEs. Does anyone know why I can't get a higher speed by adding PEs? (My total is no more than a few hundred PEs and the speed won't increase!)
Thanks a lot! Attached is the cpl log for NTASKS=336 for the components other than OCN, with NTASKS_OCN=168.
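When OCN runs on its own PEs concurrently with the other components, the coupled step is paced by whichever PE set finishes last, so adding PEs to the side that is already faster only adds idle time. This is a minimal sketch of that load-balance argument; the function names and timings are hypothetical, and the model ignores coupler overhead and the cost of exchanging data between the two PE sets.

```python
def concurrent_seconds_per_model_day(ocn_seconds, other_seconds):
    """OCN runs on its own PE set while the remaining components run one after
    another on a second PE set; the step waits for the slower of the two."""
    others = sum(other_seconds.values())
    return max(ocn_seconds, others)


def idle_seconds(ocn_seconds, other_seconds):
    """Wallclock seconds per model day that the faster PE set spends waiting."""
    return abs(ocn_seconds - sum(other_seconds.values()))


# Hypothetical timings (seconds per model day) for the non-OCN side.
others = {"atm": 20.0, "ice": 4.0, "lnd": 3.0, "rof": 1.0, "cpl": 2.0}

for ocn in (25.0, 15.0):  # e.g. before and after giving OCN more tasks
    step = concurrent_seconds_per_model_day(ocn, others)
    print(f"ocn={ocn:.0f}s step={step:.0f}s idle={idle_seconds(ocn, others):.0f}s")
```

Both iterations give a 30 s step: speeding OCN up from 25 s to 15 s only raises its idle time from 5 s to 15 s, so in this toy model the next gain has to come from whichever non-OCN component dominates that side.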

dbailey

CSEG and Liaisons
Staff member
Load balancing is always tricky. What machine are you on? I have moved this to the infrastructure subforum.

jedwards

CSEG and Liaisons
Staff member
The coupler log isn't very helpful here. Please provide the output of ./pelayout and a cesm_timing file.

ykp990521

Member
I could only find three timing profiles in the timing directory, and they don't look like the ones on the NCAR website. Do you know where I can find timing files like NCAR's?
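For comparison with the published tables, the two headline numbers in a CESM timing summary are the model cost (pe-hrs per simulated year) and the throughput (simulated years per day). This is a minimal sketch for pulling them out of a summary file. It assumes lines of the form "Model Cost: <value> pe-hrs/simulated_year" and "Model Throughput: <value> simulated_years/day"; that line format, a file name such as cesm_timing.<case>.<lid> under the case directory's timing/ subdirectory, and the helper names are all assumptions to check against your own case.

```python
import re
from pathlib import Path

# Assumed summary-line formats (check against a real cesm_timing file).
_COST = re.compile(r"Model Cost:\s*([0-9.]+)\s*pe-hrs/simulated_year")
_THROUGHPUT = re.compile(r"Model Throughput:\s*([0-9.]+)\s*simulated_years/day")


def parse_timing_summary(text):
    """Return the cost and throughput found in a timing summary, with None
    for any value whose line is missing."""
    cost = _COST.search(text)
    throughput = _THROUGHPUT.search(text)
    return {
        "pe_hrs_per_simulated_year": float(cost.group(1)) if cost else None,
        "simulated_years_per_day": float(throughput.group(1)) if throughput else None,
    }


def parse_timing_file(path):
    """Read a timing summary file from disk and parse it."""
    return parse_timing_summary(Path(path).read_text())


# Illustrative input only; the numbers are made up.
sample = """
    Model Cost:             1344.00   pe-hrs/simulated_year
    Model Throughput:          9.00   simulated_years/day
"""
print(parse_timing_summary(sample))
```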

jedwards

CSEG and Liaisons
Staff member
Look for examples in cesm/cime_config/config_pes.xml. The lnd and ice can run concurrently after atm, that is:
ntasks_lnd=168,ntasks_rof=168,ntasks_ice=168,rootpe_ice=168
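One way to express that suggestion as CIME settings, assuming the concurrent layout described earlier in the thread (ATM and CPL on 336 tasks from PE 0, OCN on its own 168 tasks starting at PE 336) and assuming LND and ROF start at PE 0. The helper below only builds ./xmlchange command strings and checks which components share PEs; it does not run anything. The LAYOUT dictionary and helper names are hypothetical, and the other components of the compset (e.g. GLC, WAV) are left out and would need their own settings.

```python
# Hypothetical layout: component -> (NTASKS, ROOTPE), all with NTHRDS=1.
LAYOUT = {
    "ATM": (336, 0),
    "CPL": (336, 0),
    "LND": (168, 0),
    "ROF": (168, 0),
    "ICE": (168, 168),  # PEs 168-335, concurrent with LND/ROF
    "OCN": (168, 336),  # PEs 336-503, concurrent with everything else
}


def pe_range(ntasks, rootpe):
    """Half-open range of MPI ranks used by a component (stride 1 assumed)."""
    return range(rootpe, rootpe + ntasks)


def overlaps(layout, a, b):
    """True if components a and b share any PEs, so they cannot run at the same time."""
    ra, rb = pe_range(*layout[a]), pe_range(*layout[b])
    return ra.start < rb.stop and rb.start < ra.stop


def total_pes(layout):
    """Highest rank used by any component, plus one."""
    return max(rootpe + ntasks for ntasks, rootpe in layout.values())


def xmlchange_commands(layout):
    """Build (but do not execute) one ./xmlchange command per component."""
    return [f"./xmlchange NTASKS_{c}={n},ROOTPE_{c}={r},NTHRDS_{c}=1"
            for c, (n, r) in layout.items()]


for cmd in xmlchange_commands(LAYOUT):
    print(cmd)
print("total PEs:", total_pes(LAYOUT))
print("LND/ICE overlap:", overlaps(LAYOUT, "LND", "ICE"))
```

In this sketch LND/ROF and ICE occupy disjoint PEs, so they can run side by side after ATM, and the whole layout uses 504 PEs. After changing NTASKS or ROOTPE, CIME normally expects ./case.setup --reset and a rebuild before the new layout takes effect.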

ykp990521

Member
Thanks a lot. Changing the PE layout (running components concurrently) did increase the speed a little, but what really confuses me is why I can't increase the speed by adding PEs. The NCAR website shows runs on more than 1000 or even 2000 PEs that keep the speed high (more than 20 yrs/day with all components active), but my speed drops once NTASKS reaches only 200-300.
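A toy strong-scaling model shows how adding PEs can stop helping, or even hurt: with a fixed problem size, the parallel work per task shrinks as 1/N while serial work does not, and communication can grow with N. The sketch below uses Amdahl's law plus a communication term that grows with log2(N); the functional form, the parameter values, and the function names are assumptions chosen for illustration, not fitted to CESM or to any machine in this thread.

```python
import math


def toy_seconds_per_model_day(n_tasks, serial_s, parallel_s, comm_s_per_doubling):
    """Toy strong-scaling model (an assumption, not CESM's actual cost):
    Amdahl's law (serial + parallel/N) plus a communication cost that grows
    by comm_s_per_doubling every time the task count doubles."""
    return serial_s + parallel_s / n_tasks + comm_s_per_doubling * math.log2(n_tasks)


def best_task_count(candidates, **params):
    """Candidate task count with the smallest modeled time per model day."""
    return min(candidates, key=lambda n: toy_seconds_per_model_day(n, **params))


# Hypothetical parameters: 5 s serial, 3000 s of perfectly parallel work,
# and 8 s of extra communication per doubling of the task count.
params = {"serial_s": 5.0, "parallel_s": 3000.0, "comm_s_per_doubling": 8.0}
candidates = [168, 196, 252, 336, 420]
for n in candidates:
    print(n, round(toy_seconds_per_model_day(n, **params), 2))
print("best:", best_task_count(candidates, **params))
```

Setting the derivative of parallel_s/N + comm_s_per_doubling·log2(N) to zero gives N* = parallel_s·ln 2 / comm_s_per_doubling, about 260 tasks with these numbers; past that point each doubling costs more communication than it saves. Where a real optimum lands depends on the machine and on each component's decomposition, which this toy model does not capture.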

ganbaranaito

takufuu
Member
Hello, I think this may be a common issue. I hit the same problem on the NUIST supercomputer, where I also ran a B1850 f19_g17 case. With all components running sequentially on 168 total ntasks, the model speed is about 4 model years/day. When I increase ntasks to 196, it becomes slower than with 168. I also ran the ocean model in parallel with the other models: with 140 ntasks for the ocean and 140 for the other models, the speed is about 6.5 model years/day. When I increase both to 168, the improvement is negligible, about 7 model years/day. (nthrds is set to 1 everywhere.)
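The figures reported here can be compared on a cost basis with the pe-hrs per simulated year metric that CESM timing summaries use: cost = total PEs x 24 / SYPD. This is a minimal sketch using takufuu's numbers (168 PEs at about 4, 280 PEs at about 6.5, 336 PEs at about 7 model years/day). The helper names and the choice of the 168-PE run as the efficiency baseline are illustrative; note that the 168-PE run was sequential while the other two were concurrent, so the comparison mixes a layout change with a PE-count change.

```python
def pe_hours_per_simulated_year(total_pes, sypd):
    """Cost in PE-hours per simulated year: PEs x wallclock hours per simulated year."""
    return total_pes * 24.0 / sypd


def parallel_efficiency(pes, sypd, base_pes, base_sypd):
    """Speedup relative to a baseline run, divided by the increase in PEs."""
    return (sypd / base_sypd) / (pes / base_pes)


# (total PEs, simulated years/day) as reported in this thread.
runs = [(168, 4.0), (280, 6.5), (336, 7.0)]
base_pes, base_sypd = runs[0]
for pes, sypd in runs:
    cost = pe_hours_per_simulated_year(pes, sypd)
    eff = parallel_efficiency(pes, sypd, base_pes, base_sypd)
    print(f"{pes} PEs: {cost:.0f} pe-hrs/simulated_year, efficiency {eff:.2f}")
```

With these figures, doubling from 168 to 336 PEs raises the cost from 1008 to 1152 pe-hrs per simulated year, for a parallel efficiency of 0.875 relative to the 168-PE run.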