
computation speed from CICE4 to CICE6

After upgrading the sea ice component of CESM1.2.1 from CICE4 to CICE6 (keeping the other components untouched), the computation became much slower: 20 model years per wall-clock day versus 35 model years before the upgrade. According to the timing information, the sea ice component is the bottleneck. Is this a normal situation for CICE6? I could not find any published comparison of computational speed between CICE versions.

I am using the f19_g16 grid on my own server.

Thanks in advance.
 

dbailey

CSEG and Liaisons
Staff member
I don't believe we have done a comprehensive timing comparison of CICE6 versus CICE4. Part of this is that the default physics options may have changed. Also, CICE6 is dynamically allocated, whereas CICE4 was statically allocated. Did you try different computational decompositions and PE layouts?
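For context, a minimal sketch of what that involves, assuming the standard CICE6 domain_nml variable names and purely illustrative values for the 320x384 gx1 ice grid used by f19_g16 (how the namelist is generated in a custom CESM1.2.1 port may differ): the ice PE count and placement are set through NTASKS_ICE and ROOTPE_ICE in env_mach_pes.xml, while the block decomposition is controlled by namelist settings along these lines:

    &domain_nml
      nprocs            = 64            ! MPI tasks assigned to the ice model
      block_size_x      = 20            ! 320/20 = 16 blocks in x
      block_size_y      = 24            ! 384/24 = 16 blocks in y
      max_blocks        = 4             ! 256 blocks / 64 tasks
      processor_shape   = 'slenderX2'
      distribution_type = 'roundrobin'  ! alternatives include 'cartesian', 'spacecurve'
      distribution_wght = 'latitude'    ! weight work toward latitudes likely to have ice
    /

Smaller blocks combined with a round-robin or space-filling-curve distribution tend to balance the ice work better than one large block per task, so it is worth timing a few combinations before concluding that CICE6 itself is slower.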
 

dupontf

Frédéric Dupont
New Member
Jumping into this discussion: I tested CICE/main (6.4.0) standalone today with different MPI partitions for a large domain, 4320x3604 (ORCA12), on an Ice Lake cluster, compiled with inteloneapi-2022.1.2 and its associated MPI library, and obtained the attached plot, which shows the speedup relative to 2160 computational cores. The performance seems to stall beyond roughly 3500 cores, that is, between 2840 and 3996 cores, or between local block sizes of 108x51 and 59x67, although 59x67 does not look that small to me. I suspect that global MPI operations may be the issue (given the large total number of cores), although I am running with halos on and add_mpi_barriers = .true. Has this been documented for massively parallel architectures?
[Attached plot: speedup relative to 2160 computational cores as a function of core count]
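To connect the core counts and block sizes quoted above, the task grids below are inferred from the totals, assuming a Cartesian decomposition with one block per MPI task (this is a reconstruction, not something stated explicitly in the post):

    2840 cores = 40 x 71 tasks -> local blocks of ceil(4320/40) x ceil(3604/71) = 108 x 51
    3996 cores = 74 x 54 tasks -> local blocks of ceil(4320/74) x ceil(3604/54) = 59 x 67
    4320 cores = 80 x 54 tasks -> local blocks of ceil(4320/80) x ceil(3604/54) = 54 x 67

Even at the largest counts each core still owns a few thousand grid points, which supports the suspicion that communication rather than computation is what stalls the scaling.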
 

dupontf

Frédéric Dupont
New Member
I realized that the diagnostics were on at every timestep. Removing them does help a bit for the highest core-count test, but it does not change the fact that that test suffers from MPI communication overhead.
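For anyone reproducing this, the per-timestep diagnostics do not have to be removed from the code; a minimal sketch, assuming the standard CICE6 setup_nml variable, is simply to write them less often:

    &setup_nml
      diagfreq = 24   ! write point/global diagnostics every 24 timesteps
                      ! rather than every timestep (diagfreq = 1)
    /

with diagfreq chosen to give, for example, one diagnostic write per model day.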
 

dupontf

Frédéric Dupont
New Member
On a better note, if I use a distribution of 80x54 = 4320 cores, I am back in business (i.e., closer to the ideal-speedup diagonal). It seems the 74x54 = 3996 case caused something strange (one node was not fully used, though I am not sure why that would be a limitation).