The influence of the initial condition file, which is used to start the whole
assimilation, would persist only through the atmospheric state, because during the
forecasts the SSTs would come completely from the data ocean input file to CESM.
The IC file influence would diminish with time, as the assimilation blends the
observations with the influences of the data ocean and the ICs.
When we start an assimilation we typically see a "large" RMSE of the ensemble,
relative to the observations. As more observations are assimilated,
the RMSE will fall to some fairly stable value, at which point the assimilation has
forgotten about the ICs, and the ensemble error is in balance with the observational error.
If we've started from a very naive ensemble, the initial RMSE will be very large.
If we've started from something close to the observations (like, presumably, GOES5)
then the RMSE won't be as large, but it will still be larger than the steady values
later in the assimilation. It's generally safe to assume that an analysis like
GOES5 has its own biases, which are different from CESM's, and it will take a few
assimilation cycles for the assimilation to resolve those differences.
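
To make the spin-up behavior concrete, here is a toy sketch. It is not DART or CESM, just the
textbook error-variance recursion of a scalar Kalman filter, and every name and number in it
(rmse_spinup, growth, model_noise, obs_var, the two initial variances) is a made-up assumption
chosen for illustration. It only shows the qualitative point: a naive start and a closer start
both begin above the steady RMSE and end up at the same value.

```python
import math

def rmse_spinup(initial_var, n_cycles=30, obs_var=1.0, growth=1.2, model_noise=0.1):
    """Expected RMSE after each analysis in a toy scalar Kalman filter.

    All parameters are hypothetical; this is not a DART or CESM calculation.
    """
    var = initial_var
    rmse = []
    for _ in range(n_cycles):
        # Forecast step: error variance grows with the model dynamics.
        var = growth**2 * var + model_noise
        # Analysis step: one observation with error variance obs_var is assimilated.
        var = var * obs_var / (var + obs_var)
        rmse.append(math.sqrt(var))
    return rmse

naive = rmse_spinup(initial_var=100.0)  # very naive initial ensemble
close = rmse_spinup(initial_var=2.0)    # ICs taken from something close to the obs

for cycle in (0, 4, 29):
    print(f"cycle {cycle + 1:2d}: naive={naive[cycle]:.4f}  close={close[cycle]:.4f}")
```

In this toy the steady value depends only on growth, model_noise and obs_var, not on
initial_var, which is the sense in which the ICs are "forgotten".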
I don't have any experience with GOES5, so I don't know what dates are available.
Lots of observations at high altitude will definitely help overcome sub-optimal SSTs,
but there will be a persistent struggle between them. You could use your 2008 case
to test how big an effect this is by running 2 assimilations, one with each kind
of SST you're considering, keeping everything else the same: time period, ICs,
observation set, etc. Meanwhile, I've contacted a collaborator who has used WRF+DART
in field campaigns to find out how he handled SSTs, or whether it was even an issue,
maybe because his entire domain was over land. I'll let you know what he says.
Kevin