Accounting for model error in strong‐constraint 4D‐Var data assimilation

The strong‐constraint formulation of four‐dimensional variational data assimilation (4D‐Var) assumes that the model used in the process perfectly describes the true dynamics of the system. However, this assumption often does not hold and the use of an erroneous model in strong‐constraint 4D‐Var can lead to a sub‐optimal estimation of the initial conditions. We show how the presence of model error can be correctly accounted for in strong constraint 4D‐Var by allowing for errors in both the observations and the model when considering the statistics of the innovation vector. We demonstrate that, when these combined model error and observation‐error statistics are used in place of the standard observation error statistics in the strong‐constraint formulation of 4D‐Var, a statistically more accurate estimate of the initial state is obtained.


Introduction
Four-dimensional variational data assimilation (4D-Var) is a method for combining a time window of observations with a model. The output, known as the analysis, should provide a more accurate estimate of the state of the system at the start of the time window than either the observations or model can provide alone. 4D-Var is the data assimilation method of choice at many operational meteorological centres, for example to initialize numerical weather prediction (NWP) forecasts at the Met Office (Rawlins et al., 2007) and ECMWF (Rabier et al., 2000) and for reanalysis (Dee et al., 2011).
The model equations are often implemented as a strong constraint within the 4D-Var algorithm (SC4DVar). This assumes that the observations over the time window (accounting for the observation-error statistics) are consistent with the model if initialized with the true state. However, the error in the model is often non-negligible and so a model trajectory initialized from the truth may not be consistent with the observations. This has the effect that the 4D-Var scheme is unable to find an analysis in good agreement with the truth. The problem of model error is greater as the assimilation window is increased and the error at each time step accumulates.
There are multiple sources of model error. These include: • finite model resolution, which means that subgrid-scale phenomena cannot be modelled explicitly; • errors in subgrid-scale parametrizations; • errors in boundary conditions; • numerical errors arising from temporal discretization.
Weak-constraint 4D-Var (WC4DVar) relaxes the perfect model assumption by assuming model error is present at each time step (Sasaki, 1970). This allows WC4DVar to provide an estimate of both the initial state and the model error over the time window. However, in its implementation, it requires an estimate of the model-error statistics throughout the assimilation window (Trémolet, 2006). It is not simple to estimate the model-error covariances and in practice many estimates are only available in the space of the observations rather than the full model space, as required by WC4DVar (Todling, 2015).
An additional difficulty with WC4DVar is how to best use the estimate of the model error to correct the model in the resulting forecast. For example, how long is the estimate of model error valid for beyond the assimilation window? If the model is not corrected, then it is possible that the forecast will be worse than using strongconstraint 4D-Var (Trémolet, 2007). In practice, there also exists a third difficulty in implementing WC4DVar related to separating out the effects of model and observation bias (Lea et al., 2008).
This article takes an alternative approach to account for model error in 4D-Var. Instead of trying to correct the model, a method is developed to account for the model error when comparing the model evolution to the observations by combining the modelerror and observation-error statistics. In practice, this is seen to effectively inflate the observation errors over time, giving the observations nearer the end of the time window less weight in the analysis. The combined model-error and observation-error statistics can be estimated using samples of observation minus model differences. This is a similar technique to the use of covariance matching proposed by Dee (1995) and Desroziers et al. (2005) and has been used extensively to estimate the observation-error statistics only (e.g. Weston et al., 2014;Waller et al., 2016b). This new approach has many potential practical benefits over WC4DVar: • There is no need to explicitly define an estimate of the model-error covariance matrix. Instead an estimate of the matrix in observation space is required and can be obtained on-line. • The initial state alone is estimated, therefore the size and complexity of the problem is smaller than WC4DVar. • Although this method does not try to correct the model, the analysis of the initial state is consistent with the assumed (erroneous) model dynamics. Therefore, the chance of an initialization shock will be reduced.
We develop this new technique using linear theory, consistent with the assumptions regularly made in 4D-Var. However, we illustrate this method using two simple dynamical models; firstly a linear model and then a nonlinear model. We show the novel ideas developed within the linear model framework can be successfully applied to both linear and nonlinear dynamical models.
The paper is organized as follows. In section 2 we outline SC4DVar with a perfect model and then show the impact that the breakdown of the perfect model assumption has on the analysis and its error. In section 3, we show that combining the model-error and observation-error statistics does indeed restore the optimality of the analysis estimated with SC4DVar in the presence of model error. This can be shown to give a statistically more accurate analysis than using the sub-optimal SC4DVar which does not account for model error. The expression derived for the combined model-error and observation-error covariances requires an estimate of the elusive model-error statistics. In section 4 we formulate a method to estimate the combined model-error and observation-error statistics using a sample of observation-model misfits. In section 5 we introduce two dynamical systems, the linear advection equation and a nonlinear idealized coupled atmosphere-ocean model. Section 6 involves use of these numerical models to demonstrate the successful implementation of the theory developed in this article and investigate sensitivities of the results. Note that in these numerical experiments we only consider the effect of the model-error variances as this is more feasible for larger systems. Finally, we summarize and conclude our findings in section 7.

Strong-constraint 4D-Var
The aim of SC4DVar is to find the most probable state of a system given a background estimate and a time window of observations. A key underlying assumption of SC4DVar is that the model equations perfectly describe the evolution of the system. In this section we begin by introducing the notation to be used throughout the article.

Observations and background
Let the column vector x i ∈ R m represent the model state vector at time t i . Observations, y i ∈ R p i , at time t i are (linearly) related to the true model state, x t i , at time t i by where o i ∈ R p i represents the observation errors at time t i . These are assumed to be drawn from a normal distribution with mean zero and error covariance matrix R i ∈ R p i ×p i . The linear observation operator H i : R m −→ R p i maps the state from model space to observation space.
The background, x b , is related to the true state x t 0 at time t 0 such that where the column vector x b ∈ R m contains a prior estimate of each of the model state variables. The column vector b ∈ R m contains errors in the background model state and is assumed to be drawn from a normal distribution with mean zero and error covariance matrix B ∈ R m×m .

Strong-constraint 4D-Var with perfect model equations
We assume use of a linear model. The evolution of the state at time i − 1 to time i is given by Let us assume we have observations at N + 1 times throughout the assimilation window and write the 4D-Var cost function as where The hat notation represents vectors and matrices that are valid over the entire assimilation window, not just at individual time steps. For simplicity of presentation, we assume that the observation errors are not correlated in time, but the theory we develop is also applicable to the more general case. The minimum of (4) should provide the minimum variance estimate of the true model state at the initial time, x t 0 , if the Gaussian and linear assumptions are correct, and B and R correctly represent the covariances of x t 0 − x b and y − Hx t 0 respectively (Lewis et al., 2006).
The analysis which minimizes (4) is given by where K is the gain matrix given by K = B H T ( HB H T + R) −1 and d b is the innovation vector given by d b = y − Hx b . Therefore the gain matrix K determines the appropriate weighting of the observational information to be added to the background model state. The error covariance of the analysis in this case is given by (Lewis et al., 2006).

The effect of an erroneous model in strong-constraint 4D-Var
Let us now consider the case when the model is no longer perfect.
In this article the erroneous model equations take the form The model matrix M e {i−1}→i ∈ R m×m is the best known description of the evolution of the state vector from the previous time t i−1 to time t i , and is used to assimilate the data and to produce the subsequent forecast. We assume additive model error of a random nature is present at each time step, where to acquire the true model state where η i ∈ R m is a column vector containing the random model error at time t i . The vector of random model error is assumed to be unbiased and to have a Gaussian distribution such that η i ∼ N (0, Q i ), where Q i is the model-error covariance matrix at time t i . We can extend (8) to relate the true state at time i to the initial true state evolved using the erroneous model The linearity of the model (7) allows us to define the erroneous operator, H e ∈ R ( N i=0 p i )×m , as When erroneous model matrices of the form M e {i−1}→i , (7), are used to evolve the model state, the operator H which contains the perfect model dynamics is replaced with H e containing the erroneous model matrices in the 4D-Var cost function, (4). The difference between the true state and the observations evaluated using the erroneous model, y − H e x t 0 , no longer has a variance consistent with R.
Let o * = y − H e x t 0 . This can be evaluated, at time i, in terms of the observation error, o , and the model error, η, using (1), (3), (7) and (9): Equation (10) explicitly states that, at each observation time t i , the error in the comparison of observations and the model state initial conditions evolved with an erroneous model is a combination of observation error and model error. We next seek to obtain the statistics of this error term o * i . By taking the statistical expectation of the error term * oi (10), and using the assumptions that both the observation errors and model errors are distributed with a zero mean, it is clear that < o * i >= 0. The covariance of the combined error terms at observation times t i and t k respectively can be evaluated by taking the statistical expectation using (10) and assuming that both the model errors and observation errors are uncorrelated in time and the model and observation errors are uncorrelated between them. Evaluating the expectations in (11) leads to where Q * (i,k) are the terms present due to the errors in the model, A more detailed derivation of (12) and (13) is given in Appendix A.
We note that no model evolution is required in the comparison of observations and the model state at time t 0 , hence the observationerror covariance matrix R 0 fully describes the error statistics in the innovation at time t 0 . We can present the full combined modelerror and observation-error covariance matrix R * , containing the covariance submatrices R * (i,k) (12) as Note that the inclusion of time-correlated observation errors could also be included in the off block diagonal elements of (14), but for convenience we will continue to assume that the time correlations are only due to the accumulation of model error.
If the analysis scheme were to use R * to weight the observation term in (4), we would expect the assimilation system to be optimal as the correct error statistics are used. (This will be explored further in section 3.) However, if this model error, Q * , is not accounted for, the analysis estimated using SC4DVar is given by with the gain matrix given by and the innovation vector d e b = y − H e x b , analogous to (5) in the perfect model case. The corresponding error in the analysis is evaluated as We take the statistical expectations of e a to calculate the mean of the error in the analysis giving < e a >= 0, using the assumptions that the background error, observation errors and model errors all have a zero mean. The corresponding analysis-error covariance matrix is where Q * = R * − R, using the assumption that the background errors, observation errors and model errors are uncorrelated (Appendix B gives a more detailed derivation of (18)).

An alternative 4D-Var approach
In this section we derive an alternative 4D-Var approach that amends the 4D-Var cost function (4) in order to correctly account for the error statistics when unbiased random error is present in a model. We demonstrate that, by using a combined modelerror and observation-error covariance matrix to replace the observation-error covariance matrix in the 4D-Var cost function (4), we improve the accuracy of the analysis when error of a random nature is present in a model. Subsequently we use a simple scalar example to investigate the sensitivity of the improvement to analysis accuracy with respect to the size of the model error, background error and observation error.
To recapitulate, when a perfect model is used in the SC4DVar cost function (4), the error to be accounted for in the J o term is simply the observation-error covariance matrix R. However, when an erroneous model of the form (7) is used in the strongconstraint 4D-Var cost function (4), the error to be accounted for in the J o term is the combined model-error and observationerror covariance matrix R * given by (14). Compared with R, the matrix R * • contains an additional term, Q * (i,i) , as described by (13), on the block diagonal submatrices R * (i,i) . From (13) we see that this is an accumulation of the model error over the assimilation time window; • contains off-diagonal block covariance submatrices, Q * (i,k) , as described by (13). This is the presence of time correlations caused by the accumulation of error in the model trajectory.
We propose that use of the combined model-error and observation-error covariance matrix R * (14), as opposed to R, in the cost function (4), will statistically improve the analysis accuracy when random error is present in the model.

Improvements to analysis accuracy
The analysis when the erroneous model (7) is used within the SC4DVar cost function (4) and the error in the model is accounted for, by using R * as opposed to R, is with the gain matrix K * = B H e T ( H e B H e T + R * ) −1 and the innovation vector d * b = y − H e x b , which is the same as d e b defined previously. The corresponding error in the analysis is evaluated as * We take the statistical expectation of * a to calculate the mean of the error in the analysis giving < * a >= 0, using the assumption that the background errors, observation errors and model errors all have a zero mean. We next evaluate the analysis-error covariance matrix, using the assumption that the background errors, observation errors and model errors are uncorrelated (Howes, 2016). Replacing R by R * leads to the analysis-error covariance matrix A * (21) having the same form as the analysis-error covariance matrix of the SC4DVar estimate in the absence of model error, given by (6). This suggests that the analysis solution (19) is the optimal minimum variance estimate of the linear strongconstraint 4D-Var problem, when the model has error of a random nature at each time step, and therefore implies that the analysis x a * 0 (19) is more statistically accurate than the analysis x ae 0 (15). An important point to be noted is that when model error is not accounted for, the analysis error covariance matrix A e (18) is unbounded. As the model error increases, the diagonal of the matrix term K e Q * ( K e ) T present in (18) increases and therefore the error variances of the analysis variables increase. If the model error is large, it is feasible that the analysis is less accurate than the background. However the analysis-error covariance matrix A * (21) is bounded by the background-error covariance matrix B. As the model error increases, the analysis-error covariance matrix A * tends to the background-error covariance matrix B. Note that, although A * can tend to the background-error covariance matrix B, it cannot be equal A * = B for non-zero B and R * .
Model error can be regarded as a representivity-like error, which can be defined loosely as an error in the generalized operator H. The method we have developed can therefore be related to those developed to account for other sources of representivity error, such as errors that result from the discretized model state not being able to represent the fine scales that the observation measures (Hodyss and Nichols, 2015;Li et al., 2015). These two different types of representivity error can affect the comparison of observations with the model state mapped to observation space, although the source and structure of the two errors are very different, and so need to be treated separately.

Simple scalar example
We now investigate how changes in the accuracy of the model, observations and background respectively affect the increase in accuracy obtained when using combined model-error and observation-error statistics as opposed to observation-error statistics alone. Consider an erroneous linear scalar model over one time step of the form x i = β e i x i−1 , with the erroneous model constant β e i ∈ R describing the evolution of the model state from time t i−1 to time t i . To obtain the true model state at time t i , where η i is random error normally distributed around a zero mean with variance σ 2 qi . Consider a direct observation y i of the model state (observation operator h i = 1) with error normally distributed around a zero mean with variance σ 2 oi and a background model state x b at time t i−1 with error normally distributed around a zero mean with variance σ 2 b . We consider only two times; the background time t i−1 and the observation time t i , therefore to simplify notation we set i = 1. The analysis when the erroneous linear scalar model x 1 = β e x 0 is used within the strong-constraint 4D-Var cost function (4) and the error in the model is not accounted for is where the gain constant Using (14), we evaluate the combined error variance σ * 2 o = σ 2 o + σ 2 q . When the erroneous linear scalar model x 1 = β e x 0 is used within the strong-constraint 4D-Var cost function (4) and the error in the model is accounted for by replacing σ 2 o with σ * 2 o , the analysis is where the gain constant k In this scalar framework, it is simple to show that the analysiserror variances when not accounting for model error, σ e2 a , will always be greater than or equal to the analysis-error variances when the model error is correctly accounted for, σ * 2 a . The analysiserror variances when not accounting for model error can be evaluated using (18): The analysis-error variances when the model error is correctly accounted for can be evaluated using (21): The ratio of the error variance in the analysis outputs, σ e2 a /σ * 2 a , is where r 2 b = σ 2 b /σ 2 o , and r 2 q = σ 2 q /σ 2 o . This ratio (27) is plotted in Figure 1 as a function of r 2 b and r 2 q . We see that, as the magnitude of the model error variance increases, the difference between σ e2 a and σ * 2 a also increases as expected. We also see that the difference between σ e2 a and σ * 2 a increases initially as r 2 b increases. This is because, as the observation error becomes smaller than the background error, the difference between the two schemes and the weight they give the observations is more apparent. The peak in the value of σ e2 a /σ * 2 a as a function of r 2 b is seen to depend on the value of β e , because the observation is at a different time to the background, e.g. a dispersive model (β e < 1, Figure 1(a)) reduces the apparent background error at the observation time. However, as r 2 b tends to infinity, we see from (27) that σ e2 a /σ * 2 a tends to 1 as in this case the Kalman gains for methods k e and k * both tend to 1/β and so the analyses are identical. The proposed method relies on the specification of the modelerror covariance, which is assumed to change throughout time. In practice model-error statistics are difficult to estimate. However, a key advantage of the proposed method is that an explicit estimate of Q i is not needed, as the algorithm only needs the combined observation-and model-error covariances in the (often reduced) observation space. Therefore, we next present a method for estimation of the combined model-error and observation-error statistics required for specification in R * (14).

Estimation of the combined model-error and observationerror statistics
We now return to the general multi-dimensional problem. The definitions of observation error (1) and background error (2) along with (9) enables the innovation involving the erroneous model at each time t i to be defined as We calculate the expectation of the following innovation product, for times t i and t k using (28). The definition of R * (i,k) as defined in (11) allows us to deduce Assembling the entries at all observation times into the vector d * b we have where R * is defined by (14). Therefore if the evolution of the background-error covariance matrix throughout the assimilation time window can be computed, we have a way to explicitly estimate R * : where the tilde indicates that this is an estimate. For a more detailed derivation of the estimated combined model-error and observation-error covariance matrix R * (32), we refer to Howes (2016). Note that this estimation R * (32) assumes an accurate specification of the background-error statistics in B.
To gain a sample of innovation vectors d * b we require a sample from the observation error and background error (distributions of both can be assumed to be known) and from the model error (distribution unknown). One possible way to gain a sample of model error is to make use of ensemble prediction systems, such at those that have been developed at the Met Office and ECMWF, which represent random error in the model forecast using stochastic physics (Buizza et al., 1999;Shutts, 2015;Brankart et al., 2015). However, this alone may not be enough to represent the full model error.
To evolve the B-matrix to the time of the observations, we propose using the randomization technique to estimate the diagonal elements of HMBM T H T , as conducted previously in experiments at ECMWF using the atmospheric forecasting model (Andersson, 2003). This same methodology can be extended and applied to evaluate the diagonal elements of H e B H e T required in (32). Therefore use of (32) is potentially operationally feasible in areas of high observational frequency, as long as we restrict ourselves to estimating the diagonal elements of the combined error matrix R * only. In section 6, it is shown that neglecting the effect of model-error covariances still leads to more accurate analysis than not accounting for model error at all.

Dynamical models
In this section we introduce two dynamical models which will be used to demonstrate the methods we have developed in this article. Specifically we present the linear advection equation, followed by a nonlinear idealized coupled atmosphere-ocean model.

Linear advection equation
The one-dimensional linear advection equation is used to model the transportation of a scalar quantity u(x, t) carried along by a flow with constant speed v in one direction, where x and t are independent variables representing space and the time respectively. Although this model is far more simplistic than any operational model representing the dynamics of the atmosphere or ocean, the model dynamics (33) can be interpreted to represent an idealized description of a passive tracer transported in the atmosphere in one direction by a constant flow, e.g. water vapour carried along by a constant light breeze. The partial differential equation (PDE) (33) does not describe the shape of the passive tracer, only its movement in space and time. We use the spatial domain x ∈ [0, 10) and describe the shape of the passive tracer u(x, t 0 = 0) at the initial time as (Causon and Mingham, 2010), which is an exponential function centred around x 0 = 5. The PDE (33) can be solved analytically, subject to the initial conditions (34). Equation (33) is solved numerically over the time domain t ∈ [0, 1] with periodic boundary conditions. The Crank and Nicolson (1947) scheme is an implicit method that computes a second-order approximation of the spatial derivative and a first-order approximation of the time derivative. We use this unconditionally stable method, with the spatial step x = 0.1 and time step t = 0.1 resulting in a time-stepping approximation of the PDE (33).

Idealized coupled atmosphere-ocean model
The idealized coupled atmosphere-ocean model we use was developed by Molteni et al. (1993) and is given by with parameter values σ = 10, r = 30, b = 8/3, k = 0.1, = π/10, w * = 2 and coupling parameter α = 1 as specified by Dubois and Yiou (1999). With the coupling parameter α = 0, the atmosphere is represented by the Lorenz 63 equations (Lorenz, 1963) with the state variables x, y and z and the ocean is represented by two linear equations with the state variables w and v. However, with the coupling parameter α = 1, the atmosphere variables x and y are coupled with the ocean variables v and w. The coupled model (35) describes the relationship between convection in the atmosphere and the influences that the sea surface temperature (SST) has on the convection in the atmosphere, as interpreted by Dubois and Yiou (1999). The atmospheric variables describe properties in a layer of fluid of uniform depth in the atmosphere. The atmospheric state variable x is proportional to the intensity of convective motion. The temperature difference between ascending and descending currents is proportional to the atmospheric state variable y. The atmospheric state variable z is proportional to the distortion of the vertical temperature profile from linearity. The model variables w and v represent equatorial SST anomalies' influence on the global system. The coupling parameter α can be interpreted as heat flux at the atmosphere-ocean interface. We apply the Runge-Kutta second-order scheme (Stuart and Humphries, 1998) to provide a time-stepping approximate solution to the coupled ordinary differential equations (ODEs) (35) with the fixed time step t = 0.01.

Linear advection equation
We first perform experiments using the linear advection equation to demonstrate the success of the sampling method to estimate the combined observation-and model-error covariances. In this linear system, this matrix can also be calculated explicitly using (12) and (13).
Let the erroneous linear model be the time-stepping solution of the linear advection equation over the spatial domain x ∈ [0, 10), as described in section 5.1. The true model state at each time t i differs from the erroneous model state by random error η i , as defined by (8). For simplicity, we define the structure of the model-error covariance matrix Q i to be diagonal with variance σ q 2 = 0.01 at each model time step. The backgrounderror correlations are given by the second-order auto-regressive (SOAR) function with correlation length-scale L = 0.4, where the SOAR correlation function is given by where r k is the separation of two points. The backgrounderror variances are set to σ 2 b = 0.04. We assume we have direct observations y i of all spatial points every two time steps over an assimilation window length of eight time steps. The observationerror covariance matrices are specified to be R i = σ 2 o I (for i = 2, 4, 6, 8) with variance σ 2 o = 0.04.

Estimation of the combined error statistics
In this example, the combined model-error and observation-error statistics R * (i,i) ∈ R 100×100 at observation times t i (i = 2, 4, 6, 8) are calculated using the explicit equation (14), with the results shown in Figures 2(a)-(d)). These matrices maintain a diagonal structure at all observation times, due to the diagonal structure of Q and R and the linear nature of the model. At each subsequent observation time, the variances in the combined error matrix increase, in this case σ * 2 o = 0.06 at t 2 , σ * 2 o = 0.08 at t 4 , σ * 2 o = 0.10 at t 6 and σ * 2 o = 0.12 at t 8 . This represents the increase in uncertainty of the model trajectory throughout the assimilation window, due to the errors present in the model. We next estimate the same error covariance matrices R * (i,i) (i = 2, 4, 6, 8) with (32). To do this we need to set up a numerical environment where the required sample innovation data are available. We take the true initial conditions of the model state as defined in (34) and from this compute the true trajectory which contains an additive stochastic term as in (8). We repeat Table 1. The RMSE of the matrix elements between R * (i,i) and R * (i,i) for each observation time t i (i = 2, 4, 6, 8). This process gives us a sample size of 5000 innovation vectors (d * b ) i for the estimation of R * (i,i) ∈ R 100×100 at each observation time t i (i = 2, 4, 6, 8) in (32). The estimation is found to be very successful in representing the structure and values of the combined error covariance matrix. For example, the difference between the central (51st) row of R * (i,i) and R * (i,i) for each time t i (i = 2, 4, 6, 8) can clearly be seen in Figures 2(e)-(h). The overall root mean square errors (RMSEs) between R * (i,i) and R * (i,i) for each time t i (i = 2, 4, 6, 8) are shown in Table 1. These RMSEs are of an order of magnitude less than the specified error variances in B, R i and Q i and are only non-zero due to sampling error. Note that the larger the sample size, the smaller sampling error. Although the R * matrix we are estimating at each observation time only contains diagonal elements, we are not making use of this information and so a sample size of 5000 is relatively small in comparison to the 10 000 entries we are estimating.

Use of combined error statistics in 4D-Var
We now demonstrate that the replacement of the observationerror covariance matrix R, with combined error statistics in R * , leads to an analysis of greater statistical accuracy when random error is present in the linear advection equation. Note that in these experiments the time correlations in the accumulated model error are not taken into account. We use the numerical set-up as described previously at the start of section 6.1. We Table 2. List of data assimilation conditions for error covariance matrices used in strong-constraint 4D-Var with the linear advection equation with random error. Both Q i and R i are diagonal matrices, whereas B is constructed using the SOAR function with correlation length-scale L = 0.4.

Conditions Covariance Variance matrix
define two data assimilation methods which will be performed and compared.
• Method 1: the evaluation of the analysis x ae 0 using (15), i.e. not accounting for model error, The RMSE for the sample of analysis outputs from Method 1 and Method 2 are calculated and compared. The resulting analysis RMSEs, using the three conditions as specified in Table 2, are shown in Figure 3  the increase in uncertainty of the model trajectory throughout the assimilation window, due to the errors present in the model. Therefore the data assimilation scheme puts less weight on the comparison of the observations with the model evolved state as time increases and hence allows a more accurate estimation of the initial conditions. We illustrate the analysis outputs (Method 1 and Method 2) from one data assimilation cycle (from the sample) in Figure 4 for conditions B. From this figure we see that by accounting for model error (comparing dashed and dash-dot lines) the shape of the initial function is smoother.

Sensitivity of results
Work in section 3.2 showed that, for a scalar model, there is a more significant increase in analysis accuracy (of x a * 0 compared with x ae 0 ) when the observations increase in accuracy (in comparison with the background accuracy) and when the size of the model error increases. These conclusions are not limited to models of a scalar nature and we demonstrate that these properties also hold when using the linear advection equation with random error in 4D-Var. Firstly, if we reduce the standard deviation of the observation errors by a factor of five so that R i = 0.0016I (i =2, 4, 6, 8), as shown by conditions B in Table 2, the corresponding combined error covariance matrix at each observation time is diagonal with the variances in the combined error matrix increasing with time. In this case σ * 2 o = 0.0216 at t 2 , σ * 2 o = 0.0416 at t 4 , σ * 2 o = 0.0616 at t 6 and σ o * 2 = 0.0816 at t 8 . The resulting analysis RMSE for Method 1 and analysis RMSE for Method 2 are shown in the central two bars of Figure 3(a). These results demonstrate that an increase in observation accuracy leads to a more significant increase in analysis accuracy, when accounting for the model error as opposed to not (in comparison to conditions A). An interesting remark is that, when the model error is not accounted for, this increase in observation accuracy causes a degradation in the analysis x ae 0 , as can be seen comparing Method 1 results from experiment B with Method 1 results from A. When Method 1 is used, a decrease in observation variance results in an analysis in which the trajectory of the erroneous model is confined to be closer to the observations, even at the end of the assimilation window, and so the effect of the model error is magnified. However, when accounting for the model error in the analysis x a * 0 in Method 2, an improvement is made to the analysis when the observations are of increased accuracy. With Method 2 the analysis is able to benefit from the increased accuracy of the observations.
Next we increase the standard deviation of the model error variance by a factor of two so that Q i = 0.04I (i = 1, ...., 8), as stated by conditions C in Table 2. The corresponding combined error covariance matrix at each observation time is diagonal with the variances in the combined error matrix increasing over time. In this case σ * 2 o = 0.12 at t 2 , σ * 2 o = 0.20 at t 4 , σ * 2 o = 0.28 at t 6 and σ * 2 o = 0.36 at t 8 . The resulting analysis RMSEs are shown in Figure 3(a) (right two bars), where the analysis x a * 0 from Method 2 is again more accurate than when not accounting for the model error in the Method 1 analysis x ae 0 . When not accounting for this significant increase in model error, there is a notable decrease in analysis accuracy. However, when accounting for this significant increase in model error, using the combined error covariance matrix, this has minimal effect on the accuracy of the analysis x a * 0 , as can be seen comparing the analysis RMSE from experiment A with the analysis RMSE from experiment C.
In Figure 3(b) it is seen that, despite the increased accuracy of the analysis, the RMSE at the end of the assimilation window is slightly worse for Method 2 than for Method 1. This is because the analysis derived using Method 1 is more constrained to the observations at the end of the assimilation window than the analysis derived using Method 2. In the next section we will show the implications of this for the accuracy of a forecast beyond the assimilation window using the nonlinear model.
We have demonstrated the work in section 3.2 for a scalar model, showing that there is a significant increase in analysis accuracy when the observations increase in accuracy (in comparison with the background accuracy) and when the size of the model error increases; this also holds when using a non-scalar erroneous linear model. We next extend our investigation to use an erroneous model of a nonlinear nature.

Idealized coupled atmosphere-ocean model
We now demonstrate that use of combined model-error and observation-error statistics can account for random error in a model of a nonlinear nature and hence improve the analysis accuracy. We use the discretized solution to the idealized coupled atmosphere-ocean model, as described in section 5.2, as our nonlinear erroneous model of the form where x i = (x i y i z i w i v i ) T is the model state vector. The true model state can be obtained at time t i as where the vector of model error η i ∼ N (0, Q i ). The forms of the erroneous model (37) and true model (38) are the same as (7) and (8) used in derivation of both the combined model-error and observation-error covariance matrix R * (14) and the estimated combined error covariance matrix R * (32), with the exception that here nonlinear model equations are used, as opposed to linear model matrices. One of the key objectives in this section is to show that the theory developed with linear models is also successfully applicable using models of a nonlinear nature. We have discussed the capabilities of NWP centres, such as ECWMF, to estimate the diagonal entries of the background model-error covariance matrix evolved using the model matrix and subsequently mapped to observation space using the randomization method. If our developed method to compute R * (32) were to be implemented operationally using similar randomization techniques, then only the diagonal elements would be specified in the combined error covariance matrix. Therefore, we wish to show in our numerical experiments with the idealized erroneous model (37) that, even when only the diagonal elements of the combined error covariance matrix are calculated and used within the data assimilation process, improvements to the analysis accuracy can be obtained. Specifically, this ignores the presence of time and multivariate cross-correlations in both the observation error and model error.
For experiments in this section we use an assimilation window length of 50 time steps. We define the true initial conditions:  1, 2, ..., 50) to be diagonal, with variances along the diagonal set to 0.02,

Estimation of the combined error statistics
We compare the diagonal entries of the combined error covariance matrix R * computed using (14) with the diagonal entries of R * estimated using (32) with sample innovation data. The elements of R * calculated directly with (14) require linearization of the nonlinear model (37) around the true model state trajectory, while the diagonal entries of R * are estimated with (32), which requires linearization of the nonlinear model (37) which is conducted around the background model state trajectory. The estimation of R * i uses a sample size of 1000 innovations at each observation time t i (i = 10, 20, 30, 40, 50). Figure 5 compares the diagonal entries (variances) of the three matrices R, R * and R * at the observation times t i (i = 10, 20, 30, 40, 50). The observation error covariance matrix R i is static throughout the window, whereas the variances x y x y x y x y x y x y x y . Analysis RMSE (left column), the subsequent RMSE of the analysis trajectories over the assimilation window (middle column) and the RMSE of the forecast trajectories beyond the assimilation window over 50 time steps (right column). Results are from applying Methods 1, 2 and 3 over a sample of 100 data assimilation runs using the covariance specified in Table 3 for ( in the combined error covariance matrix R * computed directly with the formulae (14) are significantly larger as time increases, accounting for the uncertainty in the model state trajectories caused by the errors in the model. Note that the variances estimated in the combined model-error and observation-error covariance matrix R * do not always increase with time, as can be seen for the atmospheric variables x, y and z in Figure 5. Whether R * or R * contain the most appropriate combined model-error and observation-error statistics to use in the 4D-Var cost function is an area of future work. The estimate R * uses a sample of innovations and therefore better recognises the spread of the model error at each specific observation time. However, the estimate R * also uses the linearization of the nonlinear model around the background model state trajectory (as opposed to the true model trajectory) and this could have a detrimental effect on the estimation of the combined statistics. Another key point to note is that the variances evaluated in R * and estimated in R * are much closer in value for the ocean variables than the atmospheric variables. The reason for this is that the theory for the derivation of R * (14) and estimation of R * (32) is based on the use of linear models and the ocean variables w and v are of a less chaotic nature than the atmospheric variables x, y and z, which are of a highly nonlinear chaotic nature. In the next section we perform strong-constraint 4D-Var, firstly not accounting for the error in the erroneous idealized coupled model, and compare the analysis outputs with strong-constraint 4D-Var performed when accounting for the error in the model.

Use of combined error statistics in 4D-Var
We define three data assimilation methods that vary depending on the error covariance matrix used in the J o term of the nonlinear 4D-Var cost function.
• Method 1: the evaluation of the analysis x ae 0 using the nonlinear 4D-Var cost function with no changes; • Method 2: the evaluation of the analysis x a * 0 by replacing R in the nonlinear 4D-Var cost function with the diagonal entries of the combined error covariance matrix R * (14); • Method 3: the evaluation of the analysis x a * 0 by replacing R in the nonlinear 4D-Var cost function with the diagonal entries of the estimated matrix R * (32) from sample innovation data.
In the left column of Figure 6, the analysis RMSE from a sample of 100 analyses is plotted when using the three different methods for each state variable. The five different covariance conditions are specified in Table 3 (note that Conditions I correspond to the covariances used in the previous subsection). The results show, in general, a significant reduction in analysis error for all model state variables when combined model-error and observation-error statistics are used in the strong-constraint 4D-Var cost function, compared with observation-error statistics only. When compared with the observation-error variances (used in Method 1), both the error variances in R * (Method 2) and the estimated error variances in R * (Method 3) give less weight to the comparison of the model evolved state with observations, due to the uncertainty in the model trajectory. Let us focus first on Figure 6(a), Conditions I. The reduction in analysis RMSE, when accounting for the model error, is largest for the atmospheric variables y and z. We hypothesise that this is down to two factors: the first is the ratio between the background-error and observation-error variances and the second is the size of the model error. We will investigate this further in section 6.2.3. We find that accounting for errors in the model with the combined error statistics reduces the number of minimization iterations required to reach the tolerance level. This is demonstrated in this Table 3. List of error covariance matrices used in strong-constraint 4D-Var with the erroneous idealized coupled atmosphere-ocean model. case with the number of minimization iterations reducing from 12 for Method 1, to 10 for both Method 2 and Method 3. This is because the increase in variances in the 4D-Var cost function leads to the erroneous model trajectory no longer being so tightly constrained to the observations, and therefore a solution is more easily found. This work can be related to that of Haben et al. (2011), where it was shown that the 4D-Var problem becomes better conditioned, and therefore involves fewer minimization iterations, with increased observation error variances. An increase in analysis accuracy does not necessarily lead to an increase in forecast accuracy. This is demonstrated in Figure 6, which shows the RMSE of the analysis trajectories throughout the assimilation window (middle column) and the RMSE of the forecast trajectories beyond the assimilation window over 50 time steps (right column). When we include the model-error statistics in the combined error covariance matrix, we are letting the analysis model trajectory depart further from the observations (which are measuring the truth). Even if the true initial conditions were used, we would expect a poor forecast due to the use of the erroneous model. In contrast, SC4DVar when not accounting for model error tries to fit all observations throughout the window, and so the poor analysis at the initial time compensates for the model error to give an improved forecast at a later time; this effect has also been noted by Fowler and Lawless (2016) and Griffith and Nichols (2000).

Sensitivity of results
Work with an erroneous scalar model in section 3.2 showed that there was the most improvement to the analysis accuracy when replacing the observation-error statistics in the cost function with combined error statistics, in the presence of large model-error variance and a large background-error variance in comparison with the observation-error variance. Next we investigate whether these characteristic features hold with use of the erroneous nonlinear model (37) in SC4DVar.
We investigate the effect of increasing and decreasing the size of the model error on the analysis accuracy. We previously defined Q i at each time t i using Conditions I in Table 3. We reduce the standard deviations of the model error for each of the model state variables by a factor of two and label these Conditions II, whereas in Conditions III we increase the model error standard deviations by a factor of two. Conditions IV are the initial model error standard deviations increased by a factor of five. For each of the Conditions II, III and IV we recalculate the diagonal entries in R * i at observation times (i =10, 20, 30, 40, 50) with a sample size of 1000 and subsequently conduct 100 strong-constraint 4D-Var runs. Each individual data assimilation run uses independent model error vectors, vectors of observations and background vectors and minimizes the nonlinear 4D-Var cost function using Methods 1, 2 and 3. This produces 100 analysis vectors for each method. The results are shown in Figures 6 Comparing the levels of model error within the data assimilation process, where Conditions II is lowest and is Conditions IV highest, we demonstrate that the larger the model error, the more significant the increase in accuracy is when accounting for the model error (Methods 2 and 3), as opposed to when not (Method 1). The number of iterations that the minimization algorithm performs is lower when accounting for model error than when not accounting for model error. We also note, similarly to the previous experiment, that an increase in analysis accuracy does not necessarily lead to an increase in forecast accuracy. In Figures 6(b)-(d), it is seen that the larger the model error, the greater the degradation in the forecast when using Methods 2 and 3 than with Method 1.
The last of these experiments was conducted with very large model error variances (Conditions IV in Table 3). The decrease in analysis RMSE, when using the estimated combined error variances in Method 3, is very significant (Figure 6(d)). We re-emphasise that this remarkable improvement to the analysis accuracy is due to the specification of the diagonal entries in the estimated matrix R * (32) which did not require explicit knowledge of the model-error statistics in Q i . We also have disregarded any cross-covariance information in R * , as we are only using the diagonal entries in these experiments. The impact of including this information in R * could produce even further improvement in the analysis accuracy and is an area of further work.
The ratio between the accuracy of the background error and observation error is also important when investigating the difference between the accuracy of the analysis obtained when accounting for model error, as opposed to not. We demonstrate this property by both increasing the accuracy of the background model state (by reducing the standard deviations in B by a factor of five) and decreasing the accuracy of the observations (by increasing the standard deviations in R i by a factor of five; Conditions V in Table 3). The observations are now far less accurate than the background. We set Q i back to Conditions I as stated in Table 3. We recalculate the diagonal entries of R * i , with the error covariance matrices as specified in Conditions V using a sample size of 1000. We subsequently conduct 100 strong-constraint 4D-Var runs for Methods 1, 2 and 3. Each individual data assimilation run uses independent model error vectors, vectors of observations and background vectors. This produces 100 analysis vectors for each method. Figure 6(e) shows that, in general, accounting for the error in the model can still produce an analysis of higher statistical accuracy, however the significance of this increase is minimized when the ratio r = σ 2 b /σ 2 o is significantly reduced for each of the model state variables. For Conditions I, the ratio r = σ 2 b /σ 2 o for each of the model state variables x, y, z, w and v is 1.11, 3.33, 4.20, 27.5 and 13 respectively. However, for Conditions V the ratios r are far smaller at 0.002, 0.005, 0.007, 0.044 and 0.0208 for each of the respective model state variables x, y, z, w and v.

Conclusion
In this article we have developed a SC4DVar method which is able to account for model error over the assimilation window. Unlike WC4DVar, this method does not need an explicit estimate of the model-error covariances and does not try to estimate the model error but instead allows the analysis to be less constrained by observations at the end of the assimilation window. In doing so, this new method is able to find a more accurate analysis than SC4DVar when model error is not accounted for.
The new method accounts for model error by weighting the mismatch between the observations and the evolved model state using a combined model-and observation-error covariance, R * rather that the observation-error covariances alone, R. In theory this provides an optimal estimate of the true initial conditions, satisfying the assumption that this is the minimum variance estimate for a linear system.
Use of these combined error covariance statistics requires specification of the model-error statistics, which are often unknown. Therefore, in section 4 we formulated a method to estimate the combined model-error and observationerror covariance matrix using a sample of backgrounds and observations. This follows a similar methodology to the use of covariance innovation consistency diagnostics (Dee, 1995;Desroziers et al., 2005;Ménard, 2016) for estimating observationerror variances alone. The difference here is that we need a window of observations to allow also for an estimate of model error. There are many potential difficulties to using such estimates of the error covariances in practice. To use the sampled estimate of the combined model-and observation-error covariance it is necessary to subtract an estimate of the evolved backgrounderror covariances. This could potentially be estimated using the randomization method in which a sample from a standard normal distribution is transformed using the square root of the background-error covariances matrix and than propagated forward to the time of interest using the (potentially nonlinear) model. The sensitivity of the new method to this estimate of HB H T should be investigated. Another potential issue is that the estimate of R * relies on the accuracy of B which itself is difficult to quantify. This is a known issue when attempting to estimate R alone using samples of background minus observations (Ménard, 2016;Waller et al., 2016a). The sampling method also relies on a sample of the model error. We anticipate that the use of stochastic physics alone may not be enough to represent all sources of model error. Assuming some degree of homogeneity and isotropy in the model error, an alternative approach could be to estimate R * off-line allowing for averaging over time and space. This will be investigated in future work. This new method was demonstrated on two dynamical systems, the linear advection equation and a nonlinear idealized coupled atmosphere-ocean model. In these numerical experiments, only the accumulated model-error variance was accounted for, as this is more feasible. However, it was shown that, when model error of a random nature is present at each time step, replacing the observation-error covariance matrix with combined modelerror and observation-error variances statistically improves the analysis accuracy over a number of sample experiments. It was also demonstrated that an improved analysis does not necessarily lead to improvements in the forecast accuracy, due to the erroneous model. Therefore, this methodology could be best suited to the application of reanalysis, as opposed to forecasting.
In the experimental set-up, only the diagonal elements of R * were estimated and used. Future work should look at the sensitivity of the method to the use of spatial/multivariate crosscovariances, and to temporal cross-correlations.