On the representation error in data assimilation

Representation, representativity, representativeness error, forward interpolation error, forward model error, observation‐operator error, aggregation error and sampling error are all terms used to refer to components of observation error in the context of data assimilation. This article is an attempt to consolidate the terminology that has been used in the earth sciences literature and was suggested at a European Space Agency workshop held in Reading in April 2014. We review the state of the art and, through examples, motivate the terminology. In addition to a theoretical framework, examples from application areas of satellite data assimilation, ocean reanalysis and atmospheric chemistry data assimilation are provided. Diagnosing representation‐error statistics as well as their use in state‐of‐the‐art data assimilation systems is discussed within a consistent framework.


INTRODUCTION
At its core, data assimilation relies on comparing each available observation of a variable with a prior estimate of the variable, generally taken from a discrete dynamical model, to deduce a revised estimate on the model grid. In the Bayesian formulation of data assimilation, this process of comparing and updating requires knowledge of, or assumptions on, the error characteristics or uncertainty properties of the observed value and prior estimate. Among the difficulties encountered in this process is the fact that the discrete geophysical model is not able to represent all of the spatial and temporal scales, nor all the physical processes, of the observed geophysical state and additional approximations are needed to represent the equivalent of any observation. The prior estimate may therefore differ substantially from the observed value, even in the absence of any measurement (or instrument) error, and this difference results in a perceived error that must be accounted for in order to update the prior estimate properly. For example, a perfect (measurement-error-free) observation of surface pressure at the centre of a strong tropical cyclone will typically be much lower than the forecast value from a numerical weather prediction model, resulting at least in a perceived bias to be estimated in some way.
This basic difference between the modelled representation of an observation and what is actually observed has generally been handled by introducing what has been variously called representation, representativity or representativeness error in the literature. Thus, the observation error has two components, the representation error and the measurement error. The aim of this article is to review some of the literature that has grown around representation error in recent years and, in so doing, to explain and consolidate the terminology that has evolved in different disciplines. The terminology has sometimes been used inconsistently between different authors, so we first introduce the basic terminology used in this article and discuss some of the variations that have appeared in the literature. Representation error is distinct from, and does not include, the measurement (or instrument) error, which is the error associated with the measuring device alone, independently of how the measurements are used, for instance in the data assimilation process. One component of the representation error arises due to a mismatch between the scales represented in the observations and the model fields. For instance, an observation may represent the value of a geophysical variable at a single point in space and time, whereas the model prior will represent a spatial and temporal average, depending, among other things, on the model's discretization. The observation and the prior will then differ, in a way that depends on the true geophysical variability at scales different from those represented by the model. An example is illustrated in Figure 1, where the satellite image of a cyclone shows much more spatial structure than its coarser resolution model counterpart, in which deep convection is parametrized. This component of representation error is referred to as the error due to unresolved scales and processes.
Another component of representation error arises from the observation operator. Particularly for remote-sensing observations, such as satellite radiances, the observed variables are not usually state variables. In this case, approximations are typically involved in formulating the observation operators needed to pass from state space to observation space. The resulting observation-operator error, or forward model error, also contributes to the representation error.
Finally, quality control or pre-processing of observations can introduce another type of representation error. Quality-control procedures are required in practice to reject observations that cannot be modelled adequately, such as those affected by incomplete knowledge of the appropriate observation operator or by instrument calibration problems. These procedures often depend on particulars of either the geophysical model or the data assimilation algorithm (for example, height assignment of atmospheric motion-vector observations or superobservations of radar data). Furthermore, pre-processing is at times performed to derive a quantity that is closer to the state variables of the forecast model. This may introduce further errors depending on the pre-processing algorithm (for example, the retrieval of atmospheric variables from satellite radiances). Errors associated with imperfections in these procedures that depend on the geophysical model, observation operator or data assimilation algorithm, here called pre-processing or quality-control errors, will also be considered as part of the representation error. Those that arise from incomplete knowledge of the observation operator can, of course, be thought of as part of the observation-operator error. Pre-processing of the observations that is independent of components of the data assimilation process (i.e. the geophysical model and data assimilation algorithm) will not be considered as part of representation error, since in that case the observations and the geophysical model can still be considered as two independent sources of information on the current state of the geophysical system.
Thus the representation error, as defined here, encompasses error due to unresolved scales and processes, forward model or observation-operator error and pre-processing or quality-control error. We will show in section 2 how representation error can in theory be separated into these three parts, although the separation is cleanest in the case of linear observation operators. Such a separation becomes problematic, however, when it comes to the practical matter of diagnosing representation-error statistics, for it is difficult to distinguish between observation error and model error, as discussed for instance by Dee (1995;. We return to this point in section 5. Lorenc (1981;1986) pointed out that the state that we want to estimate in atmospheric data assimilation is defined by the model and therefore "observation error contains contributions from variations on scales smaller than those we wish to analyze in both space and time." Even high-resolution models in use today are not able to capture all scales and resolve all processes of geophysical systems. Lorenc (1986) used the term representativeness error to describe a difference between the observation and the model's equivalent of the observation. In the 1990s, both forward interpolation error (Daley, 1993;Mitchell and Daley, 1997a;1997b) and representativeness error (Lorenc, 1986;Cohn, 1997) were used. Forward interpolation error has its origin in the term forward model, which was usually used to indicate a linearized version of the observation operator in inverse theory literature. The term error due to unresolved scales was introduced by Mitchell and Daley (1997b) to distinguish the part of the error that arises from subgrid-scale processes. More recent literature (Waller et al., 2014b) has introduced the term representativity error. Note that this term is not new when referring to the sampling error, due to the spacing in the observations. Daley (1991) also uses the term representativeness error to refer to the sampling error of the observation grid. For Schutgens et al. (2016), the term (spatial) sampling error is a synonym for error due to unresolved scales and processes. Etherton and Bishop (2004) and van Leeuwen (2015) use the term representation error for error due to unresolved scales and processes. Ponte et al. (2007) use the term representation error for error that is composed of pre-processing error and error due to unresolved scales and processes, while Janjić and Cohn (2006) and Oke and Sakov (2008) use it for error due to unresolved scales and processes and observation-operator error.
Although in this article we will focus on representation error in the data assimilation context, there is a wide body of literature on representation error in other contexts, such as when two different types of observations are intercompared, or in the context of forecast verification. Common to these applications is the idea that two quantities are compared that represent different scales or processes, e.g. different sampling volumes of two instruments. In these contexts as well, different terminology has been used. For example, Zawadzki (1975) used space smoothed data, Berenguer and Zawadzki (2008;2009) refer to scale-dependent errors and Bulgin et al. (2016b) and Seed et al. (1996) refer to (spatial) sampling error for error due to unresolved scales and processes. The term representativeness error was used as well in Kitchen and Blackall (1992), Ciach and Krajewski (1999) and Mandapaka et al. (2009). While there is some commonality in the underlying issues in these other areas, here our attention is restricted to the data assimilation context.
We introduce the notation of this manuscript in section 2. In section 3 we illustrate the representation error in different applications and through theoretical considerations. Proper specification of representation-error statistics is important for optimal use of the observations, because it tells us how the observations are to be assimilated in order best to adapt to the model's resolution. Also, accurate specification of these statistics is important for verification of forecast results (Hamill, 2001;Bowler, 2008;Candille and Talagrand, 2008). However, it is also necessary to modify the data assimilation algorithms to include the representation error and its characteristics (i.e. state and time dependences and spatial correlations). In section 4, we will summarize a method of Janjić and Cohn (2006) derived for this purpose. In section 5, we review methods for modelling observation-error statistics and, since representation error introduces correlations in observation errors, we will discuss the use of correlated observation errors in practice. We describe previously unpublished experiments demonstrating the use of ensembles for computing part of the representation-error statistics. Further, we indicate how the representation error is included in the observation-space diagnostic approaches of Hollingsworth and Lonnberg (1986) and Desroziers et al. (2005) and we discuss current and future research challenges in diagnosing and implementing correlated observation errors in operational data assimilation systems. Finally, in section 6 we discuss the scale-matching approach, which attempts to filter the data to a resolution similar to that of the model before assimilation. Section 7 concludes the article and summarizes directions for future research.

DEFINITIONS
As illustrated in Figure 1, the observation error, o , consists of a measurement error, m , and a representation error, R .
On the left we see visible satellite imagery, considered here to be an observable arising from the true (continuum) atmospheric state w, but not to be a state variable itself, and on the right similar imagery is shown at low resolution to represent imagery from a numerical weather prediction model on its grid. An arrow between these two images represents the (unknown) map of the true state to the true (resolved) model state w r . The instrument taking a measurement is represented by h c , the (generally nonlinear) continuum observation operator that acts on the true state to produce the observation y that is contaminated with the measurement error. If the observation y is pre-processed, it could be contaminated with the pre-processing error as well (see the Appendix for more details in this case). To compare the model-produced data and the observed data, it is necessary to apply a discrete operator, h, to the model output. The difference between the observation and the prior, y − h(w r ), is then the sum of the representation error and the measurement error. Both the full w and the resolved state w r are defined on the continuum and therefore can be written as a vector of functions in space x and time t. Each component of these vectors is one dynamical variable (e.g. temperature) and therefore mathematically a function, for both resolved and full states, in the same function space (e.g. in  2 ). In what follows, we always assume that there is a one-to-one correspondence between the geophysical model and w r (x, t), a property we will use explicitly in section 5. For example, if the discretization of the numerical model is spectral, w r (x, t) is a finite sum over weighted spherical harmonics, although the model itself would contain spherical harmonic coefficients.
The observation-operator error is an intrinsic part of the representation error, because the dynamical model dictates the discrete observation operator. Here the nonlinear observation operators will be denoted with small letters and linear operators with capital letters. If the result of the observation operator acting on a state is a scalar then it is denoted by h(w r ) and if it is a vector then h(w r ).
In the following, we would like to define mathematically the categories of error due to unresolved scales and processes and observation-operator error, as well as motivate their names. The observation error can be written as Here, h(w r ) is a vector in observation space, denoting the nonlinear observation operator h acting on the resolved state w r (x, t) defined on the continuum (in space and time). The true (unknown) nonlinear observation operator h c can act on both the full atmospheric state w(x, t) and the resolved state w r (x, t), since we assume w r (x, t) is one possible realization of the state of the atmosphere, i.e. mathematically both functions belong to the same function space. The term m is the measurement error and each instrument will have different measurement error characteristics.
The term ′′′ denotes pre-processing error. This error is different for each observation type and will be described in more detail in section 3.
The term is the error due to unresolved scales and processes. Thus, the error due to unresolved scales and processes defined in Equation (2) represents the difference between a perfect (noise-free) observation and a perfect observation of the true resolved signal that we would like to have. Note that, since the observation operator is not linear, However, in the case of a linear observation operator, from Equation (2) ′ = H c (w − w r ). The equation in the linear case motivates the name, since it is then clear that the error depends on all of the scales and processes unresolved by the geophysical model. Therefore this error will exist as long as we are not able to describe completely the full dynamical system being observed. However, as will be illustrated in Example 2 of section 3.4, if h c filters the true atmospheric signal to be of a lower resolution than w r , it may be possible to minimize this error.
The term is associated with the observation-operator error. The observation-operator error contains the error caused by an approximation of the operator h c with h, for example representing the infinite-dimensional operator with its finite-dimensional approximation that acts only on resolved scales or not knowing perfectly all properties of the true system necessary to describe h c , or simply errors in approximations that are made to minimize computational cost. As with error due to unresolved scales and processes, for nonlinear observation operators we Each of the components of observation error can have a bias, which would need to be accounted for in the application of data assimilation algorithms. Using the Ide et al. (1997) notation, the observation-error covariance matrix will be denoted R, the instrument-error covariance matrix E and the representation-error covariance F and R = E + F. The states defined only at discrete times, such as analysis, forecast and their covariances, will have fixed time k as subscript.

EXAMPLES OF REPRESENTATION ERROR
How the representation error appears and is treated in data assimilation applications is illustrated through examples of the assimilation of radiance data in numerical weather prediction (NWP: section 3.1), for ocean reanalysis (section 3.2) and for atmospheric chemical modelling (section 3.3). Further, three theoretical examples are given in section 3.4 that discriminate mathematically between the observation-operator error and the error due to unresolved scales and processes. These are useful to aid our thinking, but the boundaries between the categories may be blurred for some observations, as described next.

Use of radiance observations in NWP
By far the largest number of observations assimilated for global NWP are satellite radiances. Most commonly, the satellite radiances are used in clear-sky conditions only and a clear-sky radiative transfer model serves as the observation operator (e.g. Collard and McNally, 2009;Bormann et al., 2013). However, increasingly, efforts to treat these observations in all-sky conditions are bearing fruit and here the radiative transfer model includes the effects of clouds or rain . We will consider both of these examples here.
For the representation error in satellite data assimilation, we can think along the lines of the three categories introduced earlier, which can be identified as follows.
• Observation-operator error is the error associated with mapping the model fields to the observation equivalent.
In the case of radiances, this is the error due to uncertainties or approximations in the radiative transfer model used to assimilate the data, for instance RTTOV (Matricardi and Saunders, 1999). Uncertainties in spectroscopic parameters contribute to this error, along with inaccuracies in line-shape models or assumed gas concentrations that may not be consistent with the truth (e.g. Dudhia et al., 2002;Ventress and Dudhia, 2014). Other approximations are also usually made, such as discretization, approximations for computational speed (e.g. Sherlock, 2000) or the representation of the atmosphere as a single vertical column. Further uncertainties arise from the instrument characterization: response functions are often not known accurately and are approximated (e.g. Chen et al., 2013). In strong-constraint 4DVAR, the forecast model also contributes to the observation-operator error, a contribution that is particularly relevant for assimilation of cloud-affected radiances due to uncertainties in the physics parametrizations (e.g. Geer and Bauer, 2011). • Pre-processing or quality-control error is the error associated with imperfections in the preparation and selection of the observations, in terms of either the derivation of a quantity (e.g. the retrieval of an atmospheric profile from satellite data; height for atmospheric motion vectors, see Cordoba et al., 2016) or our ability to identify observations that have unmodelled contributions and hence should be rejected (Waller et al., 2016a). For clear-sky radiance assimilation, this contribution is primarily the result of failures in cloud detection, aimed at removing cloud-affected observations so that a clear-sky radiative transfer model can be used (e.g. McNally and Watts, 2003). Such quality control is never perfect and this can contribute significantly to the error budget. For clear-sky radiance assimilation, one could consider this error as observation-operator error instead, as it is an effect of neglecting clouds in the forward model. However, in operational practice, reductions in pre-processing error are usually achieved through changes in the quality control rather than the forward model, so it is useful to think of it as a pre-processing or quality-control error. The increasing focus on convective-scale modelling and the assimilation of cloudy radiances will remove the need to discard cloud-affected observations and hence reduce the pre-processing error. However, some quality-control errors may still exist, for instance when identifying situations with known deficiencies in the model clouds or radiative transfer. • Error due to unresolved scales and processes is the error associated with spatial and temporal scales, as well as features and processes represented in the observations and not in the NWP model. For satellite sounding radiances, the footprint sizes of most instruments vary between 15 and 45 km, whereas the spatial scales represented in models are around four times the current typical horizontal model resolution of 10-20 km for operational global models and 1.5-3 km for mesoscale models. For some features, such as clouds, the spatial representativeness in models may be much larger than this and is linked to the predictability of these features and physical parametrizations (e.g. Geer and Bauer, 2011). The differences lead to a mismatch between the representation of spatial or temporal scales in observations and NWP models. One could argue that the error due to unresolved scales and processes would diminish as the model's resolution increases. However, even high-resolution global models are not able to capture all observed atmospheric scales. This is illustrated in Figure 2. The upper two panels of Figure 2 show that there appears to be more detail and structure in the clouds in the Met Office UKV model at 1.5 km resolution compared with the SEVIRI observations at approximately 6 km resolution. Conversely, the lower two panels of Figure 2 show that there is more detail and structure in the SEVIRI observations than in the modelled cloud in the Met Office global model at approximately 17 km resolution. As illustrated in the figure, the differences between both model forecasts and the observations are still very large, indicating that the error due to unresolved scales and processes would have different structure and make a large contribution to the observation error if the satellite data sets were to be assimilated. If the unresolved scales contain a lot of energy, as they may in the boundary layer or in convective situations, then neglecting the representation error in data assimilation would alias the unresolved signal on to the resolved scales.
In satellite radiance assimilation, all three of these error categories will contribute to a varying degree. For clear-sky assimilation of mid-tropospheric infrared temperature-sounding radiances, the quality-control and observation-operator errors are likely to dominate. In contrast, the error associated with how clouds are represented in the forecast model is likely to be the largest source of representation error in the assimilation of cloudy radiances.
In the context of satellite data assimilation, the representation error is usually estimated with diagnostic techniques discussed in section 5, i.e. the representation-error covariance matrix F is estimated, together with the measurement-error covariance matrix E in the joint observation-error covariance matrix R. FIGURE 2 (a, c) Observed and (b, d) simulated SEVIRI 10.8 μm IR channel imagery for (b) Met Office UKV and (d) global models over the North Atlantic and European area. The UKV case is taken from 1500 UTC on 24 August 2015 and the global case is taken from 0600 UTC on 19 October 2015. Simulated imagery is produced by running a radiative transfer model on NWP model output and preserves the resolution of the model (Blackmore et al., 2014)

Example on ocean reanalysis
For ocean reanalysis, representation error arises primarily from the error due to unresolved scales and processes. The topic of representation error is handled briefly in most oceanographic literature (e.g. Guinehut et al., 2012;Good et al., 2013). An exception is the careful examination of the representation error associated with satellite altimetry by Oke and Sakov (2008). Here we revisit Oke and Sakov (2008) in the context of satellite altimetry and then examine two additional measurement systems: tropical moored temperature time series, where unresolved temporal scales are evident, and surface drifter velocity, where unresolved physical processes are important contributors to the error due to unresolved scales and processes.
1. The error due to unresolved scales and processes is the error introduced by the presence in the observations of signal due to motion at scales below those resolved by the model, as well as signal due to processes that are not included in the model (Lorenc, 1986;Oke and Sakov, 2008). The current generation of ocean reanalyses are built around global general circulation models of the ocean and sea ice systems, solving the primitive equations of motion and conservation equations for temperature and salt. These equations, as implemented in the current generation of models such as the Geophysical Fluid Dynamics Modular Ocean Model version 5, include a number of approximations, such as the assumptions that gravity is constant (excluding gravitational tides) and that motion has time-scales longer than a day (damping internal gravity waves and diurnal convection) and parametrization of important unresolved processes (such as salt fingering and eddy mixing). Current model implementations have typical resolutions of 0.25 • × 0.25 • × 10 m in the upper ocean. Several warm-and cold-core eddies are evident, with amplitudes of tens of cm. Sea-level anomaly derived from altimetry must be corrected for a number of unresolved processes. One of the largest corrections is for aliasing due to the presence of primarily semidiurnal gravitational tides (blue), which is unresolved by the 10 day sampling of the JASON altimeters. Data were provided by Dr Eric Leuliette of the NOAA Laboratory for Satellite Altimetry Such a model resolution is, for example, insufficient to describe the ocean eddy field, the scales of which vary from 200 km in the Tropics to 10 km in the Arctic. A satellite altimeter is a radar that measures the time it takes for a radar pulse to travel from the satellite to the ocean surface and back and thus infers sea level, typically along the nadir of the satellite track. The JASON series of satellites are in an exact repeat orbit with a period of slightly less than 10 days, an equatorial spacing of the orbit tracks of 3 • and an along-track sampling of about 4 km (signal averaging during 1 s of satellite flight time). An example of the resulting sea level for the western South Atlantic is shown in Figure 3. Oke and Sakov (2008) use an objective mapping with a spatial scale of a few hundred Temperature at 100 m shows even larger 0.5 • C quasi-daily fluctuations that are most prominent in northern spring, when the thermocline is shallow at this location kilometres and a two-week time-scale, such as that shown in Figure 3a to represent the model. The difference between this map and the 1 s sea-level sampling (black line in an example shown in Figure 3b) is treated as the error due to unresolved scales and processes. A visual inspection reveals the presence of an unresolved signal of several centimetres, which also includes the 2 cm measurement error.
Although not emphasized in Oke and Sakov (2008), their use of the observations to estimate the error due to unresolved scales and processes means that spatially correlated errors in the observations, such as the 1000 km scale correction for aliasing by semidiurnal tides (Figure 3b, blue line), will appear in both and thus be excluded from the difference in their approach. Representation error due to unresolved temporal scales is most evident in pure form in observation time series such as those produced by the Rama (Indian Ocean), TAO/Triton (Pacific Ocean) and PIRATA (Atlantic Ocean) tropical moored arrays. Figure 4 shows a 10 month time series of six-hourly temperature at 40 m depth (black) and 100 m depth (blue) at a location a few degrees north of the Equator in the Eastern Pacific. Temperature at 40 m depth shows 0.25 • C daily fluctuations superimposed on dramatic seasonal warming, reflecting the onset of the 2015/16 El Niño. These daily fluctuations reflect local surface heating and wind-stirring processes unresolved by the ocean reanalyses. At 100 m depth, larger 1 • C daily fluctuations result from internal wave related vertical excursions of the thermocline, the explicit physics of which is excluded by the numerical time-stepping and hydrostatic assumption. These 100 m fluctuations are most evident in northern spring, when the stratification is strongest at this depth, thus illustrating how seasonality may enter the representation error due to shifts in the background state.
Velocity observations from a surface drifter also contain representation error that is due to unresolved physics. This error is spatially and temporally correlated, due to the Lagrangian nature of the observations. Surface drifters and freely floating surface buoyed drifters with drogues at 15 m depth are designed to track the horizontal movement of water in the mixed layer. In Figure 5, the changing position of one drifter in the subtropical North Pacific during a 35 day period is displayed. The position track shows the characteristic scallop shape of local wind-forced inertial oscillations, the period of which at this latitude is about 28 h. These inertial oscillations may introduce velocity errors of 10 cm s −1 or larger, however their spatial inhomogeneity makes removal by simple time-filtering problematic. How best to handle such Lagrangian observations remains an open question. 2. For the altimetric measurements, pre-processing error would, for example, include correction of data for tides through tide models. Representation error due to modelled tides is usually set to a constant value, although it is known that spatial variability exists. In addition, altimetric data are corrected for the wrong atmospheric pressure with the simple inverted barometer correction. This part of pre-processing error is usually larger than tide errors (Ponte et al., 2007).
For the current generation of ocean reanalyses, the representation errors are typically modelled by combining them with an estimate of measurement error into a single observation-error covariance matrix, which is also assumed homogeneous, spatially uncorrelated and stationary in time (e.g. Carton and Giese, 2008;Li et al., 2015).

Example on air pollution/atmospheric chemistry/greenhouse gases
In atmospheric chemistry studies, the representation errors are crucial in explaining the mismatch between model and observations. They are significantly larger than the instrument errors. The representation errors are mostly of the unresolved scales and processes type but can also be due to what we defined as observation-operator error, depending on the nature of the control variables.
That is why we will, in the following, distinguish between (a) the case where the control variables are pollutant concentrations directly related to concentration measurements, e.g. in air-quality forecast studies, and (b) the case where the pollutant emission fluxes are the control variables, e.g. in greenhouse gas inverse modelling studies.
In the air pollution context, where the control variables are the pollutant concentrations, representation error is mostly due to unresolved scales and unaccounted-for subgrid processes. In situ measurements of pollutants are strongly impacted by the locations of the observation stations. The air pollutant concentrations depend on the topography, local meteorological climatology and proximity to sources and sinks of primary species (Koohkan and Bocquet, 2012). That explains why, in most air pollution studies, a qualitative classification is used to discriminate between the stations. One distinguishes (a) background stations, which are meant to be representative over large distances and possibly match the model resolution, (b) rural stations, also quite representative but affected by rural (chemical) conditions, and (c) suburban and urban stations, more impacted by human activity: dense traffic and industries, urban heating, and urban topography. The classification of the stations can be obtained using statistics of past observations (Joly and Peuch, 2012). Based on such a classification, the observations would be used differently or not used at all. In data assimilation studies, their error statistics would be determined by such a preliminary classification (Elbern et al., 2007). The errors due to unresolved scales and processes, both spatial and temporal, are also present when using a satellite data estimation of constituent concentrations (Boersma et al., 2015), in addition to the observation-operator error due to retrieval assumptions.
Representation error also plays a major role in atmospheric chemistry inverse modelling studies, focused on the retrieval of pollutant emissions. In this context, the control variables are the discretized emission fluxes, which model the true pollutant emission field. For these control variables, representation error is often called aggregation error. Quite often, especially when the transport and chemical model is approximately linear, the model is defined by the sensitivity matrix of the observations to the emission control variables, which is then called the observation operator (or Jacobian, or source-receptor matrix, etc.), as it relates the observations directly to the unknown fluxes. In this case, representation error is due to both unresolved scales of the emission fields (the coarse discretization of the control space of the fluxes) and observation operator error, i.e. any error in the construction of the forward operator. Aggregation error has a strong impact in inverse modelling studies of pollutant sources, fluxes and sinks. One way to regularize an inversion with high-resolution control space of unknown emission fluxes, i.e. to make it less underconstrained, is to aggregate these fluxes. This comes with the price of representation error (Kaminski et al., 2001). The issue has been examined quite early in inverse modelling of greenhouse gas fluxes, meant to refine the greenhouse gas budget. The goal is to find the optimal compromise minimizing the total error between the analysis error (due to underdetermination) and the representation error (Peylin et al., 2001). In the absence of model error and using a formal expression for the representation error, it was shown that such a balance does not exist in theory. Higher resolutions systematically increase the information gain in the inversion Wu et al., 2011), provided the representation error is well estimated and accounted for in the data assimilation scheme. This theoretical result is little affected by resolution-dependent model error and approximation in the inversion scheme (Turner and Jacob, 2015).
In the context of atmospheric chemistry data assimilation, the representation error issue can be partly addressed using several distinct methods. In air-quality studies, one can make use of the station classification mentioned above, which would be determined in a prior stage (Gaubert et al., 2014). The representation error in inversion studies can be estimated and used to design a flux-space adaptive grid that, for any given number of parameters, minimizes the representation error while maximizing the information content of the observations . Other studies use more general error-estimation techniques, such as those discussed in section 5, in order to estimate the representation error (Schwinger and Elbern, 2010). Representation error can sometimes be parametrized using an adaptative statistical approach, i.e. the new parameters that measure the representation error of each station are estimated within the data assimilation scheme (Koohkan and Bocquet, 2012).

Illustration
We illustrate the estimation of the representation error with the inversion of sources of carbon monoxide (CO) over France (Koohkan and Bocquet, 2012) using in situ measurements.
The observation network has 80 stations that measure CO hourly concentrations. The stations that are close to urban areas or in the vicinity of industries are mainly and strongly impacted by local sources. A Eulerian chemistry transport model at a resolution of 25 km cannot account for those subgrid processes. As a result, while the measurements can be highly peaked, the coarse-resolution CO simulation is not able to reproduce those peaks quantitatively. This can be seen in Figure 6, where the blue curve corresponds to the observation profile, while the red curve corresponds to the free simulation. A 4D-Var estimation of the fluxes is highly impacted by the representation error and increases the estimated fluxes artificially so as to account for the bias. A subgrid-scale model that relates the source to the local inventory defined at coarse resolution is parametrized for each station by an a priori unknown representation factor. The 80 factors, one for each station, are jointly estimated with the fluxes. This allows estimation of the representation error, an objective characterization of the stations, and leads to an important correction in the 4D-Var estimates (black curve), with a bias that is very significantly reduced. Carbon monoxide forecasts are considerably improved using this method.

Theoretical examples
Both error due to unresolved scales and processes and observation-operator error depend on the geophysical model and observation operators. Here, we illustrate them with idealized examples and show how they interact depending on the observation operator properties and the resolved scales (Janjić, 2001). Example 1. For x ∈ [0, 2π], let us define w r (x, t) as a Fourier truncation of a scalar field w(x, t), i.e.
where N is given through the dynamical model andŵ(n 1 , t) are Fourier coefficients of the full state w(x, t). Furthermore, suppose that there is a single observation and that H c and H are bounded, linear operators, i.e. the result of applying the observation operator is a scalar: for some instrument weighting functions c(x) and d(x). For example, c(x) can be a Gaussian with the length-scale L c , i.e. up to a constant c( . From Equation (2), the error due to unresolved scales and processes is hence, by the Parseval-Plancherel formula and Equation (5), containing all the wavelengths not resolved by the model. The observation-operator error, from Equation (4), can similarly be calculated as We see that ifd(n 1 ) =ĉ(n 1 ) for n 1 = −N, … , N, then Also, if we discretize Equation (7) further, then we would have, for example, whered(x i ) are interpolation coefficients here. Therefore, the observation-operator error may not be zero even ifd(n 1 ) = c(n 1 ). It would depend on the scales present in the model, how well we are able to represent function c with d and what kind of quadrature formula we have used in Equation (7). Example 2. Now consider the case where the observation operator has a lower resolution than the model, analogously to the satellite data. If we wish a perfect filter,ĉ(−n 1 ) = 0 for n 1 > p, where p is some integer smaller than N, then the error due to unresolved scales and processes from Equation (9) is zero. If, on the other hand, the weighting function is Gaussian, the error due to unresolved scales and processes will not be zero and will again depend on all of the scales unresolved by the model.
For the perfect filter case, the observation-operator error can be calculated as From Equation (12), we note that the observation-operator error could be made smaller once appropriate filtering is performed on the model's fields in order to compare them with the low-resolution observations. Therefore, when the observation operator has a lower resolution than the model, scale mismatch will still depend on the unresolved scales and processes. Further representation error might be smaller if the appropriate filtering of the model's fields could be found.
Example 3. So far, we have considered a spectral discretization of the model. In this example, suppose that w r (x, t) is the piecewise constant approximation of w(x, t) on intervals with equally spaced collocation points x n 1 ∈ [0, 2π], n 1 = −N, ..., N and distance Δx. Further, suppose that there is a single observation and that H c and H are bounded, linear operators given by Equations (6) and (7). Since H c is linear, the error due to unresolved scales and processes once again depends on w(x, t) and c(x), i.e.
dx. (13) Equation (13) has been used, for example, in estimating the representation error from the data in the study by Oke and Sakov (2008). Note that the observation-operator error ′′ = 0 if This example illustrates the relative nature of the representation error, the properties of which (compare Equation (9) and (13)) depend on the geophysical model one is using.

INCLUDING REPRESENTATION ERROR IN THE DATA ASSIMILATION ALGORITHMS
As pointed out by Cohn (1997), in order to take representation error into account, data assimilation in the continuum needs to be considered. Here, we first review the methods of including the error due to unresolved scales and processes in the Kalman-filter algorithm that were presented in Janjić and Cohn (2006). The interested reader may also like to explore Cohn (1997), Bocquet et al. (2011), Hodyss and Nichols (2015) and van Leeuwen (2015), who use a Bayesian approach. A brief treatment of the observation-operator error is given in section 4.2.

Including error due to unresolved scales and processes in the Kalman-filter algorithm
In this section, we focus only on the error due to unresolved scales and processes; the observation-operator error and pre-processing error are assumed negligible. To simplify the notation, the state we would like to estimate is assumed to be a scalar field in this section. In the presence of error due to unresolved scales, the observation error is spatially and temporally correlated. However, let us consider the augmented vector of the resolved and unresolved scales, where both the resolved, w r , and unresolved scales, w u = w − w r , are defined on the continuum (see section 2). The Kalman-filter equations can be formally written for the augmented vector. Our objective is to estimate w r (x, t k ) at a fixed time t k , which represents the truth that is resolved by the model. The dynamics of the large and small scales of the true signal are typically coupled in nonlinear systems. Janjić and Cohn (2006) assume instead the following equation for the continuum dynamics: In Equation (16), the resolved scales are influenced only by the "resolved"-scale dynamics. Subgrid-scale parametrizations approximate the feedback from unresolved scales to resolved ones, so that  r k+1,k can represent the large-scale part of the true dynamics more accurately. The subgrid-scale parametrization can be assumed already to be a part of  r k+1,k . The alternative of adding them explicitly in Equation (16) through  ru k+1,k is not explored here. Although the unresolved scales evolve depending on both the resolved and unresolved scales, we do not have a geophysical model of their evolution.
The observations contain contributions from both resolved and unresolved scales. Note that, for the augmented state vector, the observation error is not spatially and temporally correlated, since it consists of measurement error only. We assume here that the observation operator h c k is such that we can write the observation as a sum of h k (w r (⋅, t k )) and h u k (w u (⋅, t k )). Here h k and h u k are observation operators that take us from the corresponding state space to observation space and act only on the spatial dimension denoted with a dot, for example h k (w r (⋅, t k )). A simple derivation of the main idea can be performed assuming linear observation operators, i.e. h = H, h u = H u and h c = H c , as in Janjić and Cohn (2006), under the assumption ΔH = H c − H = 0. In order to take nonlinear observation operators into account, the ensemble Kalman filter approach could be used once the equations are derived; this approximates the covariances through the ensemble on which nonlinear observation operators are applied first.
The forecast (analysis) error covariance for the augmented state for any two points in space x 1 , x 2 and discrete time k can be written in the form Here, the covariances B ru and B ur describe the cross-correlations between the resolved and unresolved scales, while B uu is the covariance of the unresolved scales. From the Kalman-filter equations for the augmented space, we can obtain equations for the estimation of resolved scales only, which, however, require us to estimate the unresolved scales simultaneously.
As a first approximation, Janjić and Cohn (2006) suggest disregarding the estimation equation for w u and this yields the Schmidt-Kalman filter (Jazwinski, 1970, p. 285). In the Schmidt-Kalman filter, w r,f k , w r,a k , B rr,f k and B rr,a k , which represent the forecast and analysis of the resolved scales and their error covariances, are estimated. The error covariance of the unresolved scales is not calculated during the assimilation and is therefore different from B uu k . In order to emphasize this, in the following we will label the error covariance of the unresolved scales by W uu k . Besides W uu k , there are a couple of new terms compared with the standard equation for the Kalman filter (e.g. Nakamura and Potthast, 2015) in the Schmidt-Kalman-filter equations. These equations require estimates of the mean of the unresolved scales H u k ⟨w u (⋅, t k )⟩ at the observation locations, the covariance between the resolved and the unresolved scales B ru,a k and estimates of the unresolved covariance at the observation points. A term H u k [H u k W uu k (⋅, ⋅)] T appears in the estimation equation for w r,a k in order to reduce the accuracy of the observation, taking into account the fact that we do not have an observation of w r k only. The inclusion of the cross-covariance B ru,a k allows a small-scale signal in the observations to influence the estimation of the resolved scales. Neglecting this term gives too little weight to the resolved-scale covariance compared with the observations. Once the analysis is computed, the evolution in time of B ru,a k will require knowledge of the evolution of the unresolved-scale dynamics  ur k+1,k ,  u k+1,k . Neglecting the correlation between the resolved and unresolved scales in the Schmidt-Kalman filter formulation, we come to This is the traditional filter, which was suggested in Lorenc (1986). In Cohn (1997), a Bayesian derivation is proposed for this filter, which requires that the temporal correlation, as well as the cross-correlation between scales, must be neglected. The appropriate equivalents of the traditional filter equations are used in variational data assimilation methods, where, to our knowledge, equations similar to the Schmidt-Kalman filter have not been derived to date. Note that the Schmidt-Kalman filter does not require us to make an assumption of no correlation between the background and observation error, which is not valid due to the presence of unresolved scales. Both filter formulations require that additional covariances be estimated, although the traditional filter formulation requires estimates of only the mean and error covariance of the unresolved scales (see Equations (19) and (22)). Grooms et al. (2014) suggest the use of stochastic physics for the mean and covariance of the unresolved scales.
We believe that this would be a promising approach for the Schmidt-Kalman formulation as well.
The difference between the traditional and Schmidt-Kalman filter results may not be very large, depending on the amount of energy in the resolved scales and the decorrelation time between the resolved and unresolved scales. For example, Janjić and Cohn (2006) have an idealized two-dimensional example of a passive tracer being advected on a sphere in the presence of wind shear. In it, both the traditional and Schmidt-Kalman filter perform very well, converging to the solution. Only small differences are seen in their performance towards the end of the assimilation, when the trace of true covariance is smaller than the estimated one for the traditional filter.

Including observation-operator error in the Kalman-filter algorithm
In section 4.1, we made an assumption that ΔH = H c − H = 0. Now we consider the case where this assumption does not hold. Let us assume that we will be using either the Schmidt-Kalman filter or traditional filter to include the error of unresolved scales when estimating w r k (x). In the case ΔH ≠ 0, the estimation equation for w r k (x) needs to be augmented further for the correction due to the error in the observation operator (Gelb, 1974) or the correction terms can simply be derived as in Jazwinski (1970, p. 245). As shown in Jazwinski (1970), the error in the observation operator would produce a bias, m a k (x), that needs to be corrected for in w r,a k (Dee, 2005;Auligne et al., 2007). The bias would be propagated in time with the resolved-scale dynamics and would initially be zero, while at analysis times it would satisfy the formula (23) If we correct only the bias, the analysis-error statistics calculated through either the Schmidt-Kalman filter or the traditional filter would not be correct and therefore, in addition, evolution equations for the correlation between the error and the state would need to be included (Jazwinski, 1970;Gelb, 1974).

DIAGNOSING AND USING CORRELATED OBSERVATION-ERROR COVARIANCE MATRICES IN DATA ASSIMILATION
Representation error depends on the continuum state of the dynamical system; therefore, it is state-and time-dependent. Due to its dependence on the state of the geophysical system, representation error can introduce spatial correlations in the observational-error statistics. The contribution of the representation error is difficult to estimate because it depends on both the geophysical model and the observations we use. If the representation error is underestimated, then we would fit the analysis to what the model considers to be noise, while if it is overestimated then we are discarding useful information. In practice, assigned variances of observation error are commonly inflated, to counteract and reduce the effect of neglected observation-error correlations (Courtier et al., 1998;Bormann et al., 2016). For variable model resolutions, estimates of the representation error also need to be scale-adaptive. Attempts have been made to estimate statistics of the error due to unresolved scales and processes by assuming that a higher resolution model represents the true state (Etherton and Bishop, 2004;Ponte et al., 2007;Waller et al., 2014b) or that higher resolution observations represent the truth (Oke and Sakov, 2008). In addition, statistical adaptation within the data assimilation scheme has been suggested as a novel means to estimate the statistics of this error (Koohkan and Bocquet, 2012), as well as stochastical models (Grooms et al., 2014). Attempts have also been made to estimate the full representation-error statistics by assuming a structure for the covariance and then estimating its parameters (Ménard et al., 2000;Janjić and Cohn, 2006). Ponte et al. (2007) estimate pre-processing errors for altimeter data based on tide model errors and differences between atmospheric models for pressure-driven signal corrections. The interested reader may also like to explore other techniques, such as the maximum-likelihood method of Dee and da Silva (1999), the method based on analysis innovation statistics of Desroziers and Ivanov (2001), the online estimation method of Li et al. (2009), the adjoint sensitivity method of Daescu and Todling (2010), the method proposed by Karspeck (2016), which uses an ensemble of model simulations, and the Bayesian estimation approach of Ueno and Nakamura (2016). In addition, methods based on observations that have different sampling volumes can be used to estimate the statistics of representation error (Ciach and Krajewski, 1999;Berenguer and Zawadzki, 2008;2009;Bulgin et al., 2016b).
Recently, a number of authors have begun to consider estimating and using representation-error statistics in data assimilation using diagnostic methods. In this section, we review techniques for diagnosing the full observation-error covariance matrix from observation minus forecast and observation minus analysis residuals and from alternatives, physically based error inventories and an ensemble approach. We also discuss methods for implementing fully correlated observation-error covariance matrices in data assimilation.

Diagnostic methods
Quantifying observation-error correlations is not a straightforward problem. A particular issue is that the distinction between biased and correlated errors can be blurred in practical contexts (Wilks, 1995, section 5.2.3). Methods considered in this section assume a priori that biases in observations and in background model states are removed. In addition, for practical applications of the diagnostics, temporal and/or spatial averaging may be needed in order to obtain sufficient samples. Hence, any state dependence in the errors will only be detectable if it is slowly varying (Waller et al., 2014a). Furthermore, these methods make no attempt to calculate the separate contribution from each source of representation error.
In this section, we focus on the techniques currently enjoying the most popularity: the Hollingsworth-Lönnberg method (Hollingsworth and Lonnberg, 1986) and the Desroziers et al. (2005) diagnostic. Recently, Hodyss and Nichols (2015) and Hodyss and Satterfield (2016) have pointed out that these methods only deliver correct observation-error covariance estimates with typical current assimilation systems if there is no model error on the resolved scales, otherwise the estimates will include a portion of the background-error covariance. New versions of the diagnostics that allow estimation of the model error covariance have been published recently (Bowler, 2017;Howes et al., 2017). However, in this article we only discuss the standard diagnostics. These diagnostics are relatively simple to implement and use data that are commonly output from operational assimilation systems.
The Hollingsworth-Lönnberg method (Hollingsworth and Lonnberg, 1986) makes use of forecast residuals, often called innovations. These are defined as and represent the difference between the observation y and the mapping of the model forecast vector, w f , on to observation space by the modelled (nonlinear) observation operator h. Equation (24) can be expanded to make its dependence on representation error more explicit, using the notation of section 2, with w r representing the truth. Thus, we have where H is the linearized version of the observation operator, m is the instrument error and f is the background error. The representation error R contains errors due to unresolved scales, observation-operator error and pre-processing error. If the background errors and observation errors are mutually uncorrelated, then taking the statistical expectation of the outer product of the innovations results in Note that B, E and F (see section 2) all represent the true covariance matrices in this equation, but H is the linearization of the approximate observation operator, which may be quite inaccurate. The assumption, used in the calculation, that observation and background errors are mutually uncorrelated may not hold in practice, e.g. if background fields are used in observation pre-processing or in the presence of unresolved scales. However, it is a commonly used assumption in data assimilation.
The Hollingsworth-Lönnberg method separates contributions from background and observation errors in innovation statistics, assuming that the background errors carry spatial correlations while the observation errors do not. For example, Stewart et al. (2014) used the method for IASI data to estimate observation-error variance. With additional assumptions on the background-error statistics, the method was modified to account for correlated errors in the observations by Garand et al. (2007) for AIRS data, showing significant interchannel error correlations.  and  applied the method to ATOVS, AIRS and IASI data used in the ECMWF analysis, again demonstrating considerable correlation structures in certain wavelength bands. However, when observation errors are correlated, deciding how to split the contributions between observation and background errors may be difficult and is subjective. Furthermore, this splitting is often obtained by fitting correlation functions to the innovation statistics; in this case, the resulting observation and background-error statistics are highly dependent on the choice of the fitted correlation function .
Continuing under the assumption that background errors and observation errors are mutually uncorrelated, Desroziers et al. (2005) found a method to separate observation and background errors with autocorrelations. Initially proposed as a consistency check, this method uses post-analysis diagnostics from linear estimation theory to approximate the covariances of the observation errors. We assume that the analysis is determined using where H is the observation operator linearized about the current state andR andB are the assumed observation and background-error covariances used to weight the observations and background in the assimilation. This notation makes explicit the distinction between assumed covariance matrices used in the assimilation (with tildes) and covariance matrices describing the true distributions (without tildes, as in Equation (26)). The analysis residuals are then Desroziers et al. (2005) show that an estimate of the observation-error covariance matrix can be obtained by taking the expectation of the outer product of the analysis and background residuals, where R e is the estimated observation-error covariance matrix and B, E, F are the exact background, instrument-error and representation-error covariance matrices. If the observation and forecast errors used in the assimilation are exact, In practice, the statistics used in the assimilation will not be exact, but Desroziers et al. (2005) show that in this case the diagnostic may still be used to gain an estimate of the observation-error variances and correlations. As with Equation (26), there is also an implicit assumption that d o f and d o a are unbiased, although results using bias-corrected data may also be valid (Waller et al., 2016a). The initial work of Desroziers et al. (2005) suggested applying the diagnostic in successive iterations. Theoretical and idealized results relating to the diagnostic under some simplifying assumptions provide information on how to interpret the results of iterating the diagnostic when the errors used in the assimilation are not exact (Chapnik et al., 2004;Desroziers et al., 2005;2009;Ménard et al., 2009;Ménard, 2016). It is important not to iterate on both the estimates of background and observation errors concurrently, but to treat them separately. Concurrent iteration results in convergence in one step to a solution that may or may not be close to the true statistics (Ménard et al., 2009;Ménard, 2016). Furthermore, iterating the diagnostic can be computationally costly and time-consuming and may produce disappointing results, due to the many assumptions that are already required to permit operational assimilation. For example, Desroziers et al. (2005) state that it "appears that the adjustment of background and observation-error variances is only relevant if those errors have different structures." As a result, it is often stated that the method will not yield an accurate result if the scales in the background and observation-error statistics are similar (Bormann and Stewart et al., 2014;Weston et al., 2014). However, it is actually the convergence of the iterations that may be slow or even fail if the scales in the true observation and background or assumed observation and background-error covariance matrices are proportional. Although this scale separation causes problems for the iteration procedure, it may not result in the failure of the diagnostic (Waller et al., 2014a).
In some cases, the computational framework for including correlated errors in the assimilation is not yet developed and hence iteration of the diagnostic is not always feasible. Indeed, most of the studies using the diagnostic in operational NWP to date have considered only the first iterate and still gained useful information. For example, the diagnostic has also been applied to calculate satellite interchannel error covariances (Stewart et al., 2009;Weston et al., 2014;Waller et al., 2016a) and spatial error covariances (Cordoba et al., 2016;Waller et al., 2016a;2016c) in variational assimilation systems, as well as in ensemble data assimilation systems (Lange and Janjić, 2016;Schraff et al., 2016). It has been applied in atmospheric chemistry (Schwinger and Elbern, 2010) as well. Further work investigating the diagnostic in simple model experiments includes both variational (Stewart, 2010) and ensemble (Li et al., 2009;Miyoshi et al., 2013) data assimilation systems and its use to estimate time varying observation errors (Waller et al., 2014a).
Nevertheless, as pointed out by Todling (2015), careful thought must be applied in interpreting the results from the diagnostic. The idealized study of Waller et al. (2016b) shows the dependence of the first iterate on the assumed statistics used in the cost function. These results have potential use for interpreting the derived covariances estimated using an operational system. Even though the results of the diagnostic are subject to uncertainty, they usually still provide useful information. For example, hypotheses about sources of error can be tested by varying the choices made in the assimilation (background errors, superobbing, observation operator, etc., e.g. Waller et al. (2016a;2016c)). However, using more than one approach is likely to be more successful, for example using the diagnostics in conjunction with the error inventory approaches described in section 5.2.

Uncertainty budgets
While the diagnostic approaches discussed in section 5.1 use output from the assimilation system to diagnose errors, assuming all the components of the system are in place, the uncertainty budget method computes uncertainty estimates without using the data assimilation system. Thus uncertainty budgets can include state-dependent errors (e.g. Forsythe and Saunders, 2008;Geer and Bauer, 2011). There is a large body of literature on this topic, since most, if not all, observing systems used in operations have had some kind of uncertainty study associated with them. In this section, we give only a few examples of the types of study mainly useful for estimation of representation uncertainty for data assimilation. Precise metrological studies (e.g. Bulgin et al., 2016a) take the propagation of uncertainty approach defined by the Joint Committee for Guides in Metrology (2008). If we consider a variable z, related to a set of input quantities u i , i = 1, 2, … m, by z = f (u 1 , u 2 , … , u m ), then 2 (z), the estimated variance associated with z, is given by where (u i , u j ) is the estimated covariance associated with the two inputs u i and u j (see Joint Committee for Guides in Metrology, 2008, equation (13)). A rigorous propagation of uncertainties following Equation (32) is, however, often not practical, so Monte Carlo simulations, together with error inventory approaches, are at times used instead. For instance, this has been attempted to estimate contributions from cloud screening, radiative transfer and spatial representativeness error for the assimilation of hyperspectral infrared radiances (e.g. Chun et al., 2015). Several studies have carried out simulation studies to examine pre-processing errors (e.g. Bormann et al., 2014;Lean et al., 2015). Errors arising from observation-operator uncertainty have been considered by Sherlock et al. (2003) and Matricardi (2009) in the context of fast radiative transfer modelling.

5.3
Ensemble method for error due to unresolved scales and processes The missing covariance of error due to unresolved scales and processes in Equation (22) of the traditional filter formulation can also be approximated through an ensemble, in the following way.
Usually, we would take a sample of forecasts at different times to form an initial ensemble of size N ens . Instead, we take a larger sample, on which we perform a singular-value decomposition and order the singular values from largest to smallest. We hypothesize that singular values smaller than the N ens th value correspond to the unresolved scales and construct the full unresolved scales matrix H u k [H u k W uu k (⋅, ⋅)] T in Equation (22) as a sample covariance by applying the observation operator to those singular vectors. The sample of size N ens , as well as the larger sample, will contain only scales resolved by the model. Therefore, this approach is similar to the method used in estimating the model error covariance, which sets it proportional to the background-error covariance, except that in this case we use an estimate from singular vectors that are not contained in the ensemble of size N ens that we are propagating during assimilation for estimation of background error.
In order to illustrate this approach, we consider two models for unresolved-scale covariance at the locations where the observations are. One is a diagonal matrix with equal values on the diagonal corresponding to the trace of the unresolved-scale singular values. The other model uses the full unresolved-scale matrix at the observation locations. This was applied to assimilation of sea-surface temperature (SST) retrievals in experiments identical to those described in Losa et al. (2014), except for the observation-error specification where a diagonal matrix with standard deviation (SD) of 0.8 • C was used in Losa et al. (2014). The recommended error for SST retrievals is a SD of 0.6 • C. Figure 7 illustrates the verification against independent in situ salinity measurements through time at Arkona station in the Baltic Sea. In the figure, the verification is shown for (a) the free model run, the diagonal observation-error covariance with the SD values 0.6, the inflated diagonal of 0.8 and 1.2 and (b) the diagonal plus the correlation structure estimated at time 0 from the ensemble for the 0.6 and 0.8 cases. Inclusion of the correlation degrades the verification results in the 0.6 SD case, but improves the results in 0.8 SD case. Following the study by Kivman et al. (2001) and Losa et al. (2004), in Losa et al. (2014) the maximum entropy approach was suggested as an additional criterion for assessing the assumed prior error statistics in ensemble-based systems in situations when little is known about model and data quality (a typical case in oceanographic applications). Calculation of the entropy values (Losa et al., 2014) leads to 3.59, 3.99 and 4.10 for inflated variance values of 0.6, 1.2 and 0.8. Once the full unresolved error covariance is added in the 0.8 case, the entropy value increases further to 4.24, indicating a best verification result. Therefore, this simple approach of including the full covariance matrix of the unresolved scales through an ensemble could give a benefit over further inflating the diagonal if variances are not underestimated. A similar approach could be used for missing covariances in the Schmidt-Kalman filter formulation.

Implementation issues
Due to the complexity of diagnosing and using full observation-error covariance matrices in practice, it is natural to question whether accounting for correlations has any advantage compared with using diagonal approximations. When observation errors are incorrectly assumed to be uncorrelated, increasing the observation density beyond some threshold value has been shown to yield little or no improvement in analysis accuracy (Liu and Rabier, 2003;Berger and Forsythe, 2004;Dando et al., 2007;Jacques and Zawadzki, 2014). Furthermore, Stewart et al. (2008) and Stewart (2010) showed that the observation information content in the analysis is severely degraded. Such studies, combined with examples demonstrating that ignoring correlation structure hinders the use of satellite data (e.g. constraining channel selection algorithms: Collard, 2007), suggest that error correlations for certain observation types have an important role to play in improving numerical weather forecasting. When the correlated observation errors are accounted for, it has been shown to lead to a more accurate analysis (Healy and White, 2005;Stewart, 2010;Stewart et al., 2013) and improvements in the forecast skill score (Weston et al., 2014;Bormann et al., 2016). Indeed, Stewart et al. (2008; and Healy and White (2005) show that even the use of a crude approximation to the observation-error covariance matrix may provide significant benefit.
However, the computational demands of using full observation-error correlation matrices appear to be significant. The size of the matrices to be stored is reduced by assuming that the observation-error covariance matrix has a block-diagonal structure, with (uncorrelated) blocks corresponding to different instruments. If spatial correlations are neglected, the size of the full submatrices can be reduced further, as has been done for the operational representation of interchannel correlations at the Met Office (Weston, 2011;. The representation of spatial correlations is less straightforward and may require different parallelization strategies for the assimilation scheme. A number of approximated forms of spatial correlation matrices (or their inverses) have been proposed in the literature to increase numerical efficiency while preserving observation information content and analysis accuracy (Fisher, 2005;Healy and White, 2005;Stewart et al., 2008;Stewart, 2010).
A further issue in variational assimilation is the speed of convergence of the minimization problem. Typical operational systems are pre-conditioned with the square root of the background-error covariance matrix (Bannister, 2008). This is a sensible approach for cost functions with diagonal observation-error covariance matrices, where the conditioning of the minimization problem is dominated by the condition number of the background-error covariance matrix (Haben, 2011;Haben et al., 2011). However, Weston (2011) found in an operational application that reconditioning of diagnosed observation-error covariance matrices was necessary to ensure convergence of the variational assimilation problem.
Including observation-error correlations changes how observation information is filtered in the analysis (Daley, 1991, section 4.8). Using spatial correlations, Seaman (1977) noted an increase in the accuracy of gradients of observed fields represented in the analysis and Rainwater et al. (2015) an improvement in the smallest scales resolved by an NWP model. Weston et al. (2014) and Bormann et al. (2016) note that accounting for interchannel correlations modifies the weighting of the observations in a situation-dependent way. When observation-minus-background departures project strongly on to the leading eigenvectors of the covariance matrix (associated with the largest eigenvalues), taking error correlations into account will result in a relative downweighting of the observations. However, if the departures project strongly on to the higher order eigenvectors, taking error correlations into account will increase the relative weight on these data.

SCALE-MATCHING APPROACH
In operational practice, it has often been assumed that the observation errors are uncorrelated, as this allows them to be treated simply and computationally cheaply, with a diagonal error covariance matrix. In most cases, to compensate for the omission of error correlation, the observation-error variances are inflated so that the observations have a more appropriate lower weighting in the analysis (e.g. Courtier et al., 1998;Hilton et al., 2009;Bormann et al., 2016). Furthermore, data reduction methods are employed to help ensure that the zero-correlation assumption holds (or almost holds) in practice. For example, observations are spatially thinned so that the distances between assimilated observations are greater than the observation-error horizontal correlation length. Another technique, known as superobbing, reduces the density of the data by averaging innovations in a region and assigning this average (plus the background value) as a single superobservation value. Within the context of convective-scale data assimilation, the generation of superobservations is quite necessary for Doppler radar observations and, with the rapid increase in satellite data at finer pixel spacing, there is a clear need for superobservations. A mathematical derivation by Berger and Forsythe (2004) shows that the superobbing procedure reduces the uncorrelated portion of the error; however, the correlated error is not reduced. A similar result was derived by van Leeuwen (2015) for averages of raw observations: the correlated part of the error does not decrease as the number of observations included in the average increases. However, the assumptions required for the derivations may not hold in practice. Hence, superobbing or averaging is often used alongside thinning (e.g. Waller et al., 2016c). The question arises as to whether there is a benefit in filtering the data before they are assimilated into the model, to the approximate resolution of the model. Such approaches are particularly attractive if accounting fully for the representation error is more difficult, for instance due to the presence of spatial error correlations. Sources of spatial correlations can be the error due to unresolved scales and processes, observation-operator error or pre-processing error. Daley (1993) and Liu and Rabier (2002) found that there is an optimal match between the analysis grid spacing and the instrument spatial averaging, which results in the minimum representation error. Wu et al. (2011) showed that it is possible to design a grid of control variables in such a way that representation error is minimized for a given observation network. Janjić et al. (2012b) assimilated time-varying dynamical ocean topography data (Skachko et al., 2008;Janjić et al., 2012a) filtered to three different spatial resolutions into a global finite-element ocean model (Danilov et al., 2004;Wang et al., 2008) with an ensemble Kalman-filter algorithm.
The results indicate that assimilating data that contain representation errors does not seem to degrade the accuracy of the large-scale analysis as long as the observation-error variance is inflated appropriately. However, in the study, assimilation of the higher resolution data did not affect the SST analysis significantly in higher spectral bands. This might be a result of using a diagonal observation-error covariance matrix that most likely limits us from exploring the full data resolution further (Rainwater et al., 2015).
In a second study, carried out at the Met Office (Peter Weston, 2016; personal communication), default Cross-track Infrared Sounder (CrIS) data from the ∼14 km resolution field of view (FOV) was compared against averaged CrIS data created by averaging the 3 × 3 FOV in each field of regard (FOR) to create superobservations with an effective resolution of ∼42 km. The motivation for this study was better to match the scales between the observations and the models being used, where the forecast model is N768 (∼17 km horizontal resolution) and the assimilation model is N216 (∼60 km resolution). The error characteristics of both datasets were estimated using a posteriori diagnostics, such as those described in section 5, and showed that the averaged dataset had smaller error standard deviations and weaker correlations due to smaller representation errors and lower instrument noise through the averaging. Another effect of the averaging is that the number of observations suitable for assimilation is reduced, due to more of the larger FOVs being contaminated by cloud. When compared in NWP assimilation trials (including correlated observation-error covariances and using a 4DVar algorithm), the results were broadly neutral, with very slight degradations of up to 0.5% in background fits to observations sensitive to mid-tropospheric temperature and humidity. Therefore, it appears that the negative effects of the reduction in the number of observations assimilated due to cloud contamination has more of an effect than the smaller errors due to the better scale matching and reduced instrument noise.

CONCLUSIONS
Updating the state of a geophysical system given by a discrete dynamical model by assimilating observations of the system through time in the data assimilation process requires quantification of the errors, or uncertainties, in the observations and in the model. The observation error is often much larger than the measurement error, which is the error associated with the measuring device alone, particularly in the case of remotely sensed observations. We have suggested the term representation error to refer to the totality of observation error distinct from measurement error. The need to quantify representation error arises in many different earth science disciplines. It has given rise simultaneously to successful approaches and a sometimes discipline-specific array of terminologies. To help foster overall progress in data assimilation and to help enable effective communication between researchers in different disciplines, we have attempted to review the literature on representation error and its quantification and to consolidate the terminology used in different disciplines.
To consolidate the terminology, we have partitioned the representation error into the error due to unresolved scales and processes, the observation-operator error and the pre-processing error and shown how these can be described mathematically. We have illustrated these aspects of representation error by means of examples in satellite radiance data assimilation, ocean reanalysis and atmospheric composition analysis. We have shown how the error due to unresolved scales and processes and the observation-operator error can be treated, once quantified at least, in Kalman filter-type data assimilation algorithms. A promising avenue to treat the error due to unresolved scales and processes in the context of (ensemble) Kalman filtering is to use stochastic physics to determine the mean and covariance of the unresolved scales as they evolve.
We have described a variety of methods that have been used, or are currently being explored, to diagnose the statistics of the representation error and its components. A large number of studies have used forecast and analysis residual diagnostics to determine observation-error correlations in bulk, i.e. without attempting to distinguish among components of the representation error. Further progress with residual diagnostics may ensue by attempting to distinguish them, for example by making and testing hypotheses on the different representation-error components (Waller et al., 2016c). Ensemble methods are being explored to estimate the error due to unresolved scales and processes. Statistical adaptation and stochastic modelling are also just beginning to be explored.
It has become clear that observation-error statistics are just as important as background-error statistics in data assimilation and that methods to estimate statistics of the representation error must therefore begin to receive much more research attention. Our hope is that this article helps to focus efforts in this direction.

APPENDIX
In a case in which the observations have been pre-processed, we can describe this mathematically in terms of a function g acting on the observationsỹ that have not been pre-processed, i.e. y = g(ỹ). Denoting with a tilde the observation operators and measurement error associated with observationsỹ, then the observation error can be written as o = y − g(h(w r )) = g(h c (w) +̃m) − g(h(w r )).
If we set h c ∶= g(h c ) and h ∶= g(h), we recover Equation (1).
Note that, in the examples discussed in section 3 of pre-processing for clear-sky radiance assimilation and correction of data for tides, on one hand the pre-processing would reduce the error due to unresolved scales and processes while on the other hand it would introduce pre-processing error.