Parameter and state estimation with ensemble Kalman filter based algorithms for convective‐scale applications

Representation of clouds in convection-permitting models is sensitive to numerical weather prediction (NWP) model parameters that are often very crudely known (for example roughness length). Our goal is to allow for uncertainty in these parameters and estimate them from data using the ensemble Kalman filter (EnKF) approach. However, to deal with difficulties associated with convective-scale applications, such as non-Gaussianity and constraints on state and parameter values, modifications to the classical EnKF are necessary. In this article, we evaluate and extend several recently developed EnKF-based algorithms that either incorporate constraints such as mass conservation and positivity of precipitation explicitly or introduce higher order moments in the joint state and parameter estimation problem. We compare their results with the localized EnKF for a common idealized test case. The test case uses perfect model experiments with the one-dimensional modified shallow-water model, which was designed to mimic important properties of convection. We use a stochastic dynamical model for parameters, in order to prevent underdispersion in parameter space. To deal with localization for estimation of parameters, we introduce a method called global updating, which is a computationally cheap modification of spatial updating and proved successful in this context. The sensitivity of the results to the number of ensemble members and localization, as well as observation coverage and frequency, is shown. Although all algorithms are capable of reducing the initial state and parameter errors, it is concluded that mass conservation is important when the localization radius is small and/or the observations are sparse. In addition, accounting for higher order moments in the joint state and parameter estimation problem is beneficial when the ensemble size is large enough or when applied to parameter estimation only.


INTRODUCTION
The initiation and development of clouds in convection-permitting models can be highly sensitive to crudely known parameters, such as roughness lengths and the turbulent length-scale. Modest changes to these parameters can alter the model output significantly, making their uncertainty a serious source of forecast errors (Quintanar et al., 2016). In most operational weather prediction systems (Gustafsson et al., 2017), the uncertainty in these parameters is not taken into account, which contributes to overconfidence in the model output. In this article, we investigate the feasibility of estimating parameters simultaneously with the dynamical state using ensemble Kalman filter (EnKF) type algorithms by means of state-space augmentation (Jazwinski, 1970; Evensen, 2009). This approach allows the parameters to be updated objectively according to the data, leaving them flexible to adjust to recent conditions. In addition, the uncertainty in the parameters is taken into account, thereby capturing the uncertainty of the model output more accurately. EnKF-based algorithms for joint state and parameter estimation have shown potential for meso- and synoptic-scale systems (Annan et al., 2005; Aksoy et al., 2006; Emerick and Reynolds, 2012). However, it is well accepted that data assimilation at convective scales suffers from the nonlinearity and non-Gaussianity associated with rapidly developing and intermittent convective storms. In convective-scale data assimilation, where different regimes are triggered by on-off switches, the sensitivity to the parameters may be non-smooth. In addition, non-Gaussian error distributions are more frequent, because the coupling between parameters and model equations is strongly nonlinear and because state variables such as rain, snow and ice, which need to be estimated jointly with the parameters, must remain non-negative.
As a consequence, data assimilation methods such as the EnKF that rely on Gaussian assumptions might yield poor estimates of the state and the parameters, or even generate unphysical values. This phenomenon was clearly illustrated in Kivman (2003), where perfect model experiments were conducted with the stochastic Lorenz '63 system (Evensen and van Leeuwen, 2000). The same study highlights the advantages of particle filters, which, in contrast to the EnKF, use the full error statistics, making them suitable for non-Gaussian systems. Although the benefits of particle filters have been established in many joint state and parameter estimation studies (Doucet et al., 2001; Losa et al., 2003), their computational demands are too severe for high-dimensional systems (Snyder et al., 2008). Promising efforts such as implicit sampling (Chorin et al., 2010; Weir et al., 2013) and localized particle filters (Poterjoy, 2016) are being made to alleviate this problem, but their performance on applications such as weather forecasting has yet to be demonstrated. For this reason, EnKF-based algorithms remain a popular branch of research. Indeed, despite their suboptimality in non-Gaussian frameworks, they have proven capable of dealing with high-dimensional systems, which is vital for many practical applications. Therefore, instead of modifying particle filters to make them suitable for high-dimensional systems, in this article we are interested in strategies for modifying the EnKF to deal with non-Gaussian distributions.
Ensemble-based smoothers have demonstrated their worth in dealing with nonlinearity. The OSA-EnKF (Gharamti et al., 2015) and the IEnKS (Bocquet and Sakov, 2013a) have shown great potential for combined state and parameter estimation. However, smoothers require additional model evaluations, which constitute the dominating computational expense for convective-scale applications. Furthermore, for smoothers, localization is not straightforward, due to the temporal dimension (Bocquet, 2016; Desroziers, 2016). One approach to non-Gaussianity, addressing the requirement that some state variables and parameters be non-negative (or within a certain range), is to transform the problem using Gaussian anamorphosis to make it more suitable for the EnKF (Bertino et al., 2003; Simon and Bertino, 2012; Lien et al., 2013). However, applying this approach to state variables can lead to instability (Bocquet and Sakov, 2013b). Static anamorphosis is more efficient for distributions that are not too dynamical (Bocquet and Sakov, 2013b), which is not the case for convective-scale applications. Further, state properties such as mass conservation can be lost, and transformed observation errors can lead to biased estimates (Bishop, 2016). The GIGG-EnKF (Bishop, 2016), on the other hand, deals with non-Gaussianity directly by allowing, for each observed variable and corresponding observation, the assumption of either a gamma or an inverse-gamma prior distribution and/or likelihood. However, the unobserved variables are updated according to the classical EnKF. Therefore, in our opinion, this algorithm cannot be exploited fully for parameter estimation applications, since parameters are typically not represented in observation space.
In this study, we evaluate two recently developed ensemble algorithms that either incorporate constraints such as mass conservation and positivity of precipitation explicitly (Quadratic Programming Ensemble (QPEns): Janjić et al., 2014) or introduce higher order moments such as skewness (Quadratic Filter (QF): Hodyss, 2011; 2012) to deal with non-Gaussian distributions associated with both the state and the parameters. Although the importance of respecting physical conservation principles has long been recognized in numerical weather prediction modelling, it has only recently been shown that imposing them during data assimilation might have a positive impact as well (Zeng et al., 2017). However, whether or not the impact would be positive for convective-scale data assimilation is not obvious. It is clear, though, that non-Gaussianity in convective-scale data assimilation needs to be taken into account. Both the QPEns and the QF have been shown to handle non-Gaussian error statistics, using different approaches: the QPEns through imposing constraints and the QF by explicitly introducing skewness. We compare their performance with the EnKF (Evensen, 1994; Houtekamer and Mitchell, 1998) using the one-dimensional modified shallow-water model (Würsch and Craig, 2014), which was designed to mimic the most important aspects of convective motion. These aspects include an acute regime switch when convection is triggered (conditional instability) and a significant time lag between the onset of convection and its observation. Comparison between the QF, the QPEns and the EnKF should provide insight into the relative impact, for convective-scale applications, of ignoring conservation laws and physical bounds on the one hand and the skewness of the relevant distributions on the other. This will aid in establishing a course for further research, i.e. either combining the algorithms or concentrating effort in one direction. The algorithms are introduced in more detail in section 2.
A good overview of additional challenges regarding joint state and parameter estimation with ensemble Kalman filter type algorithms is given in Ruiz et al. (2013a). As pointed out in Aksoy et al. (2006), one of them is that localization in both state and parameter space is needed when the posterior distribution is undersampled. To deal with the typically lower dimension of the parameters, the spatial updating technique was introduced. In section 3 we elaborate on this technique and propose an alternative formulation to reduce computational costs. In the second part of this section, we describe how we deal with severe underdispersion in parameter space, which is a common phenomenon for estimation of static parameters (see for example Simon et al., 2015).
In section 4 we present the experimental set-up, where the modified shallow-water model is explained and the settings for both model and data assimilation algorithms are specified. This is followed by a discussion of the results in section 5, in which we first focus on the sensitivity of the algorithms to the localization radius, the observation coverage and the observation frequency without parameter uncertainty. Then, model error due to parameter uncertainty is introduced and we investigate how parameter estimation affects the estimate of the dynamical state for different algorithms. Finally, a summary is provided and some conclusions and further research questions are stated in section 6.

DATA ASSIMILATION ALGORITHMS
The data assimilation algorithms that are investigated in this article are modified versions of the stochastic EnKF (Evensen, 1994; Houtekamer and Mitchell, 1998). With the goal of indicating the respective modifications clearly, we introduce the cost function for each ensemble member i, i = 1, 2, …, N:

J(δx_i) = δx_i^T P^{-1} δx_i + (d_i − H δx_i)^T R^{-1} (d_i − H δx_i),   (1)

where N is the ensemble size. At the minimum of the cost function J, δx_i = x_i^a − x_i^f is the analysis increment of size n × 1 that is used to calculate analysis ensemble member x_i^a based on forecast ensemble member x_i^f (all of size n × 1). The matrix

P = 1/(N − 1) Σ_{i=1}^{N} (x_i^f − x̄^f)(x_i^f − x̄^f)^T

is the forecast-error covariance matrix, with x̄^f representing the ensemble mean, R the observation-error covariance matrix and H an observation operator, which for simplicity we assume is linear, of size p × n. The innovation vector is d_i = y^o + ε_i^o − H x_i^f, where y^o are the observations and ε_i^o is drawn from a sample with bias ε̄^o. For the EnKF, the analysis increments δx_i are found by solving

δx_i = argmin J(δx_i),   (2)

or, equivalently,

δx_i = P H^T (H P H^T + R)^{-1} d_i.   (3)

The formulations of the EnKF in Equations 2 and 3 allow us to identify clearly the differences among the algorithms discussed in this article. Note, however, that since P is not invertible, our abuse of notation in Equation 1 means, strictly speaking, that a projection into a lower-dimensional space would need to be performed first before minimization.
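The update in Equation 3 can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: the function name, the shapes and the recentring of the observation perturbations (to remove the sample bias ε̄^o) are our choices.

```python
import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """Stochastic EnKF analysis step (cf. Equation 3); a sketch.

    Xf : (n, N) forecast ensemble; y : (p,) observations;
    H : (p, n) linear observation operator; R : (p, p) obs-error covariance.
    """
    n, N = Xf.shape
    xbar = Xf.mean(axis=1, keepdims=True)
    Z = (Xf - xbar) / np.sqrt(N - 1)          # perturbation matrix, P = Z Z^T
    # Perturbed observations eps_i ~ N(0, R), recentred to remove sample bias
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    eps -= eps.mean(axis=1, keepdims=True)
    D = y[:, None] + eps - H @ Xf             # innovations d_i
    HZ = H @ Z
    S = HZ @ HZ.T + R                         # H P H^T + R
    K = Z @ HZ.T @ np.linalg.inv(S)           # gain P H^T S^{-1}
    return Xf + K @ D                         # analysis ensemble
```

In practice the explicit inverse would be replaced by a solve, and P would never be formed; the sketch only mirrors the algebra of Equation 3.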

Quadratic programming ensemble
A side effect of data assimilation with the EnKF is the generation of a possibly unphysical analysis state (Kivman, 2003; Pan and Wood, 2006; Simon and Bertino, 2009; Janjić et al., 2014; Zeng and Janjić, 2016). In such cases, the simple solution is to set the unphysical values to the nearest physical ones. For example, in convective-scale applications, negative precipitation values could be set to zero. As a consequence, the total mass is altered and a bias is introduced (see Janjić et al., 2014 for a simple example). The QPEns algorithm, suggested in Janjić et al. (2014), addresses this obstacle by adding physical constraints to the minimization problem (Equation 2):

minimize J(δx_i) subject to c_j(δx_i) ≥ 0, j = 1, …, m_1, and g_k(δx_i) = 0, k = 1, …, m_2,   (4)

for i = 1, 2, …, N, where m_1 and m_2 indicate the number of inequality and equality constraints, respectively. Here, c_j and g_k depend on the physical constraints that one wishes to impose. In Janjić et al. (2014), the effect of imposing positivity and mass constraints was investigated for simple test cases.
Since a full minimization problem has to be solved for each ensemble member, the computational cost of the QPEns depends heavily on the number and nature of the constraints. For example, inequality constraints are algorithmically more challenging than equality constraints. Sophisticated minimization algorithms, such as interior point methods and sequential quadratic programming (SQP), handle inequality constraints either by incorporating them in the cost function with a penalty term or by solving a sequence of equality-constrained subproblems instead. The nonlinearity level of the constraints is an important factor as well. The higher the nonlinearity, the harder the problem.
In this article we consider linear constraints only, such that the problem in Equation 4 can be solved using quadratic programming (QP). For further details regarding QP, we refer to Gill (1981), where various constrained minimization algorithms are discussed thoroughly. For the extension of QPEns to nonlinear equality constraints we refer to Zeng et al. (2017).
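For a single ensemble member, the constrained problem of Equation 4 with linear constraints can be sketched with an off-the-shelf SQP solver. The helper name `qpens_increment`, the index arguments and the assumption of an invertible P are ours, chosen only to keep the illustration self-contained; a production QP solver would exploit the quadratic structure directly.

```python
import numpy as np
from scipy.optimize import minimize

def qpens_increment(P, R, H, d, xf, idx_h, idx_r):
    """Sketch of one QPEns member analysis (cf. Equation 4).

    Minimizes J(dx) = dx^T P^{-1} dx + (d - H dx)^T R^{-1} (d - H dx)
    subject to mass conservation of h (sum of dh = 0) and positivity of
    rain (r^f + dr >= 0). Assumes P invertible for simplicity.
    """
    Pinv, Rinv = np.linalg.inv(P), np.linalg.inv(R)

    def cost(dx):
        res = d - H @ dx
        return dx @ Pinv @ dx + res @ Rinv @ res

    cons = [
        {"type": "eq",   "fun": lambda dx: dx[idx_h].sum()},       # e^T dh = 0
        {"type": "ineq", "fun": lambda dx: xf[idx_r] + dx[idx_r]}, # r^a >= 0
    ]
    out = minimize(cost, np.zeros(len(xf)), constraints=cons, method="SLSQP")
    return out.x
```

With both constraints active, the increment differs from the unconstrained Equation 3 solution exactly where the physics demands it.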

Quadratic filter
The most restrictive assumption needed to derive the EnKF is that all relevant distributions are Gaussian. At convective scales, this is typically not the case. In fact, the very nature of convection is highly nonlinear and spatially irregular, which leads to skewed (and even multimodal) distributions. The QF alleviates this issue by calculating third and fourth moments from the ensemble, in addition to the first two. The algorithm is based on an expansion that represents the analysis increment in powers of the innovation. Truncation of this expansion at the quadratic term, along with ignoring higher order terms between three or more variables, yields the following formulation of the QF:

δx_i = [ P̃ H̃^T ( H̃ P̃ H̃^T + R̃ )^{-1} d̃_i ]_n,   (5)

where H̃, d̃_i and R̃ denote the observation operator, innovation and observation-error covariance in an observation space augmented with the centred squared innovations. The subscript n in Equation 5 indicates that only the first n rows of the vector concerned are needed, where n is the dimension of δx_i. The matrices P3 (R3) and P4 (R4) hold information about third and fourth moments, respectively, and enter through P̃ = Z̃ Z̃^T, where Z̃ augments the matrix of ensemble perturbations Z with terms involving Z ⊙ Z, with ⊙ representing elementwise (Schur) multiplication (the precise construction is given in Hodyss, 2011; 2012). Here r_d = diag(R), and T and F are the third and fourth moment matrices of the observation likelihood, respectively. Note that, for computational reasons, only higher order moments between at most two variables are taken into account. This ensures that the dimensions of P3 (R3) and P4 (R4) equal that of P (R). In particular, P3_ij (R3_ij) and P4_ij (R4_ij) correspond to the same variables as P_ij (R_ij), thereby avoiding complications with respect to localization. Additional computational costs with respect to the EnKF arise mainly from the augmented observation space. In cases where running the model forecast dominates the computation time, the additional costs affiliated with the QF may be very reasonable. Also, in contrast to the QPEns, the QF is suitable for square-root formulations. However, as with the EnKF, physical bounds are not necessarily respected (Posselt et al., 2014), in contrast to the QPEns.
A full derivation of the algorithm, along with a discussion of the effect of skewed distributions in the EnKF context, can be found in Hodyss (2011; 2012).
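The benefit of carrying skewness can be seen in a toy scalar example that mimics the idea, though not the detailed algebra, of the QF: augmenting the usual linear regression of the state on the innovation with the centred squared innovation. All names and distributions below are illustrative.

```python
import numpy as np

# Toy, scalar illustration (not the full QF): regress a skewed "state" on the
# innovation and, additionally, on its centred square, which is where
# third/fourth moments (P3, P4, R3, R4) enter the update.
rng = np.random.default_rng(1)
N = 10_000
x = rng.gamma(shape=2.0, scale=1.0, size=N)    # skewed state sample
y = x + rng.normal(0.0, 0.5, size=N)           # noisy observation

d = y - y.mean()                               # innovation about the mean
A_lin = np.column_stack([np.ones(N), d])
A_quad = np.column_stack([np.ones(N), d, d**2 - (d**2).mean()])

mse = {}
for name, A in [("linear", A_lin), ("quadratic", A_quad)]:
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    mse[name] = float(np.mean((x - A @ coef) ** 2))
print(mse)
```

For a skewed prior, the quadratic regressor carries real information, so the quadratic fit attains a smaller sample mean-squared error than the purely linear (EnKF-like) one.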

AUGMENTED-STATE PARAMETER ESTIMATION
There are two difficulties related to augmented-state parameter estimation in ensemble-based algorithms. First, localization in parameter space is not straightforward, since it is not obvious how best to relate global model parameter values to the analysis grid points. This complication can be circumvented by using the spatial updating technique introduced by Aksoy et al. (2006); here, we suggest an alternative to it. Second, augmented-state data assimilation algorithms suffer from underdispersion in parameter space. Our approach to this problem is discussed in section 3.2.

Localization in parameter space
To illustrate our approach, we consider the case in which one state field and one global scalar parameter θ are estimated using the EnKF with covariance localization, i.e. P_Loc = C ⊙ P, where C is a localization matrix and P_Loc is the localized error covariance matrix. Global parameters become an issue when localization is required: localizing only the state and not the parameters yields indefinite matrices, which can cause any algorithm to fail. Let

C = [ C_(x,x)   C_(θ,x)^T ;  C_(θ,x)   C_(θ,θ) ]

be the matrix with which we localize the augmented state, where C_(x,x) is a correlation matrix of size n × n, C_(θ,x) is of size 1 × n and C_(θ,θ) = 1 is a scalar. By the Schur complement, the matrix C is positive semidefinite if and only if C_(x,x) − C_(θ,x)^T C_(θ,x) is. If no localization is applied to the parameters, then C_(θ,x) = e^T, where e is the vector with elements equal to 1 of appropriate size. However, e^T C_(x,x) e − (e^T e)(e^T e) < n² − n² = 0, since localization forces some entries of C_(x,x) below 1, and therefore the localization matrix C is not positive semidefinite when only the state is localized. In addition, localization is typically applied to suppress spurious correlations caused by undersampling, which is an issue that should also be dealt with in parameter space.
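The indefiniteness argument above can be checked numerically. In this sketch, C_(x,x) is an illustrative exponential-decay localization matrix (any valid correlation localization would do) and the parameter row and column are left unlocalized, i.e. filled with ones.

```python
import numpy as np

# Demo: localizing the state but not a global parameter yields an indefinite
# augmented localization matrix (cf. the Schur-complement argument above).
n = 20
i = np.arange(n)
C_xx = np.exp(-np.abs(i[:, None] - i[None, :]) / 3.0)  # localized state block
C = np.ones((n + 1, n + 1))                            # C_(theta,x) = e^T, C_(theta,theta) = 1
C[:n, :n] = C_xx
eigmin = np.linalg.eigvalsh(C).min()
print(eigmin)  # negative: C is not positive semidefinite
```

The state block alone is positive definite, so the loss of definiteness comes entirely from leaving the parameter unlocalized.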
To localize global parameters in the same way as the state, Aksoy et al. (2006) proposed to map global parameters to homogeneous fields of appropriate dimension l (depending on the dimension one wishes to apply parameter localization to). These "dummy" fields serve to create an artificial spatial dependence of parameters, thereby allowing similar localization techniques in both state and parameter space. After the analysis increment [δx_i^T, δθ_i^T]^T of size (n + l) × 1 is computed, the now spatially inhomogeneous parameter field θ_i is mapped back to a scalar by taking the spatial average for each ensemble member i. In mathematical terms, the parameter for each ensemble member is updated according to

θ_i^a = θ_i^f + (1/l) e^T { C_(θe,x) ⊙ P_(θe,x) } H^T ( H P_Loc H^T + R )^{-1} d_i,   (6)

where P_(∗,∗∗) contains the covariances between ∗ and ∗∗, and θe denotes the homogeneous parameter field. Note that P_(θe,x) of size l × n consists of l identical rows equal to P_(θ,x) of size 1 × n and, if l = n, then C_(θe,x) can be chosen equal to C_(x,x) of size n × n. One could argue that creating an artificial spatial dependence leads to a bias in the use of observations. Indeed, the weight that an observation is given for the parameter update is modified by a spatial dependence that does not exist. One can remedy this by using the identity matrix as the correlation matrix for the parameters, C_(θe,x) = I. This way, each observation is given a relative weight that is determined solely by its error statistics and its correlation with the parameter, so that no biases of observation weights should occur. For C_(θe,x) = I, one can rewrite Equation 6 as

θ_i^a = θ_i^f + (1/n) P_(θ,x) H^T ( H P_Loc H^T + R )^{-1} d_i,   (7)

where for l = n we substituted (1/l) e^T { C_(θe,x) ⊙ P_(θe,x) } = (1/n) diag{ P_(θe,x) } = (1/n) P_(θ,x). Note that augmenting the parameter space to a higher dimension is not necessary here, which confirms the absence of an artificial spatial dependence of global parameters. We therefore refer to this method as global updating.
The formula for updating the parameter in Equation 7 can be obtained by a simple scalar parameter augmentation of the state that ensures positive definiteness of the localized error covariance matrix (see the Appendix). Global updating, where l = 1, has the additional advantage of being computationally cheaper than spatial updating, where l > 1. For the EnKF, this saves O(n(l − 1)) operations. For the QF, the reduction in computational costs is even larger. However, it is the QPEns that benefits most from the new approach, as the optimization problems are solved in model space instead of observation space.
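Global updating for a single scalar parameter can be sketched as follows, under our reading of Equation 7; the function and variable names are ours, not the paper's.

```python
import numpy as np

def global_update(theta_f, Xf, Hd, H, C_loc, R):
    """Global updating of a scalar parameter (a sketch of Equation 7).

    theta_f : (N,) forecast parameter ensemble; Xf : (n, N) state ensemble;
    Hd : (p, N) innovations d_i; C_loc : (n, n) state localization matrix.
    """
    n, N = Xf.shape
    Zx = (Xf - Xf.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)
    zt = (theta_f - theta_f.mean()) / np.sqrt(N - 1)
    P_loc = C_loc * (Zx @ Zx.T)                 # localized state covariance
    P_tx = zt @ Zx.T                            # cov(theta, x), size (n,)
    gain = (P_tx / n) @ H.T @ np.linalg.inv(H @ P_loc @ H.T + R)
    return theta_f + gain @ Hd                  # updated parameter ensemble
```

No dummy parameter field of dimension l is ever formed, which is exactly the computational saving noted above.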

Dynamical model for parameters
For any simple augmented-state algorithm, the parameter spread eventually collapses, due to the static nature of the parameters. For N → ∞ and under perfect (Gaussian) conditions this is justified, since the only uncertainty related to the parameters comes from the initial conditions, which should be recovered by the algorithm. For smaller ensemble sizes, however, this spread collapse leads to underdispersion in parameter space. A good overview of possible approaches to circumvent this phenomenon is provided by Ruiz et al. (2013b). We choose to impose stochastic dynamics on the parameters to represent the parameter uncertainty, as for example proposed in Gelb (1974). This approach allows the user partially to choose the parameter distribution (and hence its uncertainty), leaving the mean to be determined by the estimation algorithm. Specifically, for each ensemble member we draw the parameter for the next assimilation cycle from a Beta distribution, shifted and scaled to the prescribed range [l_θ, u_θ] (Equation 8). The two parameters of this Beta distribution are chosen such that its mean equals the current parameter value of the ensemble member. The degree of freedom that is left can, for instance, be used to set the variance or shape of the distribution. Note that, since the Beta distribution is bounded, the parameters cannot exceed the prescribed range [l_θ, u_θ] as long as the posterior ensemble members do not exceed it. This is an advantage over using a Gaussian distribution or inflation-type methods, where one would need to truncate the parameter values of the ensemble members that exceed the bounds, which affects the mean and spread of the ensemble. An alternative would be to use a transformation of variables to guarantee that the parameters stay within their bounds, which may introduce additional nonlinearities into the parameter estimation problem.
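Such bounded stochastic parameter dynamics can be sketched as follows. The mean-and-spread parametrization of the Beta distribution used here is our assumption for illustration, not necessarily the exact form of Equation 8.

```python
import numpy as np

def perturb_parameter(theta, lo, hi, spread, rng, size=None):
    """Draw the next-cycle parameter from a scaled Beta distribution on
    [lo, hi] whose mean equals the current value `theta`; `spread` (the
    remaining degree of freedom) sets the standard deviation. A sketch.
    """
    m = (theta - lo) / (hi - lo)          # mean mapped to [0, 1]
    s = spread / (hi - lo)                # std mapped to [0, 1]
    nu = m * (1.0 - m) / s**2 - 1.0       # concentration; needs s^2 < m(1-m)
    a, b = m * nu, (1.0 - m) * nu         # Beta(a, b) with mean m, std s
    return lo + (hi - lo) * rng.beta(a, b, size=size)
```

By construction every draw stays inside [lo, hi], so no truncation of ensemble members is ever needed.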

EXPERIMENTAL SET-UP
The purpose of this study is to compare the performance of the QPEns, QF and EnKF and evaluate the relative importance of taking higher order moments into account explicitly and respecting physical conservation laws and bounds for convective-scale applications. To that end, we conducted a set of numerical "twin" experiments with the modified shallow-water model, where a model run (further referred to as the nature run) is considered the true state of the atmosphere.
The modified shallow-water model (Würsch and Craig, 2014) consists of the following equations for the velocity u, rain r and water height level of the fluid h, respectively:

∂u/∂t + u ∂u/∂x + ∂(φ + γ² r)/∂x = β_u + D_u ∂²u/∂x²,
∂r/∂t + u ∂r/∂x = D_r ∂²r/∂x² − α r − β ∂u/∂x  (where h > h_r and ∂u/∂x < 0; the production term is zero otherwise),
∂h/∂t + ∂(uh)/∂x = D_h ∂²h/∂x²,

with φ = φ_c where h > h_c and φ = gh otherwise. Here, h_c represents the level of free convection. When this threshold is reached, the geopotential φ takes on a lower, constant value φ_c. The parameters D_u, D_r, D_h are the corresponding diffusion constants and γ := √(g h_0) is the gravity-wave speed for the absolute fluid layer h_0 (h_0 < h_c). The small stochastic Gaussian forcing β_u is added at random locations to the velocity at every model time step, in order to trigger perturbations and hence convection. Note that this implies that the location of convection is mostly random. The parameter β is the production rate and α the removal rate of rain. When h reaches the rain threshold h_r (higher than h_c), rain is "produced" by adding rainwater mass to the potential, leading to a decrease of the water level and buoyancy. The model conserves mass, so the spatial integral over h is constant in time.
For the numerical implementation of the model, the one-dimensional domain representing 125 km is discretized with 250 points, yielding the state vector x = [u^T h^T r^T]^T ∈ R^750. The time variable is discretized into time steps of 5 s. The Gaussian stochastic forcing has a half-width of four grid points and an amplitude of 0.002 m/s. The parameters that are estimated, along with the dynamical state, are α ∈ [l_α, u_α], φ_c ∈ [l_φc, u_φc] and h_r ∈ [l_hr, u_hr] (see Table 1 for values of upper and lower bounds). As previously stated, the multiplicative parameter α regulates how fast rain decays. Parameter h_r is responsible for a possible switch between the regimes, since it determines when it starts to rain. Parameter φ_c is the value of the geopotential once the level of free convection is reached and influences the total time required for a cloud to develop to full height.
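One explicit time step of such a model can be sketched as follows. This is a minimal finite-difference integration under our reading of Würsch and Craig (2014), with illustrative parameter values (not the paper's), a periodic domain and no stochastic forcing. The flux form of the h equation preserves total mass, which is exactly the property targeted by the QPEns constraint.

```python
import numpy as np

def step(u, h, r, dt=5.0, dx=500.0, g=9.81, h0=90.0, hc=90.02, hr=90.4,
         phic=899.8, alpha=2.5e-4, beta=0.1, Du=1e4, Dh=1e4, Dr=100.0):
    """One explicit time step of a modified shallow-water model (a sketch)."""
    def ddx(f):  # centred first derivative, periodic domain
        return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

    def lap(f):  # second derivative, periodic domain
        return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2

    phi = np.where(h > hc, phic, g * h)         # reduced geopotential above h_c
    gamma2 = g * h0                             # gamma^2, squared gravity-wave speed
    conv = (h > hr) & (ddx(u) < 0.0)            # rain production where h > h_r
    un = u + dt * (-u * ddx(u) - ddx(phi + gamma2 * r) + Du * lap(u))
    hn = h + dt * (-ddx(u * h) + Dh * lap(h))   # flux form: conserves sum(h)
    rn = r + dt * (-u * ddx(r) + Dr * lap(r) - alpha * r
                   - np.where(conv, beta * ddx(u), 0.0))
    return un, hn, rn
```

Summing the centred differences over the periodic grid telescopes to zero, so the discrete mass of h is conserved to round-off, mirroring the continuous model.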
Each ensemble member, as well as the nature run, is initialized as a random model run, with parameter values drawn randomly from the uniform distributions U(l_α, u_α), U(l_φc, u_φc) and U(l_hr, u_hr), respectively. After an appropriate spin-up time, 250 assimilation cycles are performed. All experiments discussed below need about 50 cycles to converge to a constant root-mean-squared error (RMSE) value. Therefore, the average of the RMSE over the last 100 cycles is computed to assign a scalar RMSE value to each experiment. For statistical significance, the final score used in the results is the average RMSE over 250 experiments.
Unless stated otherwise, observations are taken from the nature run every 60 model time steps (equivalent to 5 min in real time). A Gaussian observation error is added to the wind u and height h fields, with standard deviations of σ_u = 0.001 m/s and σ_h = 0.02 m, and a log-normal error is added to the rain r field, with parameters μ = −8 and σ = 1.8, yielding a very small observation bias of 0.000825 and a standard deviation of 0.00185. For all fields, the observation error is roughly 10% of the maximum deviation from the field mean. To mimic radar data, observations for all variables are available only at grid points where rain above a threshold of 0.005 dBZ is measured. Therefore, the rain data have a known observation bias of 0.000825 dBZ where r > 0.005 dBZ in the nature run. Note that, with a probability of 6.7%, observations corresponding to a dry grid point in the nature run are also assimilated. For rain, these observations have a bias larger than 0.005 dBZ. A random selection, amounting to 25% of the remaining grid points, of additional wind observations is assimilated, representing additional data available (for example obtained from aircraft).
To deal with undersampling, forecast-error covariance localization following Gaspari and Cohn (1999) is applied, with a fixed localization radius of six grid points for all three algorithms. This corresponds to the localization radius for which the EnKF yields minimum analysis RMSE values of the rain variable for small ensemble sizes. For the spatial updating technique described by Equation 6, the localization matrix for the parameters is the same as for the state. For global updating, we use C_(θ,x) = (1/n) e^T, as discussed in section 3.1. For the dynamical model of the parameters according to Equation 8, we choose to use the second degree of freedom to set a lower limit on the parameter spread. This limit is set to 50% of the initial spread. An interior point method is used to solve the quadratic minimization problems of the QPEns. The constraints that are applied are mass conservation, i.e. e^T (h^a − h^f) = e^T δh = 0, and positivity of precipitation, i.e. r^a = δr + r^f ≥ 0. For the QF and the EnKF, negative values for rain are set to zero if they occur.

State estimation
To evaluate the behaviour of the different algorithms without parameter uncertainty, we first discuss Figure 1, where the average RMSE of the analysis and forecast state is plotted against the ensemble size. Both the QF and the QPEns are more sensitive to the ensemble size than the EnKF. They therefore need an ensemble size larger than a certain threshold (N > N_QF and N > N_QPEns) to outperform the EnKF, usually with N_QF > N_QPEns. Since a larger sample size is needed to calculate higher order moments accurately, it is clear why the QF becomes superior to the EnKF only for sufficiently large ensemble sizes. With regard to the QPEns, we hypothesize that the error covariance matrix needs to be sufficiently accurate for the mass constraint to be used beneficially. When the error covariance matrix is poor, the mass constraint might cause spurious convection instead of preventing it.

FIGURE 1  Results of state estimation with localization radius 6, 5 min between assimilation cycles and 25% additional wind observations. For the forecast (dashed) and analysis (solid) corresponding to EnKF (blue), QF (green) and QPEns (red), the plots are as follows: (a,c,e) the RMSE of the state as a function of ensemble size; (d) analysis RMSE of r split up into rainy (dotted) and dry (dash-dotted) regions; (f) the absolute mass error of h and r, respectively.

The forecast shows roughly the same behaviour as the analysis, although the distinctions between the algorithms are somewhat damped, as the analysis increment of the QF appears slightly smaller and that of the QPEns slightly larger than that of the EnKF. In particular, the threshold N_QF is smaller for the forecast than for the analysis. The top right part of Figure 1 displays the analysis state RMSE of the rain field split up into rainy and dry regions. The QF gains its overall advantage relative to the EnKF in the rainy regions only. However, the RMSE in the dry regions decreases for the QF and increases for the EnKF as the ensemble size grows.
This indicates that a threshold above which the QF outperforms the EnKF exists in dry regions as well, but that it is higher than in the rainy regions. The QPEns does especially well in dry regions when the ensemble size is large enough. The lower two plots on the right show the absolute error in mass of h and r, respectively. Recall that both the model and the QPEns conserve the mass of h, which means that the mass of h is kept constant throughout the forecast. Absolute mass errors of h do not decrease for the EnKF and the QF when the number of ensemble members is increased, and they cause mass errors in r as well. For the rain mass, the QF and the EnKF even add errors in the analysis step instead of reducing them. The slight increase in the absolute mass error of rain for the QPEns with the number of ensemble members is the result of the bias in observations from dry regions. To confirm this, we performed experiments using the nature run to distinguish rainy from dry regions (instead of the observations). In this case, the increase in the absolute mass error of rain for the QPEns does not occur (not shown). However, using the nature run to distinguish rainy from dry regions results in less data being assimilated, since, to mimic radar data, we assimilate information on all three variables only where it rains. This leads to higher RMSEs and absolute mass errors for the QF and the EnKF, and the same behaviour as in Figure 1 as regards the dependence on ensemble size.
To distinguish between the respective effects of the mass conservation constraint e^T(h_a − h_f) = 0 and the positivity constraint r_a ≥ 0, we additionally conducted experiments in which only mass conservation was applied and ones in which only positivity constraints were applied. The results with only mass conservation are very close to the QPEns results, where both types of constraints are imposed (not shown). Experiments where only the positivity of rain is constrained yield slightly higher RMSEs than those corresponding to the EnKF (not shown). This is consistent with the findings of Janjić et al. (2014), where it was demonstrated that mass conservation is needed to benefit from the positivity constraints. We conclude that the effect of mass conservation is both dominant compared with the positivity constraints and necessary for improved accuracy. We therefore assume that the distinctions between the EnKF and the QPEns arise mainly from mass conservation.

FIGURE 2 Results of state estimation with 5 min between assimilation cycles and 25% additional wind observations. For the analysis state of EnKF (blue), QF (green) and QPEns (red) and for localization radii 2 (square) and 10 (triangle), the plotted quantities are as in Figure 1
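A minimal sketch of the QPEns idea, assuming the constrained analysis is obtained by projecting an unconstrained EnKF analysis member onto the constraint set. The actual QPEns solves the quadratic programme in the metric of the ensemble error covariance; the Euclidean norm is used here only for brevity:

```python
import numpy as np
from scipy.optimize import minimize

def constrained_analysis(h_a, r_a, h_f):
    """Project an unconstrained analysis (h_a, r_a) onto the constraint set.

    Constraints: mass conservation of h, e^T (h_a - h_f) = 0, and
    non-negativity of rain, r_a >= 0.  Simplified sketch: the distance is
    measured in the Euclidean norm rather than the covariance metric.
    """
    nh = h_a.size
    x0 = np.concatenate([h_a, np.maximum(r_a, 0.0)])

    def cost(x):
        # squared distance to the unconstrained analysis
        return np.sum((x[:nh] - h_a) ** 2) + np.sum((x[nh:] - r_a) ** 2)

    cons = [{"type": "eq", "fun": lambda x: x[:nh].sum() - h_f.sum()}]
    bounds = [(None, None)] * nh + [(0.0, None)] * r_a.size

    res = minimize(cost, x0, constraints=cons, bounds=bounds, method="SLSQP")
    return res.x[:nh], res.x[nh:]
```

In the constrained member, spurious mass added by the analysis increment is removed and negative rain is clipped in a least-squares-optimal way, rather than by ad hoc postprocessing.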
The relative behaviour of the algorithms depends on the experimental set-up. To investigate this sensitivity further, we vary the localization radius, the observation coverage of u and the observation frequency. In our experiments, the observation frequency affects the total number of observations assimilated: doubling the observation frequency also doubles the total number of observations assimilated. The results are shown in Figures 2-4.
Comparing the performance of the algorithms for different localization radii in Figure 2, we conclude that N_QF and N_QPEns are positively correlated with the localization radius. In particular, we see a strong superiority of the QPEns for the narrow localization radius. When the localization radius is too small, balances are destroyed (Kepert, 2009; Greybush et al., 2011), conservation laws are violated (Zeng and Janjić, 2016) and the natural spatial smoothness of the state variables is lost. This introduces noise, i.e. undesirable small-scale fluctuations, in all fields. It is for this reason that the absolute mass error of h for the EnKF and the QF is much larger for a small localization radius. The absolute mass error of h, which is caused by noise that can lead to spurious clouds, propagates to the rain field, also creating a strong absolute error in precipitation for the EnKF and the QF. This is avoided by the QPEns, owing to the mass conservation constraint. The top right plot of Figure 2 supports the hypothesis that mass conservation contributes to preventing the introduction of noise and spurious clouds caused by narrow localization, as the smaller RMSE of rain for the QPEns is inherited from dry regions only. For large localization radii, mass is better conserved within the EnKF, resulting in smaller RMSE for the state and smaller absolute mass errors.
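Covariance localization of this kind is typically implemented by tapering the ensemble covariances with a compactly supported correlation function of distance. A standard choice, assumed here purely for illustration (the paper does not specify its taper), is the fifth-order Gaspari-Cohn function:

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order compactly supported correlation function.

    `c` is the localization half-width; the taper vanishes for dist >= 2c.
    Multiplying ensemble covariances by this weight suppresses spurious
    long-range correlations, but a narrow radius can destroy balances and
    violate conservation laws, as discussed above.
    """
    r = np.atleast_1d(np.abs(dist) / c).astype(float)
    w = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    r1, r2 = r[m1], r[m2]
    w[m1] = 1 - (5 / 3) * r1**2 + (5 / 8) * r1**3 + 0.5 * r1**4 - 0.25 * r1**5
    w[m2] = (4 - 5 * r2 + (5 / 3) * r2**2 + (5 / 8) * r2**3
             - 0.5 * r2**4 + (1 / 12) * r2**5 - 2 / (3 * r2))
    return w if np.ndim(dist) else float(w[0])
```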
As the time between assimilation cycles increases, the prior becomes less Gaussian and therefore more is gained from taking higher order moments and constraints into account. This is confirmed by Figure 3. For temporally sparse observations, the non-Gaussianity caused by rapidly developing and intermittent clouds has time to manifest itself in the prior distribution. Since the QF partially accounts for non-Gaussianity, the absolute mass errors are clearly smaller for the QF than for the EnKF. As reasoned above, the absolute mass errors are related to noise and spurious clouds. By constraining the mass, noise and therefore spurious clouds are suppressed, as is again confirmed by the top right plot of Figure 3, which shows that, for temporally sparse observations, the distinctions between the performance of the methods come mainly from dry regions, i.e. from the presence of noise and spurious convection.

FIGURE 3
Results of state estimation with localization radius 6 and 25% additional wind observations. For the analysis state of EnKF (blue), QF (green) and QPEns (red) and for an assimilation cycle period of 1 min (triangle) and 30 min (square), the plotted quantities are as in Figure 1

Figure 4 displays information about the behaviour of the algorithms with respect to the observation coverage of u. There is no significant relative difference in performance between the EnKF and the QF when the observation coverage of u is increased. The gradients of the graphs, as well as N_QF, are similar for all observation coverages. It is for the QPEns that large relative differences are detected. The h and r fields are driven by the u field, since the stochastic perturbations in the wind field determine the location of clouds. Owing to the time lag between the development of a cloud in the h field and the production of rain in the r field, the model forecast benefits most from wind observations. By observing the entire wind field, the model forecast has good skill in detecting arising clouds, which is not the case when no additional wind observations are assimilated. When the model forecast is accurate in the location of the clouds, there is less room for spurious clouds. Also, the more accurate the model forecast, the less weight is put on the observations and therefore the less noise is introduced. As a result, mass errors are smaller, so, as the observation coverage of u increases, the benefit of imposing mass conservation decreases. However, when the observation coverage of u decreases, the same reasoning as for the previous figures applies, i.e. there is more room for spurious clouds due to lack of forecast skill, which the QPEns can suppress by imposing mass conservation. Again, the top right plot of Figure 4 clearly supports this.
In general, the QF is superior in the rainy regions and the QPEns in the dry regions. Likewise, separating the h field according to h > h_c and h ≤ h_c highlights the superiority of the QF in cloudy regions and that of the QPEns in non-cloudy regions (not shown). Due to the mass restriction, the QPEns can only reach the full height of the cloud if the non-cloudy regions are estimated close to perfection, i.e. no spurious mass at all, which is practically unattainable. It is therefore not surprising that the QPEns excels in dry and non-cloudy regions, but lacks mass in cloudy and rainy regions. Since the QF takes higher order moments into account, we expect a better performance than the EnKF as long as the sample size is larger than a critical threshold. This threshold is larger for non-cloudy and dry regions, because the prior distribution is more skewed in those regions and needs more samples to estimate its third moment accurately. The excellence of the QF and the weakness of the QPEns in wet regions are generally slightly enhanced in the forecast (not shown), probably due to nonlinearity.

FIGURE 4
Results of state estimation with localization radius 6 and 5 min between assimilation cycles. For the analysis state of EnKF (blue), QF (green) and QPEns (red) and for no additional wind observations (square) and 100% wind observation coverage (triangle), the plotted quantities are as in Figure 1

The QPEns is by far the most sensitive to the experimental set-up. When observations are temporally and/or spatially sparse, the QPEns excels. However, when they become more dense in the wind field, the ensemble size threshold N_QPEns above which the QPEns beats the EnKF is increased. In addition, the QPEns can handle narrow localization radii, which the EnKF and the QF cannot. To illustrate further the effect that mass conservation has on the state estimate, we refer to Figure 5, where a snapshot of a random experiment with 50 ensemble members and a set-up similar to the square plot in Figure 4 is presented for the EnKF and the QPEns. For all state variables, the difference in behaviour between the respective algorithms is visible primarily in non-cloudy regions (compare truth (red) and ensemble mean (blue)). Here, undesirable small-scale fluctuations (noise) are significantly smaller for the QPEns.

Parameter estimation
To illustrate the effect of imposing dynamics on the parameters and to compare the two approaches to parameter localization discussed in section 3, we refer to Figure 6, where the results for the different experiments are shown for the EnKF. For both spatial and global updating, the RMSE of the state and the parameters is clearly reduced when the dynamical model is used to maintain a certain parameter spread. The difference is larger on average for small ensemble sizes, because spread collapse is a slower process for large ensemble sizes. Although it is not clear why the absolute error for α increases with ensemble size for spatial updating, its effect on r does make sense. Indeed, since α influences the rain decay rate, the error in α is strongly correlated with the RMSE of r. Overall, global updating appears to perform as well as, if not better than, spatial updating. With the additional advantage of significantly reduced computational costs, we consider global updating successful in this context. The remainder of this article is therefore dedicated to the discussion of experiments with global updating and with a dynamical model for parameters.

From Figure 7, we can conclude that applying parameter estimation affects the relative behaviour of the algorithms with respect to the state. There are three important differences. First, the thresholds N_QF and N_QPEns are smaller than for state estimation without model error due to parameter uncertainty. In particular, the QPEns now outperforms the EnKF for all ensemble sizes. Second, the contribution of dry regions to the RMSE is relatively larger with parameter estimation. This is also reflected by the increased absolute mass errors. Third, estimating parameters along with the state yields a stronger dependence of the state on the ensemble size for the QF. Already for small ensemble sizes, the parameter estimates are better for the QF than for the EnKF.
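For illustration, the following is a minimal sketch of how a global (unlocalized) update of a spatially constant parameter could look in a stochastic EnKF with a single scalar observation. This is our own simplified reading of global updating, not the paper's implementation; the point is that the parameter-observation covariance is estimated from the full ensemble with no spatial taper, so one update serves the whole domain:

```python
import numpy as np

def global_parameter_update(p_ens, y_ens, y_obs, r_var, rng=None):
    """Global update of a spatially constant parameter (sketch).

    p_ens : (N,) ensemble of the global parameter
    y_ens : (N,) ensemble of model equivalents of one scalar observation
    y_obs : the observation
    r_var : observation-error variance
    Stochastic (perturbed-observations) EnKF form; no localization is
    applied to the parameter-observation covariance.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = p_ens.size
    dp = p_ens - p_ens.mean()
    dy = y_ens - y_ens.mean()
    # scalar Kalman gain from ensemble (co)variances
    gain = (dp @ dy / (n - 1)) / (dy @ dy / (n - 1) + r_var)
    y_pert = y_obs + np.sqrt(r_var) * rng.standard_normal(n)
    return p_ens + gain * (y_pert - y_ens)
```

In contrast, spatial updating would produce a localized parameter field that has to be averaged back to a single value afterwards, at a correspondingly higher cost.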
For the QPEns, the parameters remain within their respective bounds anyway (not shown) and therefore no constraints are imposed on the parameters. The only influence the QPEns has on parameter estimation relative to the EnKF is through the constraints imposed on the state. The non-Gaussianity related to the parameters is not dealt with. Therefore, for parameter estimation the gradient of the graph corresponding to the QPEns becomes closer to that of the EnKF. Taking higher order moments into account, however, is directly beneficial for parameter estimation (at least for α and h_r), as confirmed by the light blue line in Figure 7, where the state is updated with the EnKF and the parameters with the QF. Given a sufficiently large ensemble size, the QF can represent the distributions related to the parameters more accurately than the EnKF and the QPEns, and, since the model space is essentially enlarged when parameter estimation is applied, the QF becomes more dependent on ensemble size. It therefore has the potential to beat both the EnKF and the QPEns as the ensemble size increases for parameter estimation. The analysis increments are comparable to those obtained from the experiments without parameter uncertainty, meaning that the QF has the smallest increments and the QPEns the largest.
Since no constraints are applied to the parameters, comparing the EnKF and the QPEns gives us information about the influence of state errors on parameter estimation. The smaller error of φ_c for the QPEns indicates that state errors can certainly influence the estimates of the parameters. The parameter errors, in turn, also influence the state estimates. Indeed, the QPEns already yields smaller state errors than the EnKF for small ensemble sizes, which is not the case for state estimation only. Comparing the dark blue and light blue lines in Figure 7 also supports this statement. Taking skewness into account for the parameter updates yields better estimates of the parameters. The fact that the state errors are also reduced (even though no skewness is taken into account for the state updates) confirms the influence of parameter errors on the state estimates.

FIGURE 6 Average RMSE of the state and the parameters versus ensemble size for global (diamond markers) and spatial (star markers) updating, with (solid) and without (dashed) dynamical model for parameters, for the EnKF

Interestingly, the bias of φ_c is very close to its absolute error, which indicates that the algorithm systematically underestimates the true parameter value. The optimization problem appears to have a local minimum for small φ_c. This explains why the quality of the estimate of φ_c deteriorates as the ensemble size increases. Besides this, a smaller parameter error does not always yield a smaller RMSE of the state: compare, for example, the QF and the EnKF for small ensemble sizes. This suggests that the additional degrees of freedom introduced by treating φ_c as uncertain compensate for errors that are not parameter-related. Figure 8 demonstrates that this is indeed the case, i.e. the RMSE for state and parameter estimation is sometimes lower than for state estimation without parameter error. In experiments where only φ_c is estimated, this effect is stronger (not shown).
The graph with triangular markers in Figure 8 shows the RMSE of the analysis state resulting from state estimation with fixed, but wrong, parameter values. These experiments are meant to represent the effect that model error due to wrong parameter values has on the state estimates. The parameter values for these experiments were chosen to be equal to the initial parameter errors of the ensemble mean corresponding to the experiments with parameter estimation. Figure 8 also shows the benefit of parameter estimation for the RMSE/spread ratio. Note, however, that the spread remains too small even for large ensemble sizes, which apparently is not caused by undersampling and model error alone (otherwise the RMSE/spread ratio for the round-marker experiments would converge to 1 with ensemble size). We hypothesize that, instead, non-Gaussianity contributes to the lack of spread, which cannot be resolved with typical inflation techniques. Indeed, we found that in our experiments multiplicative inflation increases the RMSE as well as the spread, thereby failing to be effective. Figure 9 shows the time evolution of the parameter absolute errors for 50 ensemble members. The error of α was reduced by between 30 and 45%, that of φ_c by between 45 and 70% and that of h_r by between 20 and 30%. As prescribed by the dynamical model for the parameters, the spread converges to 50% of its original value. Although the RMSE of the state converges after 50 cycles, the parameter errors still decrease after 250 cycles, with the exception of h_r for the EnKF and the QPEns. This indicates that augmented-state parameter estimation with the EnKF approach is not very flexible. It would need many assimilation cycles to adjust to any changes in the true parameter values.
We therefore argue that this method is suited for static or slowly varying (for instance on seasonal time-scales) parameter values, but probably less suited for fast-varying parameters, such as those that vary with the weather. In the latter case, one would have to use a different dynamical model for the parameters or apply an adaptive spatial averaging method (Liu et al., 2014).

FIGURE 7 Average analysis (solid) and forecast (dashed) RMSE of the state and absolute error (solid) and bias (dotted) of the parameters versus ensemble size for the EnKF (blue), QF (green) and QPEns (red). The light blue line corresponds to experiments in which the EnKF is applied to the state and the QF is applied to the parameter estimation, i.e. higher order moments are taken into account for parameter updates only
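The role of the stochastic dynamical model for the parameters, preventing spread collapse while respecting parameter bounds, can be illustrated with the following sketch. It uses a simple AR(1)-type relaxation whose stationary spread equals a prescribed target; the paper instead uses a Beta-distribution-based model, so this is only an analogy, and the relaxation factor and clipping are our own choices:

```python
import numpy as np

def parameter_dynamics(p_ens, target_spread, bounds, relax=0.9, rng=None):
    """Advance a parameter ensemble with a stochastic model (sketch).

    Members are pulled toward the ensemble mean and re-dispersed with
    noise whose amplitude is chosen so that the stationary ensemble
    spread equals `target_spread` (AR(1) stationary-variance argument:
    var = q**2 / (1 - relax**2)).  Values are then clipped to the
    physical bounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = p_ens.mean()
    q = target_spread * np.sqrt(1.0 - relax ** 2)  # stationary std = target
    p_new = mean + relax * (p_ens - mean) + q * rng.standard_normal(p_ens.size)
    return np.clip(p_new, *bounds)
```

Iterating this map, an initially collapsed ensemble recovers and its spread settles at the prescribed target, mirroring the behaviour described above where the spread converges to a fixed fraction of its original value.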

CONCLUSION
In this article we compared the model state analysis RMSE for three different EnKF-based algorithms applied to the modified shallow-water model (Würsch and Craig, 2014). In addition, the feasibility of parameter estimation using the augmented-state approach (Jazwinski, 1970; Evensen, 2009) was investigated. We used a stochastic dynamical model based on a Beta distribution for the parameters to prevent underdispersion in parameter space and keep parameters within their bounds. To deal with localization in parameter estimation, we introduced a computationally cheap and successful modification of spatial updating, which we call global updating.
Decreasing the localization radius introduces noise that leads to spurious convection, which in turn causes a mass error in h and therefore in r. We have shown that, in this case, preserving the mass of h contributes strongly to suppressing this noise and, by extension, the spurious convection that arises from it. It was shown in Lange and Craig (2014) that, given a realistic ensemble size and observation coverage and frequency, the localization radius that yields the smallest RMSE of the analysis state does not necessarily yield the best forecasts, presumably due to the introduction of noise. They argue that the localization radius should be chosen sufficiently large to avoid too much spurious convection, at the expense of local forecast accuracy. The QPEns might offer the possibility of suppressing noise while still achieving local accuracy by decreasing the localization radius. It was also shown that, for spatially and temporally sparse observations, currently the case in practice, mass conservation plays a significant beneficial role as well. This serves as a motivation to develop the QPEns further to make it computationally affordable.
The QF, which could be computationally suited for practical applications, shows benefits (especially in rainy regions) for ensemble sizes larger than some critical value, which depends on the localization radius and the observation frequency. Whether incorporating higher order moments is beneficial or not will therefore depend on the application. However, when applied to parameter estimation only, the QF supersedes the EnKF for all ensemble sizes and all state variables that were tested in this study (compare light and dark blue lines in Figure 7). We therefore conclude that accounting for higher order moments explicitly can be effective and affordable for parameter estimation problems.

FIGURE 8 Average analysis RMSE (solid) and analysis spread (dotted) of the state versus ensemble size for the EnKF (left), QF (middle) and QPEns (right). The different markers correspond to experiments in which all parameters are estimated (diamond), fixed and correct (circle), and fixed and incorrect (triangle). Note that the solid-line graphs corresponding to the diamond markers are the same as in Figure 7 and the solid-line graphs corresponding to the circle markers are the same as in Figure 1
We detected a strong positive feedback loop of errors between the state and the parameters. As a result, the advantages of taking higher order moments into account and of imposing constraints are enhanced when parameter estimation is applied. An interesting, though not surprising, side effect of augmented-state parameter estimation is the algorithm's unpredictable use of the additional degrees of freedom. While our goal might be to estimate the parameters to the best possible accuracy, the algorithm might compensate for other errors instead, yielding poorer parameter estimates but better state estimates. This behaviour was detected with respect to φ_c. Still, the parameters were all estimated fairly well, in the sense that the error was reduced in all cases, in some cases even by up to 70% for a modest ensemble size of 50.
In practical applications, where other forms of model error influence state estimation as well, the consequences of parameter estimation are not predictable. Error compensations might be beneficial for some but not all errors. This was clearly highlighted in the study of Simon et al. (2015), where combined state and parameter estimation was conducted on an ecosystem model for the North Atlantic and Arctic Oceans. Therefore, the next research step is to apply parameter estimation to models of operational complexity and investigate the forecast quality in realistic scenarios.

FIGURE 9
Average absolute error (left) and spread (right) of the parameters versus time in assimilation cycles for 50 ensemble members

ACKNOWLEDGEMENTS

is also grateful to the Hans-Ertel Centre for Weather Research (Weissmann et al., 2014; Simmer et al., 2016) for providing support for this study. This research network of universities, research institutes and the Deutscher Wetterdienst is funded by the BMVI (Federal Ministry of Transport and Digital Infrastructure).