Efficient nonlinear data assimilation using synchronization in a particle filter

Current data assimilation methods still face problems in strongly nonlinear cases. A promising solution is a particle filter, which provides a representation of the state probability density function (pdf) by a discrete set of particles. To allow a particle filter to work in high‐dimensional systems, the proposal density freedom is explored. We used a proposal density from synchronization theory, in which one tries to synchronize the model with the true evolution of a system using one‐way coupling, via the observations. This is done by adding an extra term to the model equations which will control the growth of instabilities transversal to the synchronization manifold. In this paper, an efficient ensemble‐based synchronization scheme is used as a proposal density in the implicit equal‐weights particle filter, a particle filter that avoids filter degeneracy by construction. Tests using the Lorenz96 model for a 1,000‐dimensional system show successful results, where particles efficiently follow the truth, both for observed and unobserved variables. These first tests show that the new method is comparable to, and slightly outperforms, a well‐tuned Local Ensemble Transform Kalman Filter. This methodology is a promising solution for high‐dimensional nonlinear problems in the geosciences, such as numerical weather prediction.


INTRODUCTION
Synchronization is a phenomenon first described by the Dutch scientist Christiaan Huygens. He found that two pendulum clocks suspended by a common frame would oscillate, after some time, in opposite directions but in perfect consonance, independent of their initial phases. He concluded that the clocks were able to synchronize their phases due to the frame they were sharing, a common factor between both. Nowadays, based on these ideas, researchers in this field focus on exploring the synchronization of two or more systems while they share common information. Typically, this is obtained by a suitable relaxation term that forces the model (or models) towards the observations and helps to control the stability of the synchronization manifold.
There is a close connection between the concepts of synchronization and data assimilation, as the latter aims to synchronize the model evolution with the truth, via the observations, finding the best state estimate and its uncertainty. In this case, the information to be shared between the systems flows from the truth to the model, in a unidirectional way, through noisy and sparse observations. Through the light of Bayes' Theorem, as explained in (e.g.) van  , data assimilation is actually a mul-tiplication problem rather than an inverse problem. Indeed, one tries to represent a posterior probability density function (pdf) by multiplying the information contained in the model prior with the likelihood, which is updated when observations are available. Data assimilation encompasses a more general concept than synchronization.
Similarities can be found between synchronization and data assimilation, as pointed out by Yang et al. (1996) and Duane et al. (2006) and more recently by Pinheiro et al. (2018), where a discussion can be found on the relations and differences between time-delayed synchronization and the Kalman smoother. A crucial difference is that the synchronization framework allows one to reuse the observations, which is a complication in many data assimilation methods.
Recent work in the synchronization field has explored the time-embedding concept, by considering observations either from the past or future (e.g. Rey et al., 2014aRey et al., , 2014bParlitz et al., 2014;Pazó et al., 2016). Based on these previous works, Pinheiro et al. (2018) have focused on constructing a more robust synchronization scheme that could potentially be applied to a realistic geophysical model, in combination with a particle filter. To this end, they proposed an ensemble-based synchronization framework (EnSynch) that, with a small number of ensemble members, provided promising results for a high-dimensional system. As the authors mention in their study, their ensemble methodology cannot be used as a stand-alone data assimilation method, as no uncertainties are computed and also because observations inside the time window are reused in different time steps, which could bring complications in most data assimilation methods. The authors also discussed the similarities and differences between the ensemble synchronization and methods like the ensemble smoother and also variational methods that employ ensembles to avoid adjoint calculations like, for example, the four-dimensional ensemble-variational schemes (4DEnsVar) (e.g. Bannister, 2017 and section 4.5 in Carrassi et al., 2018) and iterative ensemble smoothers like the iterative ensemble Kalman smoother (IEnKS) of Bocquet and Sakov (2014). Compared to the ensemble smoother, one of the main differences is that the ensemble synchronization ignores the observation-error covariance matrix in the gain. This possible weakness in the scheme is counterbalanced by the inclusion of a tuning factor in the synchronization formulation: the constant . This constant regulates the strength of the influence that the pseudoinverse of the Jacobian of the operator (an embedding map, which contains the dynamical model and the observation operator) will have on the innovation. Another difference is that synchronization allows observations to be reused along the iterations in the time-embedded intervals, so they can increase the observability of the system. This is an interesting feature of synchronization.
F I G U R E 1 Use of observations within the assimilation window for different methodologies: IEnKs MDA, EnSynch and 4DEnsVar. The black dots are the observations which are used in the cycle. The IEnKS uses observations several times, but with increased observation errors, the EnSynch uses observations several times, and 4DEnsVar uses each observation only once (see text for details) Note that the IEnKS also uses observations several times, but with a factorization of the likelihoods, which does not change the observability of the system. Figure 1 shows how observations are used inside a data assimilation window for the IEnKS MDA (Multiple Data Assimilation window), the EnSynch and the 4DEnsVar. In this illustration, observations exist at every time step, but only the black ones are effectively used. Note that in the first two cases the assimilation windows overlap and so the observations are re-used. In the quasi-static IEnKS MDA case, the observations are assimilated several times, but they are weighted within the window as: 1 = ∑

=1
, where is the length of the assimilation window and 0 < < 1 is the extra weight each observation at time receives within this window.
In this work we will merge the efficiency of synchronization described in Pinheiro et al. (2018) with the data assimilation formalism. To this end, we will use synchronization indirectly, as a proposal density, in a nonlinear data assimilation methodology: a particle filter.
It is well known that one can explore the proposal density freedom by modifying the model equations (e.g. Doucet et al., 2001;van Leeuwen et al., 2015). In this concept, extra terms can be included in the model equations, even counting on tunable parameters. The main goal is to steer the particles towards the high-probability region of the posterior pdf between observation time steps. Some interesting experiments can be found in the literature on tests with different proposal densities. An ensemble Kalman filter has been used as a proposal density in a particle filter by Papadakis et al. (2010). Further tests with simple relaxation terms have been performed by van Leeuwen (2010), by Ades and  in a high-dimensional system and by Browne and van Leeuwen (2015) in a climate model. The latter concluded that the relaxation term used was not optimal, as the particles were spreading away from the truth between observation time steps. In Browne (2016), the use of an ensemble Kalman smoother as a proposal was still not effective. Therefore, the main motivation of this paper is to investigate how effective synchronization is as a proposal density in a particle filter.
One of the main issues of particle filters is that ensembles of particles tend to collapse in high-dimensional systems (also known as the curse of dimensionality or degeneracy). The filter chosen for our experiments is based on the implicit equal-weights particle filter (IEWPF; Zhu et al., 2016), but with the extension advocated in Skauvold et al. (2019). This filter does not suffer from degeneracy by construction and allows for unbiased mean and covariance.
The paper is structured as follows: Section 2 describes the methodology used to construct the ensemble-based synchronization framework, gives a summary of the IEWPF formulation and details the implementation of the ensemble synchronization as a proposal in this particle filter. Section 3 presents results on the experiments using the Lorenz96 model (Lorenz, 1995) in a 1,000-dimensional system. In Section 4, we discuss the results and present conclusions.

2.1
The ensemble-based synchronization (EnSynch) Pinheiro et al. (2018) have developed an ensemblesynchronization scheme that can be applied to realistic high-dimensional geophysical models. This methodology is an ensemble-based time-embedding scheme, similar to an ensemble smoother or a 4DEnsVar, which avoids the need for tangent-linear models and adjoint calculations.
The scheme explores the use of time embeddings, in which a delay dimension d is defined in order to capture information from observations, which occur at every time steps within a time interval [ , + ( d − 1) Δ ], where is a constant. Observations are then included, so they can bring additional information back to the present time. Pinheiro et al. (2018) show that, after stabilizing the synchronization manifold, which can be briefly described as a set of subspaces which states are attracted to, very precise estimates are obtained, both for observed and unobserved variables, allowing accurate predictions over a significant forecast period. The ensemble methodology detailed in their paper is summarized here. Define a true state x T ∈ ℜ in the evolution equation of the true system: where is the nonlinear model at time and +1 T is a stochastic perturbation represented by a normal distribution of ∼ (0, 0.01). Compute the evolution of a state estimate x with time using the coupled dynamics and the perfect model : where we include the augmented e -dimensional vectors S(x ) and Y . The embedding dimension e is defined by e = d . The vector S ∈ ℜ e is related to the states of the model and Y ∈ ℜ e is related to the observations y ∈ ℜ , as: The states x in S(x ) are the means of the ensemble of states x , where is the ensemble index. S(⋅) is a map from physical to an embedding space.
is the observation operator, which can be linear or nonlinear. It is assumed that does not change with time, but it is straightforward to make it time-dependent. The coupling constant determines the strength of the coupling, which is progressively increased by along the interval between observations. At an observed time step, when we update the variables, = 1. During the time steps without observations, is increased from 1 to ( int − 1) at the last unobserved time step in the interval, where int is the observation interval.
S(x )∕ x is the Jacobian matrix of S. Its pseudoinverse, ( S(x )∕ x ) † , is responsible for spreading the information from the observed to the unobserved variables. It is computed as follows: where the pseudoinverse on the right-hand side is computed via a truncated SVD. X is the ensemble perturbation matrix, that is, each column is the difference between each member and the mean. Additionally, the ensemble synchronization scheme uses a localization method to reduce the influence of observations which are far away from the variables (Houtekamer and Mitchell, 1998). Pinheiro et al. (2018) give an overview of the localization implementation in this scheme. When implemented in the particle filter, the scheme of Pinheiro et al. (2018) is adapted, to reduce its computational cost and also to fit it into the particle filtering framework, as follows.
In order to avoid calculating the forecast ensembles several times, we store the initial set of ensemble perturbation matrices X computed at time step , so we have: Note that these perturbation matrices are obtained while constructing the Jacobian S(x )∕ x . We also store the ensemble means at each of the embedding times described in Equation (6).
At the next time steps in which we have observations, we perform the following procedures: (1) Rescale all perturbation matrices in X with a factor: where here corresponds to the state variables and is the formulation used to compute the spread. The (prior) , then, is the ensemble spread for the actual time step propagated from the previous observation time, i.e. the prior. (new) is the spread of the perturbation matrix newly computed at the actual time step, after using Equation (2).
(2) Recentre the ensembles at each observed time, by adding to the means previously stored, the difference between the mean of the new ensemble and the prior ensemble. (3) Propagate the last rescaled and recentred ensemble, that is, the one at + ( d − 1) Δ for an additional time steps, to have a complete X, as needed in Equation (2).

Particle filters
Particle filters are based on Bayes' Theorem, in which the posterior pdf of the state x , given the observations y at time is: The basic idea is to represent the prior pdf, a probability density that contains information from the model, as a discrete set of model states or particles x : This approximation, combined with Bayes' formulation, leads to the following representation of the posterior: in which are the weights: These weights are attached to each particle to give them a relative importance, according to how close or distant the model states are from the observations. If only few particles are close to the observations and receive non-zero weights, the posterior pdf will lose its full statistical information, as the other particles will be discarded, due to their inability to follow the observations. This is called filter degeneracy or filter collapse and it happens in even small-dimensional systems. Doucet et al. (2001) gives a more detailed mathematical explanation on particle filters.
Several methods have been developed to mitigate this problem. Snyder et al. (2008; proved that resampling would not solve the degeneracy, even using the so-called optimal proposal density. To avoid the particles becoming degenerate, it is crucial to ensure that equally significant particles are drawn from the posterior density. To do this, we must perform two different and important steps: (a) guarantee that the particles are located at the high-probability regions of the posterior and (b) ensure similar or equal weights of the particles, so they will never collapse. To this end, several methods have been recently developed, such as the Equivalent Weights Particle Filter (EWPF; van Leeuwen, 2010) and the IEWPF (Zhu et al., 2016;Skauvold et al., 2019).
Regarding the first step, we need a scheme that pulls the particles towards the observations, between observation time steps. An interesting property of particle filters that can be explored to this purpose is the so-called proposal transition density.
Consider time steps and + 1 between two observation time steps − 1 and , that is, −1 < < +1 < . The model is assumed to be Markovian. This way, the pdf (x +1 ) can be written as: where (x +1 | x ) is the transition density of the original model. We have the freedom to introduce a proposal density as follows, without changing the previous equation: This is possible if the support of is equal to or larger than the one in . This way, we avoid dividing the previous equation by zero.
Instead of drawing from (x +1 | x ) and so using the original model, we now draw from (x +1 | x , y ). This way, the prior at time is now defined as: in which the weights are accumulated during the time steps between observations until time : It is possible to choose any proposal transition density for (x +1 | x , y ), so the aim is to use one that includes additional information from the observations in the future, in an optimal way.

Ensemble synchronization as a proposal density in the IEWPF
Considering the proposal density freedom, several methods can be used for this, including traditional methods such as the four-dimensional variational schemes (4D-Var), as essentially proposed in Chorin and Tu (2009), and the ensemble smoothers, like in Browne (2016). However, these are rather expensive, as it usually involves solving a problem similar to 4D-Var for each particle. Thus, typically, simpler schemes are employed between observation times. These schemes will be less efficient, although we can ensure that Bayes' Theorem is fulfilled exactly for each particle.
Some authors (e.g. Zhu et al., 2016) implemented the particle filter with a weak relaxation term between observations to control the spread and to achieve better-converging trajectories of the particles. To this end, the relaxation transition proposal density that appears in Equation (14) was defined by those authors as: where Q is the model error covariance and where ( ) is the relaxation strength, given by In this formulation, is a constant, increases linearly from zero to one along the unobserved time steps and R is the observation-error covariance matrix.
They concluded from this formulation that the relaxation term used was a weakness in their scheme. They also argued that proposal densities can be more sophisticated than the one they have used and should be tested to improve the performance and increase robustness. That is exactly what we are investigating in this work.
We will assume that the model errors and the proposal are Gaussian. The transition density in Equation (13) for the prior is related to the original model via where +1 is a stochastic vector representing the model error +1 ∼ (0, Q). This leads to: where The proposal density that appears in Equation (14) is related to a proposed model, in our case, the ensemble synchronization equation: wherê+ 1 ∼ (0,̂), which in our case is drawn from the same distribution as +1 , so we assume here that̂= Q. The augmented observation and state vectors Y and S use future observations in the time-embedding window, described in Equations (4) and (3), respectively. Hence, where which is the deterministic part of the right-hand side of Equation (23). The distribution of this proposal density is calculated as: which we can relate to the model error added in Equation (23) as We avoid computing Q −1 in Equation (27), as in Ades and , by noting that the error̂+ 1 is sam- This change in model equation is compensated by an extra weight, as explained in the previous section, which accumulates for all time steps, except the last time step before the next observation, to At the last time step before the next observation, the IEWPF is applied, to ensure that all of the particles have equal weights in the posterior pdf.
A brief explanation of the modified IEWPF scheme (Skauvold et al., 2019) is as follows. For each particle two samples are drawn from a Gaussian distributed proposal ( , ) = ( | ) ( ), such that is perpendicular to . We then form a new particle as: x a is the mode of (x | x −1 , y ), and are scalars and P is an estimate of the covariance of (x | x −1 , y ). One chooses , which is related to the spread of the ensemble, and is typically a number between 0.3 and 1. An advantage of the modified scheme is that all parts of state space can be reached, and the posterior covariance is no longer systematically too low.
The parameter is chosen to scale the size of the stochastic forcing, in order to make the particles receive the same target weight target , that is: Using the general expression for the weights we thus find Since we do not draw from (x | x −1 , y ) directly, but from the Gaussians the expression for the weights becomes where −1 is related to the weight from previous time steps, as in Equation (30). Zhu et al. (2016) give a more detailed description of the IEWPF, and Skauvold et al. (2019) describe the modified version used here.
Regarding the implementation of the EnSynch method in a particle filter, a powerful data assimilation software package called EMPIRE (Employing MPI for Researching Ensembles) was used (Browne and Wilson, 2015). In EMPIRE, coupling between the numerical model and the data assimilation methods is performed through MPI (Message Passing Interface). This way, the main programming effort is directed only to the data assimilation scheme which one wants to test, while minimal changes in the model are needed.
In the Appendix, a description on how to code the IEWPF using the ensemble-based synchronization scheme as a proposal density is given. Note that we avoid calculating the full weights by only calculating the logarithm of these weights. This is a great advantage of this scheme.

Computational costs
The computational costs of the scheme can be estimated as follows. First one needs to integrate the whole ensemble d time steps forward. However, this is a one-off cost as we update this long ensemble forecast only by integrating it another time steps forward at every observation time.
Hence we need to integrate the ensemble twice between observation times: once to calculate the forcing term for the synchronization, and a second time for the actual synchronization. This is equivalent to twice the cost of other ensemble schemes such as an Ensemble Kalman Filter.
The actual calculation of the synchronization term involves localization and an SVD in the localization domain. This is similar to the kind and number of operations for a local Ensemble Kalman Filter. On top of this the new scheme needs the IEWPF at observation times. Since that scheme does not need localization, the main costs are generating random variables with covariance P, because the inversions needed can be done offline, its costs are typically negligible compared to those of, for example, a local Ensemble Kalman Filter update.
To sumarize, the new method is expected to be more expensive than ensemble methods like Ensemble Kalman Filters when the model integrations are the dominant computational costs, up to a factor 2. If the update is the most expensive part the costs should be similar.

EXPERIMENTS AND RESULTS
The metric used in all experiments is the root mean square error (RMSE): In Equation (35), is considered, as we performed twin experiments, where the truth is artificially generated. In a real experiment, one would count on the observations to know about the truth.
is the best estimate in the pure synchronization experiments, and the ensemble mean for the particle filter. For the particle filter experiments we also use the ensemble spread, defined as the square-root of the ensemble variance averaged over all grid points: in which , is the value at grid point of ensemble member . For both metrics we also consider the time average.

Results for the EnSynch
In this section we show a few results using pure synchronization without a particle filter to provide an idea of what can be achieved this way. Since the emphasis of the paper is on the coupling with the particle filter, only a few results will be presented. Twin experiments were performed using a -dimensional Lorenz96 model, where = 1, ..., and we are showing results for = 1, 000. The forcing parameter = 8.17 is set to guarantee a chaotic behaviour in the model. A fourth-order Runge-Kutta scheme was used, with Δ = 0.01 and observations are used every = 10 time steps. F I G U R E 2 Trajectories of five unobserved variables in a 1,000-dimension system. The blue curves are the truth and the green ones are the estimates. Predictions start at time step 1,000 (after red lines) Observation noise is sampled from a normal distribution, using a standard deviation obs = 0.1. Every fourth variable is measured in the Lorenz ring, that is, 25% of the system is observed. Aiming to increase the complexity of the problem, the frequency of observations is decreased, compared to the first experiments in Pinheiro et al. (2018): instead of having observations available at every time step, they now occur at every 10 time steps. Note that, given the chaoticity of the model, the number of positive Lyapunov exponents is at least 1∕3 of the model dimension (Pazó et al., 2016), therefore observing just 1∕4 of the system is a challenging problem.
Ensemble members are perturbed with a normal distribution of ∼ (0, 0.01) at = 0. Different ensemble sizes were tested, but we observed that the best results compared to computational costs were obtained with ens = 30, and lowering this number destroyed the synchronization. Increasing the ensemble size did not lead to large improvements, related to the use of localization as discussed below. Regarding the computation of the SVD in Equation (5), the number of singular values considered is equal to ens . Different initial conditions and random number realizations were tested, showing similar qualitative and quantitative results.
Due to the small ensemble size, we need to reduce the influence of spurious correlations in the synchronization term. A domain localization method is applied, in which we assume that a position in physical space (e.g. a location on the Lorenz ring) is attributed to each observation. The goal is to reduce the influence of observations which are far away from the variables (Houtekamer and Mitchell, 1998). Pinheiro et al. (2018) give further details on the implementation of localization in the ensemble synchronization. In these experiments, we use a radius of influence = 10 as a threshold, so that any observations located further than this radius are ignored.
The coupling constant is a tuning parameter in the synchronization framework, and in the following example we use = 21. The behaviour of the system for different values of is discussed later. The experiment is run for an estimation period of 1,000 time steps followed by a forecast period of another 1,000 time steps. The size of the delay dimension is d = 5, as in Pinheiro et al. (2018) and is an important factor on the system, as it needs to be large enough to capture additional information from specific observations, but small enough to avoid numerical instabilities on the pseudoinverse computation. Pazó et al. (2016) provide an order-of-magnitude estimate for this optimal delay time as opt = (1∕ ) exp[( − ) opt − 1], in which is the synchronization strength and is the maximal Lyapunov exponent. In our case, for the 1,000-dimensional system, ≈ 2 and ≈ 2, leading to opt ≈ 0.2. A d = 5 corresponds to opt = d Δ ≈ 0.5, indeed of similar magnitude to the Pazó et al. (2016) estimate. This is remarkable, as their equation is a rough approximation and our synchronization term is much more complicated than theirs.
After the assimilation period, predictions start and the system is set to run freely, without the use of the ensemble synchronization scheme. Figure 2 shows estimations (before the red lines) and predictions (after the red lines) for five unobserved variables in the system. During the estimation stage, the green lines (estimates) perfectly match to the blue lines (truth), showing how close the estimates are to the true values. Additionally, during the prediction period, trajectories keep following the truth for around 200 time steps. After that, trajectories start to diverge, due to the chaoticity of the model.
The so-called leading finite-time Lyapunov exponent, ( ), can be computed for a time as:

F I G U R E 3 Leading Lyapunov exponents over 200 time steps.
Positive exponents: red. Negative exponents: yellow and it represents the mean growth rate of the distances between neighbouring trajectories (e.g. Boffetta et al., 2001). Figure 3 shows these leading finite-time Lyapunov exponents of the assimilation solution along 200 time steps. The red crosses show positive values, providing information on moments in which perturbations can potentially grow exponentially, in finite time. After some time steps, negative leading Lyapunov exponents predominate (shown in yellow in Figure 3), apart from a short period of desynchronizing near the end of the experiment (not shown here), which means that the growth of instabilities are supressed for most of the time, as can be seen during the assimilation period in Figure 2.
The behaviour of the system depends on the synchronization strength parameter . Figure 4 shows an example run over 2,000 time steps, with long periods of synchronization interrupted by short burst of desynchronization, using the RMSE as metric. The figure shows how the RMSEs decay exponentially until the order of magnitude of the observation noise added to the system, but also sharp increases of the RMSE when the system desynchronizes, to synchronize more slowly afterwards. Considering that in our tests we have less frequent observations than Pinheiro et al. (2018), we observe that: (a) as the job of the coupling term to steer the particles towards the truth has become more difficult along the gaps between observations, it takes longer for the system to reach RMSEs comparable to obs , and, in this experiment, desynchronization does occur occasionally, (b) the proportional reuse of the coupling term in the gaps between the occurrence of the observations is indeed useful, and To quantify the role of the synchronization strength , we performed ten experiments with different random seeds for different values. Figure 5 shows the results for the occurrence of synchronization, synchronization with occasional bursts, and the absence of synchronization. For the present experimental settings, ≈ 20 yields the best results.
These results show that the ensemble-based synchronization scheme is a valuable tool to steer model states to the truth, working very well in a moderately high-dimensional system, using a desirable small number of ensemble members. It proves to be a potential scheme to be inserted as a proposal density in a particle filter.

Results for the IEWPF using EnSynch as a proposal
For these experiments, we keep the same configuration used for the ensemble synchronization experiment with the Lorenz96 model. The same proportion of a 1,000-variable F I G U R E 6 Magnitude of global RMSE (red) and ensemble spread (blue) as a function of in the particle filter system is measured (25%) and the same standard deviation for the observation error is used: = 0.1, with observations every ten time steps.
One first important point to note is that the main goal of the stand-alone ensemble-based synchronization scheme shown in the previous section is to keep its convergence towards the truth. The ensemble is used only to facilitate the computation of the Jacobian. When we insert this scheme into a particle filter and its ensemble of trajectories, it means that the synchronization coupling term now has to synchronize all trajectories at once, which does not necessarily mean that it will reach its optimal state as shown previously. Actually, we observe that although most ensemble members are moving towards observations between assimilation steps, some do not feel much positive influence of the synchronization term.
In the experiments shown below we used d = 1 with number of ensemble members ens = 20. We found that extending the synchronization window beyond the next observation time did not provide better results. This is related to the fact that the IEWPF will disturb the synchronization of the ensemble members. The particles were initialized by a random vector drawn from ∼ (0, 0.5). The model error covariance is chosen diagonal with variance 0.25. This makes the problem hard for the IEWPF part of the algorithm, as information from observations is spread out via Q. Hence the information flow in the synchronization part is crucial, demonstrating the intimate connection between the two parts of the algorithm.
In Figure 6 we show the RMSE and the spread of the ensemble, defined as the square-root of the ensemble variance averaged over the whole domain, versus . Each point is the result of running the data assimilation scheme for 4,000 time steps, and the average of 20 of these runs each with a different random seed. The uncertainty on the RMSE values is 5% and smaller than 1% on the ensemble spread. As seen in the figure, the RMSE and the filter estimate of the error from the ensemble spread are very similar, pointing to filter consistency.
The ensemble spread is a function of the filter parameter , which varied from 0.05 to 0.95, growing with . This parameter is similar to an inflation factor in Ensemble Kalman Filters. The connection with is understandable: the larger F I G U R E 7 RMSE (black) and ensemble spread (blue) as a function of time for the IEWPF experiments with = 1.5. Note the accurate estimate of the actual RMSE by the ensemble spread is, the stronger are the ensemble members drawn to the observations, and hence to each other. The parameter is used to avoid a too small ensemble spread.
Compared to pure synchronization in Figure 5, the best values for are smaller when synchronization is embedded in the particle filter. This is not surprising as the IEWPF step draws particles together and hence a less strong synchronization is needed. Figure 7 shows the time evolution of the RMSE and the ensemble spread over 4,000 time steps for one of the experiments for which = 1.5. The estimated error via the ensemble spread is a good estimate for the actual error. Note that the actual error fluctuates more as it is based on a single realization, while the ensemble spread is a statistical quantity. The figure shows a fast initial rise over about 100 time steps, followed by a stable behaviour after that. The RMSE saturates at about 0.70, which is larger then the observational error of 0.1 in this case. However, the majority of the state is not observed, and it is well known that the spatial decorrelation length-scale of the Lorenz96 model is 1 or 2 grid points. Furthermore, we employ a model error of size 0.5 which is diagonal. Hence a RMSE value larger then 0.1 is not unexpected.
We compared this behaviour to a state-of-the-art existing methodology, a Local Ensemble Transform Kalman Filter. We tuned the LETKF with a fixed inflation factor and a fixed localization radius. Figure 8 shows the time evolution of the RMSE and the ensemble spread for one of the experiments for which a localization radius of 4 and an inflation factor of 1.05. These correspond to the best values after tuning the LETKF. The inflation factor was not chosen to provide the lowest RMSE, as in many papers, but to provide a consistent error estimate compared to the true RMSE.
When we compare Figure 8 with the Figure 7, we see immediately that the new particle filter with a synchronization proposal outperforms the optimally tuned LETKF, but only F I G U R E 8 RMSE (black) and ensemble spread (blue) as a function of time for an LETKS experiment with inflation factor 1.05. Note the accurate estimate of the actual RMSE by the ensemble spread. Note that the vertical scale is different from the previous figure slightly so. The time-averaged RMSE and ensemble spread for the LETKF are 0.80 and 0.81, respectively, compared to 0.71 and 0.70 for the new particle filter. This is not surprising as the nonlinear filter is able to extract more information from the observations then the linearized LETKF. This also confirms the discussion in the previous paragraph about the RMSE error versus the observational error.
It is important to realise that the LETKF needs localization, while the IEWPF part of the particle filter does not need localization. Localization is only present in the synchronization proposal, and we do have complete freedom in the proposal density. Hence the particle filter solves Bayes Theorem to a high degree of accuracy, apart from the finite ensemble size.
We also investigated the performance of the new filter as a function of the observation error magnitude. Figure 9 shows the RMSE of an optimally tuned IEWPF with synchronization and an optimally tuned LETKF as function of the standard deviation of the observation error. Note that optimally tuned means that the parameter in the new filter and the inflation factor in the LETKF are chosen such that the ensemble spread matches the RMSE. As expected, the RMSE grows with the observational uncertainty. The saturation at low observation errors is due to the model error size. Reducing the model error by a factor 2 also reduces the RMSE by a factor 2 (not shown here).
The standard deviation in each estimate in Figure 9 is 5%, and hence the new particle filter is systematically better than the LETKF, but only slightly so.
We also looked at the sensitivity of the new particle filter to the number of ensemble members. Figure 10 shows the RMSE and corresponding ensemble spread as function of the ensemble size, varying as 20, 50, 75 and 100. The observational error standard deviation was chosen to be 0.5 F I G U R E 9 RMSE of the new particle filter (red) and the LETKF (blue) as a function of observation error standard deviation, with values 0.01, 0.1, 0.2, 0.5, 1.0, and 2.0. The saturation at low observation errors is due to the model error size. The uncertainty in each estimate is 5%, showing that the new particle filter slightly and consistently outperforms the LETKF F I G U R E 10 RMSE of the new particle filter as a function of ensemble size. The uncertainty in each estimate is 5% to avoid the saturation effect mentioned earlier. (Another way would be to reduce the model error, but we did choose the more demanding experiments here.) As expected, the RMSE decreases when the ensemble size increases, close to a square-root 1∕ √ ens dependence, as expected from a Monte-Carlo method.
These experiments (and many others not shown) allow us to provide some guidance on how to choose the parameters in the new particle filter. There are three tuning parameters, the localization radius for the ensemble synchronization, the synchronization strength parameter, and the factor in the IEWPF. The localization radius choice is quite similar as for Ensemble Kalman Filters and Smoothers (the latter for short time windows): a first guidance comes from physical considerations on characteristic length-scales in the system, and then one has to make sure that spurious correlations have to be removed. The synchronization strength is trail and error, but for all experiments shown here a value between 1.0 and 1.5 seems to work best. However, this is expected to be highly system-dependent. The factor has to be chosen such that the ensemble spread is of similar size as the RMSE, similar to the inflation factor in Ensemble Kalman Filters. The difference is that theory tells us that 0 < < 1 (Skauvold et al., 2019). Typical values used here were 0.3-0.5, and values up to 0.9 for an observation noise of 2.0. Again, the best value will be highly problem-dependent.

CONCLUSIONS
In this paper we show results obtained in a nonlinear system, using a non-degenerate particle filter, a modified version of the Implicit Equal-Weights Particle Filter (IEWPF), with a proposal density between observations composed of the ensemble-based synchronization scheme (Pinheiro et al., 2018). The main idea is to apply the coupling term derived from the ensemble synchronization to each particle in the filter, leading the ensemble to follow the true state of the system, as the local instabilities in the dynamics are controlled by the time-embedding framework. We first investigated the behaviour of the synchronization separately to better understand the behaviour of the system in a 1,000-dimensional Lorenz1996 model. It was shown that the synchronization converges, at a very slow rate, if the synchronization strength parameter is chosen correctly. The synchronization depends on the length of the time-embedding used, and the optimal embedding was found to be 50 model time steps.
When we used the synchronization as proposal density in the IEWPF, we found that best results were obtained with a reduction of the time-embedding interval to the time period between observations. This is related to the fact that the IEWPF will move the ensemble members in state space at observation times, and the delicate synchronized configuration is broken.
The results of the resulting particle filter were compared to a well-tuned state-of-the art LETKF. It was shown that the new filter consistency outperforms the LETKF, but only slightly so. We tuned the LETKF not on lowest RMSE only, but also on consistency of RMSE and ensemble spread. It should be mentioned that we used the LETKF with a fixed inflation factor, while an adaptive inflation factor might perform better. However, this would be similar to making the synchronization strength factor adaptive, which we did not consider here. It is argued that the new particle filter performs better because it is a nonlinear data assimilation method, and expected to have a stronger capacity of extracting information from observations in a nonlinear setting.
The cost of the new particle filter can easily be compared to that of an Ensemble Kalman Filter. It is argued that the main difference is that in the particle filter the ensemble has to be run twice, once to obtain the synchronization term that is then used in the second integration. If model integrations are the dominant costs of the data-assimilation exercise, the new method is up to twice the cost of an Ensemble Kalman Filter. For the experiments performed here the execution times were the same order of magnitude.
We understand that this initial implementation, although successful, still does not use the full potential of the ensemble synchronization scheme in the proposal, as further work on understanding the efficiency of the synchronization coupling term in different trajectories of the particles is still needed. Additionally, we interrupt the synchronization process at every 10 time steps to equalize the weights of the particles through the IEWPF part of the filter. During the experiments which investigated a larger number of observations available in the system (not shown here), we observed interesting relationships between the time delay and the coupling constant . As we know that the time delays play an important role in increasing the observability of the system, it is interesting to note that this can also be controlled by the coupling factor , consistent with the results of Pazó et al. (2016). This way, we expect that many improvements can be made to the system, as these are the first results combining synchronization and a particle filter to tackle this complex, nonlinear problem.
An interesting extension would be the implementation of a backward-forward time-embedding in the ensemble synchronization scheme, such as in Auroux and Blum (2008), as the disruptions from the IEWPF scheme would be less severe.
Finally, it is worth mentioning that the promise of particle filters is to provide a set of samples that represent the posterior pdf. Because the number of samples will be limited in high-dimensional geoscience applications, this representation will need to concentrate on the most important features of this posterior pdf. What these features are will depend on the application. Here we have concentrated on the first two moments of the posterior, and their behaviour in the prediction stage. While other methods like an Ensemble Kalman Filter also concentrate on these two moments, they will be biased in nonlinear applications due to the Gaussian assumptions. In practice, these biases are well mitigated by largely ℎ adaptations, such as localization and inflation. It is shown here that the particle filter does not need these ℎ adjustments. However, it is important to investigate the behaviour of moments, or the ability to detect and track multiple modes, using particle filters. This will be investigated in future work, where we note that it is difficult to find a benchmark solution for a high-dimensional system.