Introduction to the special issue on “25 years of ensemble forecasting”

Twenty‐five years ago the first operational ensemble forecasts were issued by the European Centre for Medium‐Range Weather Forecasts and the National Centers for Environmental Prediction. These centres were followed in 1996 by the Meteorological Service of Canada, and in subsequent years by many others. Operational ensemble‐based, probabilistic forecasts signalled a paradigm shift in weather prediction: for the first time, forecasters and users could have reliable and accurate estimates of the range of possible future scenarios, and not just a single realization of the future. Today, ensembles are used not only to provide reliable and accurate forecasts for the short and medium range and for the monthly and seasonal time‐scales, but also to provide estimates of the initial state of the atmosphere and to generate future climate projections. This article provides an overview of how we developed the early ensembles, illustrates the key characteristics of the seven operational, global, medium‐range ensembles, and discusses ongoing trends to further improve ensemble performance.

how "predictable" the future situations are. These estimates can be expressed in different ways: as a range of possible scenarios, as probabilities or cumulative distribution functions, as clusters highlighting a few possible alternatives, or as box-and-whisker plots. Short- and medium-range forecasts, monthly and seasonal forecasts, and even decadal forecasts and climate projections are today based on ensembles. Ensembles are also used to estimate the current state of the atmosphere, to provide users with an estimate of the initial-state uncertainty (the analysis error), and to provide a range of conditions that can be used to initialize ensemble forecasts.
This Editorial introduces the Special Issue on "25 years of ensemble forecasting", which includes contributions that discuss different aspects of ensemble prediction: the design of ensembles, the initialization of ensemble forecasts and model error schemes, high-resolution and convection-permitting ensembles, error growth and predictability, verification metrics, and applications of ensemble forecasts.
In section 2 of this article, we discuss how ensembles were gradually introduced into operational weather prediction suites. In section 3, we discuss the configuration of the seven global, medium-range ensembles operational at the time of writing (June 2018). In section 4, we briefly discuss how ensembles are now also used to estimate analysis uncertainties, and for the sub-seasonal and seasonal forecast ranges. Finally, some key areas of ongoing research aiming to further improve ensemble forecasts are discussed in section 5.

THE MOVE TOWARDS ENSEMBLES SIGNALLED A PARADIGM SHIFT IN WEATHER PREDICTION
Since forecasters started using numerical models to predict the weather, they realized that, even for the synoptic scales (and not just for local weather), there were cases when forecast errors remained small up to long forecast ranges, while in other cases even a 1-day forecast would be wrong. This operational experience was supported by scientific work showing that, owing to the chaotic nature of the atmosphere, even very small initial errors could grow very rapidly and affect forecast quality in a very short time (Lorenz, 1969a; 1969b; 1982).
In the 1970s and 1980s, scientists and operational forecasters started investigating whether it would be possible to know in advance, at the time a forecast is issued, whether the future situation would be easy (or, say, easier than average) to predict. In other words, they were looking for an objective method that could provide a level of forecast confidence. This confidence could be expressed in probabilistic terms, for example as the probability that a specific event (e.g. more than 50 mm of rainfall over 6 h) would occur. Alternatively, it could be expressed as a range of weather scenarios, each with an assigned probability of occurrence. In this way, forecasters could provide their users with the probability of occurrence of each scenario, or of any "tailored" weather condition.
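As a minimal sketch of the first option, and assuming that all members are equally likely, the probability of an event such as "more than 50 mm of rainfall over 6 h" can be estimated as the fraction of ensemble members in which the event occurs. The function name `event_probability` and the rainfall values are hypothetical and serve only as an illustration.

```python
import numpy as np

def event_probability(members, threshold):
    """Fraction of equally likely ensemble members in which value > threshold.

    members: array of shape (n_members, ...), e.g. 6-h rainfall (mm) at each grid point.
    Returns the event probability for every point (trailing shape of `members`).
    """
    members = np.asarray(members, dtype=float)
    return (members > threshold).mean(axis=0)

# Hypothetical 5-member ensemble of 6-h rainfall (mm) at three grid points.
rain = np.array([[10.0, 60.0, 0.0],
                 [55.0, 70.0, 0.0],
                 [20.0, 52.0, 1.0],
                 [48.0, 40.0, 0.0],
                 [65.0, 80.0, 2.0]])
prob = event_probability(rain, 50.0)  # probabilities 0.4, 0.8 and 0.0
```

Counting members is the simplest possible estimator; it ignores any calibration and cannot express probabilities finer than 1/N for an N-member ensemble.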
An example of the thoughts of the 1980s comes from Palmer and Tibaldi (1988), who wrote that "It is apparent, therefore, that a scheme to predict forecast skill will have substantial benefit in the medium range, and is an essential requirement for dynamical extended range forecasting." To achieve this, different approaches were tested, mostly based on ensembles, i.e. on mixing and combining many forecasts either started from different conditions, or generated using different models, or using a combination of the two (for some of the earlier works, see, e.g., Hollingsworth, 1980; Hoffman and Kalnay, 1983; Molteni et al., 1996).

2.1 Why are probabilistic forecasts generated using "ensembles" and not by other means?
Ensembles have proved to be the only practical way to predict the time evolution of the probability density function (PDF) of forecast states. Indeed, Ehrendorfer (1994a; 1994b) argued that even integrating a Liouville equation for the forecast PDF might require the use of ensemble methods for systems with more than a very few degrees of freedom. This means that ensembles are the only feasible approach in numerical weather prediction (NWP), given that the phase space spanned by the NWP models of existing operational systems has of order 10⁶ to 10¹⁰ degrees of freedom.
The dimensionality of the problem is not the only reason why approaches other than ensemble-based are not used to estimate the time evolution of the PDF of forecast states. Talking about weather prediction, Ehrendorfer (1994b) states that " … the applicability of the Liouville equation in context more realistic than considered in this work may be subject to considerable problems," and that " … an approach based on the Liouville equation is generally considered to be impracticable in the context of forecasting forecast skill."
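The Monte Carlo idea behind ensembles can be sketched with the three-variable Lorenz-63 model standing in for an NWP model: instead of integrating a Liouville equation for the PDF, one samples the initial PDF with a finite set of perturbed states, integrates each of them with the full nonlinear model, and reads the evolving PDF from the ensemble statistics. The Gaussian initial perturbations, their amplitude, the number of members and the initial state are illustrative assumptions, not the configuration of any operational system.

```python
import numpy as np

def lorenz63_tendency(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 equations, vectorised over the leading (member) axis."""
    x, y, z = state[..., 0], state[..., 1], state[..., 2]
    return np.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], axis=-1)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step for all members at once."""
    k1 = lorenz63_tendency(state)
    k2 = lorenz63_tendency(state + 0.5 * dt * k1)
    k3 = lorenz63_tendency(state + 0.5 * dt * k2)
    k4 = lorenz63_tendency(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def run_ensemble(analysis, n_members=50, init_std=1e-3, n_steps=1000, dt=0.01, seed=0):
    """Sample initial states around `analysis`, integrate every member, and
    return the final members and the ensemble spread (std over members) per step."""
    rng = np.random.default_rng(seed)
    members = analysis + init_std * rng.standard_normal((n_members, 3))
    spread = [members.std(axis=0)]
    for _ in range(n_steps):
        members = rk4_step(members, dt)
        spread.append(members.std(axis=0))
    return members, np.array(spread)

analysis = np.array([1.0, 1.0, 20.0])  # hypothetical initial state
final_members, spread = run_ensemble(analysis)
```

The spread starts at the size of the initial perturbations and grows by orders of magnitude, which is the error growth that motivates estimating the forecast PDF rather than a single trajectory.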

2.2 Is there any evidence that ensemble forecasts are more valuable than single ones?
There are at least two reasons why ensemble-based, probabilistic forecasts are more valuable than single forecasts. The first reason, as mentioned above, is that they make it possible not only to predict the most likely scenario but also to estimate the probability that an alternative event, or in general any event of interest, will occur. In other words, ensembles provide users with more complete and valuable information. One way to measure the difference in value between single and ensemble forecasts is the Potential Economic Value metric (PEV: Richardson, 2000). PEV is based on a simple cost-loss model, whereby a user can decide to pay an amount (cost) C to protect against a loss L linked to a specific weather event. The value of the forecasts can then be assessed by considering users with different C/L ratios, and by constructing a curve that shows the savings that users could make if they used the forecasts. Clearly, PEV is a function of the reliability and accuracy of the forecasts: a poor (either unreliable or inaccurate) ensemble will not be able to outperform a good single forecast. Richardson (2000) and Buizza (2001) showed that the ECMWF ensemble provides more valuable forecasts than the single ECMWF high-resolution forecast. Since then, routine PEV computations at ECMWF have continuously confirmed this result.
The second reason why ensemble-based, probabilistic forecasts are more valuable than single forecasts is that an ensemble provides forecasters with more consistent (i.e. less changeable) successive forecasts. This can be assessed by comparing consecutive forecasts issued 24 h apart and valid for the same verification time. Results indicate that consecutive ensemble forecasts jump less than the corresponding single forecasts, i.e. the most recent ensemble is more consistent with the one issued 24 h earlier (Zsoter et al., 2009).
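The cost-loss calculation behind PEV can be sketched as follows, assuming the standard formulation in which a user protects whenever a warning is issued, the event has climatological base rate ō, and α = C/L. With hit rate H and false-alarm rate F, the expense per unit loss is E_f = Fα(1 - ō) - Hō(1 - α) + ō; climatology costs min(α, ō) (the cheaper of always and never protecting) and a perfect forecast costs ōα, so the relative value is V = (min(α, ō) - E_f) / (min(α, ō) - ōα). For a probabilistic forecast, each probability threshold defines a different warning rule; the sketch takes, for each α, the best threshold, which is one common way of building the ensemble value curve. The function names are hypothetical.

```python
import numpy as np

def relative_value(hit_rate, false_alarm_rate, base_rate, alpha):
    """Relative economic value of a yes/no warning rule for cost-loss ratio alpha = C/L.

    Expenses are per unit loss L; requires 0 < alpha < 1 and 0 < base_rate < 1.
    """
    e_climate = min(alpha, base_rate)   # cheaper of 'always protect' and 'never protect'
    e_perfect = base_rate * alpha       # protect only when the event occurs
    e_forecast = (false_alarm_rate * alpha * (1.0 - base_rate)
                  - hit_rate * base_rate * (1.0 - alpha) + base_rate)
    return (e_climate - e_forecast) / (e_climate - e_perfect)

def ensemble_value(probabilities, observed, alpha):
    """Value of a probabilistic forecast for one alpha: the best over the warning
    rules 'protect if forecast probability >= p_t'."""
    probabilities = np.asarray(probabilities, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    base_rate = observed.mean()
    best = -np.inf
    for p_t in np.unique(probabilities):
        warn = probabilities >= p_t
        hit_rate = np.sum(warn & observed) / np.sum(observed)
        false_alarm_rate = np.sum(warn & ~observed) / np.sum(~observed)
        best = max(best, relative_value(hit_rate, false_alarm_rate, base_rate, alpha))
    return best

# Hypothetical forecast probabilities and observed outcomes (1 = event occurred).
probs = [0.0, 0.1, 0.6, 0.9, 0.2, 0.8, 0.0, 0.4]
obs = [0, 1, 1, 1, 0, 0, 0, 0]
curve = {alpha: ensemble_value(probs, obs, alpha) for alpha in (0.1, 0.3, 0.5)}
```

Evaluating `ensemble_value` over a range of α values gives the value curve; a single deterministic forecast corresponds to one fixed (H, F) pair, which is why it cannot adapt to users with different C/L ratios.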
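One simple way to quantify this consistency, assumed here for illustration and not necessarily the measure used by Zsoter et al. (2009), is the mean absolute difference between two forecasts issued 24 h apart and valid at the same time, computed for the ensemble mean and for the single forecast. The function names, array shapes and values below are hypothetical.

```python
import numpy as np

def jumpiness(latest, previous):
    """Mean absolute change between two forecasts valid at the same time."""
    latest = np.asarray(latest, dtype=float)
    previous = np.asarray(previous, dtype=float)
    return float(np.mean(np.abs(latest - previous)))

def compare_consistency(ens_latest, ens_previous, single_latest, single_previous):
    """Return (ensemble-mean jump, single-forecast jump).

    ens_latest, ens_previous: shape (n_members, n_points), issued 24 h apart,
    valid at the same time; single_latest, single_previous: shape (n_points,).
    """
    ens_jump = jumpiness(np.mean(ens_latest, axis=0), np.mean(ens_previous, axis=0))
    single_jump = jumpiness(single_latest, single_previous)
    return ens_jump, single_jump

# Hypothetical 500 hPa geopotential heights (m) at three points.
ens_new = [[5520.0, 5600.0, 5710.0], [5560.0, 5580.0, 5690.0]]
ens_old = [[5530.0, 5590.0, 5700.0], [5540.0, 5600.0, 5700.0]]
ens_jump, single_jump = compare_consistency(
    ens_new, ens_old, [5500.0, 5620.0, 5730.0], [5550.0, 5580.0, 5690.0])
```

Averaged over many cases, a smaller ensemble-mean jump than single-forecast jump is what "more consistent successive forecasts" means in practice.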

WHAT CHARACTERIZES AN ENSEMBLE CONFIGURATION?
Clearly, each institution that issues ensemble-based forecasts has developed its ensemble using its model and its data-assimilation system. If we consider, for example, ECMWF, the model used by its ensembles is the same model used to generate its high-resolution forecast, and the data-assimilation system used to generate the ensemble initial conditions is the same as the one used to generate the high-resolution initial conditions. In the past 25 years, both the model and the data assimilation have been evolving, their resolutions have been increasing substantially, and these changes have clearly affected the evolution of the ensemble performance.
Hereafter, we are not going to review model and data-assimilation aspects; instead, we focus on some of the key characteristics of an ensemble configuration: (a) initial perturbation strategy; (b) model uncertainty simulation strategy; (c) resolution, forecast length and number of members.
By "initial perturbation strategy" we mean the way in which uncertainties linked to observation quality and coverage, and to data-assimilation assumptions, are simulated in the ensemble. By "model uncertainty simulation strategy" we mean the approach followed to simulate model errors, linked, e.g., to physical parametrizations and to the models' finite resolution. These two aspects, together with the ensemble resolution (horizontal and vertical), the forecast length and the number of members, define the key characteristics of an ensemble system and affect its accuracy and reliability.

Initial perturbation strategy
Initial-condition uncertainties arise, for example, because observations are affected by observation errors and do not cover the whole globe with the same quality and frequency. Furthermore, the process of estimating the initial state of the system, from which a forecast is computed, is based on statistical assumptions and approximations. There is no unique perturbation strategy to simulate initial uncertainties. In the first version of the ECMWF ensemble (Molteni et al., 1996), initial uncertainties were simulated using singular vectors (SVs: Buizza et al., 1993; Buizza and Palmer, 1995), i.e. the perturbations with the fastest growth over a finite time interval. Compared to random initial perturbations, SVs are characterized by a faster growth rate, which is very similar to the forecast error growth rate. SVs remained the only type of initial perturbation used in the ECMWF ensemble until 2008, when perturbations from the ensemble of data assimilations (EDA) started being used together with SVs (Isaksen et al., 2011). SVs are still an essential component of the ECMWF ensemble, and they continue to provide dynamically relevant information about initial uncertainties that could have a strong negative impact on forecast errors.
In contrast to the ECMWF ensemble, the first version of the NCEP ensemble used bred vectors (BVs) to simulate initial uncertainties. The BV cycle (Toth and Kalnay, 1997) aims to emulate the data-assimilation cycle, and is based on the notion that analyses generated by data assimilation will accumulate growing errors by virtue of perturbation dynamics. This is because neutral or decaying errors detected by an assimilation scheme in the early part of the assimilation window will be reduced, and what remains of them will decay by the end of the assimilation window because of the dynamics of such perturbations. In contrast, even if growing errors are reduced by the assimilation system, what remains of them will amplify by the end of the assimilation window.
The ECMWF and NCEP ensembles were followed, in 1995, by the Canadian ensemble, which was designed to simulate a wider range of error sources: initial uncertainties due to observation errors and data-assimilation assumptions, as well as model uncertainties (Houtekamer et al., 1996). Because of this, the Canadian initial perturbation strategy could take into account both uncertainties linked to the quality and coverage of observations and initial-condition errors linked to model uncertainties.
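For a linear(ized) propagator M that maps initial perturbations to their evolved state at the end of the optimization interval, the singular vectors are the right singular vectors of M and the singular values are their amplification factors. The sketch below assumes an explicit matrix and a Euclidean norm; operational SVs are instead computed iteratively with tangent-linear and adjoint models and an energy-based norm. The 2×2 propagator is a hypothetical non-normal example in which both eigenvalues equal 1, yet the leading SV grows by a factor of about 10, more than any of the random perturbations it is compared with.

```python
import numpy as np

def singular_vectors(propagator, k=1):
    """Leading singular vectors of a linear propagator M (Euclidean norm assumed).

    Returns (initial_svs, evolved_svs, growth): the columns of initial_svs are the
    initial perturbations with the largest amplification ||M v|| / ||v||, the columns
    of evolved_svs are their normalised structures at final time, and growth holds
    the corresponding amplification factors (singular values).
    """
    u, s, vt = np.linalg.svd(propagator)
    return vt[:k].T, u[:, :k], s[:k]

# Hypothetical non-normal propagator over the optimization interval.
M = np.array([[1.0, 10.0],
              [0.0, 1.0]])
v0, v_final, growth = singular_vectors(M)

# Amplification of random initial perturbations, for comparison.
rng = np.random.default_rng(1)
random_perts = rng.standard_normal((2, 1000))
random_growth = np.linalg.norm(M @ random_perts, axis=0) / np.linalg.norm(random_perts, axis=0)
```

No initial perturbation can grow more than the leading SV over the chosen interval and norm, which is why SVs grow faster than random perturbations.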
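A breeding cycle can be sketched with the Lorenz-63 model: a control and a perturbed run are integrated over one cycle, their difference is rescaled to a fixed amplitude and added to the next control state, and after several cycles the rescaled difference (the bred vector) tends to be dominated by the fastest-growing directions. In this sketch, a free-running control stands in for the sequence of analyses, and the cycle length, amplitude and number of cycles are illustrative assumptions.

```python
import numpy as np

def lorenz63_tendency(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return np.array([sigma * (s[1] - s[0]),
                     s[0] * (rho - s[2]) - s[1],
                     s[0] * s[1] - beta * s[2]])

def integrate(s, n_steps, dt=0.01):
    """Fourth-order Runge-Kutta integration of the Lorenz-63 model."""
    for _ in range(n_steps):
        k1 = lorenz63_tendency(s)
        k2 = lorenz63_tendency(s + 0.5 * dt * k1)
        k3 = lorenz63_tendency(s + 0.5 * dt * k2)
        k4 = lorenz63_tendency(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

def breed(control_start, n_cycles=50, cycle_steps=8, amplitude=0.1, seed=0):
    """Breeding cycle: evolve control and perturbed runs over one cycle, rescale
    their difference to a fixed amplitude, and add it to the new control."""
    rng = np.random.default_rng(seed)
    pert = rng.standard_normal(3)
    pert *= amplitude / np.linalg.norm(pert)
    control = control_start
    growth = []
    for _ in range(n_cycles):
        new_control = integrate(control, cycle_steps)
        diff = integrate(control + pert, cycle_steps) - new_control
        growth.append(np.linalg.norm(diff) / amplitude)    # amplification this cycle
        pert = diff * (amplitude / np.linalg.norm(diff))   # rescaled: the bred vector
        control = new_control
    return pert, np.array(growth)

bred_vector, cycle_growth = breed(np.array([1.0, 1.0, 20.0]))
```

The rescaling step plays the role of the analysis reducing errors, while the nonlinear integration lets growing structures amplify, mirroring the argument in the paragraph above.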
In rather general terms, the initial perturbation strategies used by all the other ensembles operational today (see Table 1, and the references reported in section 3.3 for more details regarding the global medium-range ensembles) have been inspired by these three approaches, although their detailed implementations differ and include upgrades and changes.
Table 1: Key characteristics of the seven global, medium-range ensembles operational at the time of writing (June 2018), listed in alphabetical order (column 1): initial uncertainty method (column 2), model uncertainty simulation (Y/N, column 3), truncation and approximate horizontal resolution (column 4), number of vertical levels and top of the atmosphere in hPa (column 5), forecast length in days (column 6), number of members for each run (column 7), and number of runs per day (column 8). The ECMWF ensemble is also run up to 6 days at 0600 and 1800 UTC (see Table 2).

Model uncertainty strategy
Model uncertainties arise because the models that we use to generate weather forecasts are imperfect, simulate only certain physical processes on a finite mesh, and do not resolve all the scales and phenomena that occur in the real world.
The Canadian ensemble implemented in 1995 was the first to also include a simulation of model uncertainties. Following the Canadian example, the simulation of model uncertainties was introduced in the ECMWF ensemble in 1999, using a stochastic approach to simulate the effect of model errors linked to the physical parametrization schemes (Buizza et al., 1999). This was the first time that a stochastic term was introduced in numerical weather prediction.
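A multiplicative perturbation of parametrized tendencies, in the spirit of the stochastic scheme of Buizza et al. (1999), can be sketched on a toy one-dimensional model: each member multiplies its physics tendency by (1 + r), with r drawn from a uniform distribution, shared by neighbouring grid points and held fixed for several time steps. The relaxation "physics", the ±0.5 range, the box size and the hold length are all assumptions made for illustration.

```python
import numpy as np

def physics_tendency(temp, t_eq=280.0, relax_rate=0.1):
    """Hypothetical 'parametrized' tendency: relaxation towards an equilibrium."""
    return -relax_rate * (temp - t_eq)

def run_member(temp0, n_steps, dt, rng, box_size=4, hold_steps=6, half_width=0.5):
    """Integrate one member with multiplicatively perturbed physics tendencies.

    The factor (1 + r), r ~ U(-half_width, half_width), is shared by `box_size`
    neighbouring points and held fixed for `hold_steps` time steps.
    """
    temp = temp0.copy()
    n_boxes = -(-temp.size // box_size)  # ceiling division
    for step in range(n_steps):
        if step % hold_steps == 0:
            r = rng.uniform(-half_width, half_width, size=n_boxes)
            factor = np.repeat(1.0 + r, box_size)[: temp.size]
        temp = temp + dt * factor * physics_tendency(temp)
    return temp

rng = np.random.default_rng(42)
temp0 = np.full(16, 290.0)  # identical initial state for all members
ensemble = np.array([run_member(temp0, n_steps=24, dt=1.0, rng=rng) for _ in range(10)])
spread = ensemble.std(axis=0)
```

Because all members start from the same state, the spread here comes entirely from the stochastic term, which isolates the contribution of the model uncertainty scheme.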
At present, four main approaches are followed in ensemble prediction to represent model uncertainties (see, e.g., Palmer et al. (2009) and Buizza (2014) for a review):
• A multi-model approach, where different models are used in each of the ensemble members; models can differ entirely or only in some components (e.g. in the convection scheme);
• A perturbed parameter approach, where all ensemble integrations are made with the same model but with different parameters defining the settings of the model components; one example is the Canadian ensemble (Houtekamer et al., 1996).
The design of model uncertainty schemes to be used in ensembles is a key area of active research in many institutions (see, e.g., Raynaud et al., 2012; Piccolo and Cullen, 2016; Leutbecher et al., 2017; Lock et al., 2019).

Resolution, forecast length and number of members
Resolution (both horizontal and vertical), forecast length and the number of members are three further key characteristics of an ensemble configuration. They are also key cost drivers. Theoretical work done in the 1970s and 1980s suggested that at least about 10 members are needed to obtain a good ensemble-mean forecast (Leith, 1974), i.e. enough members to filter out the unpredictable scales. Subsequent work indicated that further increasing the ensemble size improved performance (Buizza and Palmer, 1998). Today, most operational ensemble forecast systems have between 20 and 50 members (Leutbecher, 2018), while ensembles used to estimate analysis uncertainties can have more than 250 members (Houtekamer et al., 2018).
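The diminishing returns of larger ensembles for the ensemble mean can be illustrated in an idealized "perfect ensemble", where the truth and the N members are independent draws from the same distribution with variance σ². The expected squared error of the ensemble mean is then σ²(1 + 1/N): going from 1 to 10 members removes most of the error that sampling can remove, and further members bring progressively smaller gains to the mean. The Monte Carlo sketch below checks this formula; the Gaussian setting is an assumption, not a property of real forecasts.

```python
import numpy as np

def ensemble_mean_mse(n_members, n_cases=50_000, sigma=1.0, seed=0):
    """Monte Carlo MSE of the N-member ensemble mean in a 'perfect ensemble':
    truth and members are independent draws from N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(0.0, sigma, size=n_cases)
    members = rng.normal(0.0, sigma, size=(n_cases, n_members))
    return float(np.mean((members.mean(axis=1) - truth) ** 2))

for n in (1, 5, 10, 20, 50):
    # Sampled MSE next to the theoretical value sigma^2 * (1 + 1/N) with sigma = 1.
    print(n, round(ensemble_mean_mse(n), 3), round(1.0 + 1.0 / n, 3))
```

The same argument does not apply to the tails of the distribution: estimating the probability of rare events needs many more members than estimating the mean.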
Considering that we need to generate forecasts in a reasonable amount of time (say about 1 h) and that we have finite computing resources, compromises must be made when an ensemble configuration is defined. Ideally, we would like to have high resolution in both physical and probability space, i.e. a very high spatial resolution and as many members as possible. Furthermore, we would also like to extend the forecast length for as long as skilful forecasts can be issued for at least some spatio-temporal scales, i.e. up to the time at which forecasts stop outperforming a climatological forecast (for a discussion of the forecast skill horizon, the reader is referred to Buizza and Leutbecher (2015)). At ECMWF, for example, to serve as many users' demands as possible (in terms of forecast quality and resolution), we decided to compromise on the design of the ensembles, and to use three different resolutions to generate forecasts valid for the medium-range, monthly and seasonal time-scales. By using a variable-resolution approach (Buizza et al., 2007; Vitart et al., 2008), and decreasing the resolution past the medium range, we have been able to afford operational ensemble forecasts with a longer forecast length, with minimal impact on ensemble performance. By reducing the resolution further, we have also been able to generate seasonal forecasts operationally (Molteni et al., 2011). Results so far suggest that this strategy, which "removes" scales from the model integration as they lose predictability (i.e. as forecast skill on these scales is lost) and as their impact on slightly larger scales decreases, is a very effective way to use computing resources to generate operational forecasts.
Table 1 shows the key characteristics of the seven ensemble prediction systems operational at the time of writing (June 2018) that provide global forecasts valid for the medium and extended ranges (say, between 7 and 15 days).
They are generated at the following institutions:
Together, these seven institutions produce more than 500 medium-range forecasts every day, with horizontal resolutions ranging from about 16 to about 120 km. All simulate initial uncertainties, but only five also simulate model uncertainties (Buizza (2014) provides an overview of their key characteristics and performance). They all use different techniques to generate the initial perturbations:
• The CMA ensemble uses bred vectors (BVs), perturbations designed to emulate the analysis cycle (Toth and Kalnay, 1993; 1997; Wei et al., 2006);
The fact that they also use different approaches to simulate model uncertainties indicates that there is no unique way to generate reliable and accurate probabilistic forecasts. This is not to say that ensemble performance is independent of the ensemble's design. This was highlighted, for example, by Buizza et al. (2005), who compared the performance of the three global ensembles operational in 2004 at ECMWF, MSC and NCEP. They concluded that performance depends not only on the quality of the model and the data-assimilation system, but also on the methodology followed to simulate the sources of forecast error. Park et al. (2008) also documented large differences between the performance of the individual ensembles: for the day-5 prediction of synoptic scales over the Northern Hemisphere (identified by the 500 hPa geopotential height), the difference between the best and the worst ensemble skill was about 3 days.
Forecasts from the ensembles listed above are exchanged routinely, and can be accessed in delayed mode (48 h after they have been generated) thanks to the WMO TIGGE project. TIGGE is the World Meteorological Organization THORPEX Interactive Grand Global Ensemble, which started in 2004; TIGGE data can be accessed from the ECMWF web site: https://www.ecmwf.int/en/research/projects/tigge. TIGGE data can be used, for example, to understand the impact of using different ensemble methodologies on the ensembles' performance (see, e.g., Park et al., 2008; Hagedorn et al., 2012).

ENSEMBLES ARE USED ALSO FOR THE SUB-SEASONAL AND SEASONAL TIME-SCALES, AND TO ESTIMATE ANALYSIS UNCERTAINTIES
In the 1990s and 2000s, operational very high-resolution forecast suites also started to include ensemble components, run with limited-area models nested in global ensembles, and ensembles started to be used to provide extended-range (monthly and seasonal) forecasts. Following the Canadian example, ensembles were also developed to estimate analysis uncertainties at ECMWF (Isaksen et al., 2011) and Météo-France (Berre et al., 2007). In this article, we do not discuss limited-area ensembles, because the techniques they use to simulate initial and model uncertainties are similar to those used in the medium-range ensembles reviewed in section 3. Hereafter, we discuss global ensembles for the extended range and for estimating analysis uncertainties.

Extended-range ensembles, and reforecast ensembles
Since the beginning of the 2000s, ensembles have been used to generate operational sub-seasonal (monthly: Vitart, 2004) and seasonal forecasts. Extended-range ensembles generally have a coarser resolution than the medium-range ensembles, to limit production costs (see, e.g., Table 2 for ECMWF). Compared to the medium-range ensembles, most of them also include a dynamical ocean model, to better simulate the propagation of coupled ocean-atmosphere phenomena, such as the organized convection associated with the Madden-Julian Oscillation.
Using ensembles is even more essential for the extended time range, to be able to extract reliable and accurate signals. It is worth mentioning that there is increasing evidence to suggest that for this time range the number of ensemble members should be higher than for the medium-range ensembles, say about 100-200, compared to about 25-50 for the medium-range (T. Stockdale, personal communication, 2018).
A key ingredient of extended-range ensembles is that they rely on reforecast ensembles (Hamill et al., 2006; Hagedorn et al., 2008), i.e. ensemble forecasts generated with the operational ensemble configuration but for a large number of past cases. Extracting predictable signals for the extended range has benefited from the use of ensembles of reforecasts, since they allow a better estimate of the ensemble error characteristics (e.g. model biases, or whether the ensembles tend to under- or overestimate certain phenomena: Hagedorn et al., 2012). At ECMWF, for example, the reforecast suite of the medium-range/monthly ensemble covers the past 20 years (an 11-member ensemble is run twice a week, for each week of the past 20 years), and the reforecast suite of the seasonal ensemble covers the past 30 years (a 15-member ensemble is run once a month, for the past 30 years). Reforecast ensembles are also essential to obtain a statistically significant estimate of the skill of monthly and seasonal forecasts, to understand the predictability of different phenomena, and to learn how to extract the predictable signals from the forecasts. The need for an ensemble of reforecasts makes the cost (in terms of computer power, data handling and data storage) of the monthly and seasonal ensembles very high, and this is one of the key reasons why they have a coarser resolution (Table 2).
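The simplest use of a reforecast ensemble, a mean-bias correction, can be sketched as follows: the mean error of the reforecast ensemble mean, computed over past years for the same start date and lead time, is subtracted from each real-time member. The array layout and function names are hypothetical, and operational calibration methods are generally more elaborate than a constant shift.

```python
import numpy as np

def reforecast_bias(reforecasts, verifying_obs):
    """Mean error of the reforecast ensemble mean.

    reforecasts: shape (n_years, n_members, ...), same start date and lead time each year
    verifying_obs: shape (n_years, ...), the verifying analyses or observations
    """
    ens_mean = np.asarray(reforecasts, dtype=float).mean(axis=1)
    return (ens_mean - np.asarray(verifying_obs, dtype=float)).mean(axis=0)

def bias_corrected(realtime_members, bias):
    """Subtract the reforecast-based bias from every real-time member."""
    return np.asarray(realtime_members, dtype=float) - bias

# Hypothetical 3-year, 2-member reforecast set for a single variable.
bias = reforecast_bias([[11.0, 13.0], [21.0, 23.0], [31.0, 33.0]], [10.0, 20.0, 30.0])
corrected = bias_corrected([5.0, 7.0], bias)
```

With only a few reforecast years the bias estimate is noisy, which is one reason why long reforecast periods are needed.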

Ensembles of analyses
Since its inception in 1995 (Houtekamer et al., 1996), the Canadian ensemble has included an ensemble of analyses, generated using an ensemble Kalman filter (EnKF). The initial conditions of the ensemble forecasts were defined by one of the members of the EnKF. The EnKF has been providing MSC Canada with information about uncertainties in the analysis. ECMWF (Isaksen et al., 2011) and Météo-France (Berre et al., 2007) started producing an ensemble of data assimilations in 2008.
The ECMWF EDA is based on an ensemble of N separate data assimilation procedures (where N is 25 at the time of writing and is planned to increase to 50 in 2019), each using perturbed observations and a model uncertainty scheme. Observations are perturbed to simulate observation errors, linked to the instruments' characteristics and to their representativeness. As for ensemble forecasts, model uncertainties are simulated in the ensembles of data assimilations to take into account the fact that the models used to define the analysis (i.e. the forecast initial conditions) are not perfect. Table 3 lists the key characteristics of the EDA used at ECMWF. As mentioned above, since 2008 the ECMWF EDA has been used in combination with the SVs to define the initial conditions of the medium-range/monthly ensemble. The addition of EDA-based perturbations has had a major impact on ensemble reliability and accuracy in the short forecast range over the extratropics, and over the whole forecast range over the Tropics.
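The perturbed-observation idea can be sketched with a stochastic ensemble Kalman filter analysis step, the kind of scheme used in the Canadian EnKF: each member assimilates its own copy of the observations with random noise drawn from the observation-error distribution, so that the spread of the analysis ensemble reflects the analysis uncertainty. The ECMWF EDA applies the perturbations in separate variational assimilations rather than through this explicit gain; the linear observation operator, diagonal observation-error covariance and toy dimensions here are assumptions.

```python
import numpy as np

def enkf_perturbed_obs(background, obs, obs_operator, obs_err_std, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    background: (n_members, n_state) background ensemble
    obs: (n_obs,) observations; obs_operator: (n_obs, n_state) linear H
    obs_err_std: (n_obs,) observation-error std devs (diagonal R assumed)
    """
    n_members = background.shape[0]
    anomalies = background - background.mean(axis=0)
    p_b = anomalies.T @ anomalies / (n_members - 1)       # sample background covariance
    h = obs_operator
    s = h @ p_b @ h.T + np.diag(obs_err_std ** 2)         # innovation covariance
    gain = np.linalg.solve(s, h @ p_b).T                  # K = P_b H^T S^-1
    # Each member assimilates its own perturbed copy of the observations.
    perturbed_obs = obs + rng.normal(0.0, obs_err_std, size=(n_members, obs.size))
    innovations = perturbed_obs - background @ h.T
    return background + innovations @ gain.T

rng = np.random.default_rng(0)
background = np.array([1.0, 2.0, 3.0]) + rng.standard_normal((50, 3))
h = np.array([[1.0, 0.0, 0.0]])  # hypothetical: observe the first variable only
analysis = enkf_perturbed_obs(background, np.array([1.5]), h, np.array([0.1]), rng)
```

Without the observation perturbations, every member would be pulled towards the same values and the analysis spread would underestimate the true analysis error.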

A LOOK INTO THE FUTURE
Looking into the future, three main trends can be detected in the way ensembles are planned to be upgraded:
• A move towards an Earth-system approach to modelling and assimilation;
• A move towards a seamless approach in the design of the analysis, medium-range, sub-seasonal and seasonal ensembles;
• A move towards higher resolution.
The first trend, towards an Earth-system approach, is linked to results obtained in the past two decades showing that, by adding relevant processes, we can improve the quality of the existing forecasts and further extend the forecast skill horizon at which dynamical forecasts lose their value. Buizza and Leutbecher (2015), for example, looked at the evolution of the skill of the ECMWF ensemble from 1994 to date, and concluded that "Forecast skill horizons beyond 2 weeks are now achievable thanks to major advances in numerical weather prediction. More specifically, they are made possible by the synergies of better and more complete models, which include more accurate simulation of relevant physical processes (e.g. the coupling to a dynamical ocean and ocean waves), improved data-assimilation methods that allowed a more accurate estimation of the initial conditions, and advances in ensemble techniques."
Table 2: Key characteristics in June 2018 of the ECMWF ensembles of forecasts for the limited-area community (row 1), the medium-range (row 2), the sub-seasonal range (row 3) and the seasonal range (row 4), in terms of resolution (truncation and resolution in km; column 2), number of vertical levels and top of the atmosphere (column 3), forecast length (column 4), number of members (column 5), and frequency (column 6). ENS for the monthly extension starts from the ENS medium-range forecasts. SEAS5 forecasts are extended to 13 months every quarter (on the 1st of February, May, August and November). ENS reforecasts include 11 members, run twice a week, for the most recent 20 years. SEAS5 reforecasts include 25 members, run once per month for 36 years.
The second trend, towards a seamless approach, stems partly from scientific and partly from technical reasons. From the scientific point of view, for example, there is evidence that processes that were thought to be relevant mainly for the extended range are also relevant for the short range.
An example comes from the introduction of a dynamical ocean in the ECMWF ensembles. At ECMWF, we first used a coupled ocean-land-atmosphere model for the seasonal and monthly time-scales, and introduced it in the medium-range ensemble only later, when we realized that it could contribute to improving its reliability and accuracy. From the technical point of view, an integrated approach, whereby the same model is used in analysis and prediction mode from day 0 to year 1, simplifies production, maintenance and the implementation of upgrades. Furthermore, it helps the diagnosis and evaluation of a model version, since tests carried out over different time-scales can help identify undesirable behaviours that could lead to forecast errors.
The third trend, towards higher resolution, comes from the need to better resolve the smaller scales and their interaction with slightly larger scales, and so on. All scales are relevant in weather prediction, and errors propagate from the smallest to the larger scales. For example, at ECMWF improvements in the treatment of convection in the Tropics led to improvements in the development and propagation of organized convection (Bechtold et al., 2014), which had a positive impact on the skill of monthly forecasts over Europe (Vitart, 2013).
If we consider the current ensembles, we should not forget that, even though they use resolutions of 18-120 km (see Table 1), they can realistically resolve only scales that are about 5-6 times larger than their grid spacing. This is because the scales closest to the model grid spacing are not simulated accurately (e.g. to avoid numerical instabilities, strong diffusion operators are used, which make the energy spectra at the finest scales far from the observed spectra). This means, for example, that frontal dynamics is still poorly resolved, which can have a strong impact on the prediction of synoptic-scale features in the medium range, and of low-frequency variability (e.g. the North Atlantic Oscillation, or European blocking) on the monthly time-scale.
Thus, at the time of writing (June 2018), the ECMWF global ensemble has an effective resolution of about 100 km up to forecast day 15 (although its grid spacing is 18 km), and the highest-resolution limited-area ensemble, run e.g. at Météo-France, has an effective resolution of about 15 km (although its grid spacing is about 3 km). If we want to be able to predict phenomena such as intense wind gusts or extreme precipitation linked to convective events, it is essential to further increase model resolution. At ECMWF, the 2016-2025 Strategy sets the aim of increasing the ensemble resolution to about 5 km by 2025, with limited-area models expected to run at a resolution of 1 km or finer by that time. These plans should not overlook the need to also increase resolution in probability space if we want to provide users with more reliable forecasts for very rare events, and this can only be achieved by increasing the ensemble size.
Another area where progress should be expected is the definition of the ensemble initial conditions. Ensembles of analyses and forecasts should be linked more closely, to improve the consistency in the simulation of initial uncertainties. In terms of modelling, physical processes that are not yet included in the models but are relevant for weather prediction should be included to make the forecasts increasingly realistic. At ECMWF, for example, as part of our strategy to move towards Earth-system models and assimilation systems, we are investigating the potential role of including interactive aerosols in our ensemble (A. Benedetti and F. Vitart, personal communication, 2018). Model error schemes should be redesigned to better simulate the model uncertainties linked to each process, and to the numerical schemes used in the integrations. Coupled models, which are now used in ensemble forecasts but not yet in ensembles of data assimilations, should be used in both, to provide better and more consistent initial conditions for the coupled ensembles. Similarly, coupled initial perturbations (or, even better, ensembles of coupled initial conditions) should be used to initialize the coupled ensembles.
The ECMWF 2016-2025 Strategy, adopted by the Council in December 2015 (ECMWF, 2015), indeed talks about building an Earth-system model that includes all relevant processes (" … develop an integrated global model of the Earth system to produce forecasts with increasing fidelity on time ranges up to one year ahead"). It also talks about adopting an ensemble approach for all time-scales ("Operational ensemble-based analyses and predictions that describe the range of possible scenarios and their likelihood of occurrence and that raise the international bar for quality and operational reliability").
This last point raises the question of whether ECMWF should continue to generate, as part of its operational forecasts, one high-resolution analysis and one high-resolution forecast. So far, experimental tests have indicated that re-centring the ECMWF ensemble on the ECMWF high-resolution analysis leads to better ensemble forecasts. Results have also indicated that adding the high-resolution forecast to the 51 ensemble members, to form a mixed-resolution 52-member ensemble, leads to better short-range forecasts. As part of the work that we are performing to implement the ECMWF 2016-2025 Strategy, we have started to investigate further the performance of a mixed-resolution ensemble, which could include N members at a lower resolution and M (with M > 1, but with M < N or even M ≪ N) members at higher resolution (M. Leutbecher and F. Vitart, personal communication, 2018). Advances in post-processing and calibration techniques (Vannitsem et al., 2018) should make it possible to merge ensembles with different resolutions.
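Re-centring can be sketched as a shift of every member by the same amount, so that the ensemble mean coincides with a chosen analysis (here the hypothetical argument `new_centre`) while each member's perturbation about the mean is unchanged. This minimal sketch assumes that the analysis and the members are on the same grid; with different resolutions an interpolation step would be needed first.

```python
import numpy as np

def recentre(members, new_centre):
    """Shift an ensemble so that its mean equals `new_centre`, keeping each
    member's deviation (perturbation) from the original ensemble mean."""
    members = np.asarray(members, dtype=float)
    return members - members.mean(axis=0) + np.asarray(new_centre, dtype=float)

# Hypothetical 2-member ensemble on a 2-point grid, re-centred on a zero analysis.
recentred = recentre([[1.0, 2.0], [3.0, 6.0]], [0.0, 0.0])
```

Only the first moment changes: the ensemble spread, and hence the represented uncertainty, is preserved exactly.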

CONCLUSIONS
In this article, we have briefly reviewed how ensembles have become an essential component of the operational suites of most meteorological centres. The main characteristics of the seven global, medium-range ensembles operational at the time of writing (June 2018) have been illustrated. We have also discussed how ensemble methods have become key tools not only for estimating forecast uncertainty at all forecast ranges, but also for estimating analysis uncertainty. Finally, we have highlighted some key trends that many operational centres are following to further improve ensemble performance.
In the articles included in this Special Issue, the reader can find more detailed discussions of some of the aspects mentioned above: historical perspectives (Palmer, 2018; Kalnay, 2019), ensemble size, initial condition and model error representation, error growth and scale-dependent predictability, diagnostics of ensemble-based probabilistic forecasts, and applications of ensemble forecasts. On the issue of ensemble size, it is not clear whether the current size of about 20 to 50 members is large enough to provide reliable probabilistic forecasts, especially of the tails of the distributions, which include the events that users with low cost/loss ratios are most interested in (Leutbecher, 2018). Results from Canada, for example, indicate that a large enough ensemble is very important if one wants to provide a reliable estimate of the analysis error. Indeed, the Canadian Ensemble Kalman Filter has 256 members, while the Canadian ensemble forecast system has only 21 (Houtekamer et al., 2018).
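Why the tails are hard to sample can be seen from two simple facts: an N-member ensemble can only express the probabilities 0, 1/N, 2/N, ..., 1, and in the simple static cost/loss model a user whose protective action costs C, and who would otherwise suffer a loss L, should act whenever the event probability exceeds C/L. The minimal Python sketch below makes this explicit; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def ensemble_probability(members, threshold):
    """Probability of exceeding `threshold`, estimated as the fraction k/N
    of the N members that exceed it."""
    hits = sum(1 for value in members if value > threshold)
    return hits / len(members)


def should_protect(probability, cost, loss):
    """Static cost/loss decision rule: take protective action (cost C)
    whenever the event probability exceeds C/L, where L is the loss
    incurred if the event occurs without protection."""
    return probability > cost / loss


def probability_levels(n_members):
    """All probabilities an N-member ensemble can express: 0, 1/N, ..., 1."""
    return [k / n_members for k in range(n_members + 1)]


if __name__ == "__main__":
    # Smallest non-zero probability for two illustrative ensemble sizes.
    for n in (20, 51):
        print(n, probability_levels(n)[1])
    # A user with C/L = 0.01 acts as soon as any member predicts the event.
    print(should_protect(1 / 20, cost=1.0, loss=100.0))
```

With 20 members the smallest non-zero probability is 0.05, so a user with C/L = 0.01 acts as soon as a single member predicts the event, and the ensemble cannot distinguish, say, a 1-in-100 risk from a 4-in-100 risk. With 51 members the probability resolution improves only to about 0.02.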
Together with ensemble size, resolution is a key cost driver in ensemble prediction. Although some users would want to trade resolution for ensemble size, care should be taken in doing so: resolution is key if one wants to predict phenomena characterized or influenced by small scales, such as extreme windstorms or convective events. Frogner et al. (2019) and Weyn and Durran (2018) discuss some of the issues and challenges linked with the design of very high-resolution and convective-permitting ensembles. Designing ensemble systems capable of providing skilful information at these scales is very challenging, partly because initializing the small scales is very difficult.
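The trade-off between resolution and ensemble size can be made concrete with a common rule of thumb, stated here as an assumption rather than as a property of any particular operational system: halving the horizontal grid spacing quadruples the number of grid points and, if the time step shrinks in proportion to the grid spacing, doubles the number of time steps, so the cost of one member grows roughly as the inverse cube of the grid spacing. The hypothetical helpers below apply that scaling and ignore vertical levels, physics and I/O.

```python
import math


def relative_cost(n_members, dx_km, ref_members, ref_dx_km):
    """Rough cost of an ensemble relative to a reference configuration.

    Assumes cost ~ N * (horizontal grid points) * (time steps), with
    grid points ~ 1/dx**2 and time steps ~ 1/dx, so cost ~ N / dx**3.
    Vertical levels, physics and I/O are ignored.
    """
    return (n_members / ref_members) * (ref_dx_km / dx_km) ** 3


def members_for_same_cost(ref_members, ref_dx_km, new_dx_km):
    """Largest ensemble size at grid spacing `new_dx_km` that costs no more
    than `ref_members` members at `ref_dx_km`, under the scaling above."""
    return math.floor(ref_members * (new_dx_km / ref_dx_km) ** 3)


if __name__ == "__main__":
    # Halving the grid spacing at fixed ensemble size.
    print(relative_cost(50, 10.0, 50, 20.0))
    # Ensemble size affordable at 10 km for the cost of 50 members at 20 km.
    print(members_for_same_cost(50, 20.0, 10.0))
```

Under this scaling, a hypothetical 50-member ensemble with 20 km grid spacing costs about as much as a 6-member ensemble at 10 km, which is why moving in either direction along the resolution/ensemble-size axis needs care.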
Investigating the predictability of different phenomena, such as the North Atlantic Oscillation (Weisheimer et al., 2018), and understanding forecast error growth and, more generally, the scale dependency of the forecast skill horizon (Hoskins, 2013; Buizza and Leutbecher, 2015) are also key to advancing the design of future ensembles. Moving towards more complete Earth-system models can help to extract predictable signals: the ocean is an example of an Earth-system component that has been added to operational global ensembles in recent years, and interactive aerosols and some atmospheric gases (e.g. ozone, CO₂ and other greenhouse gases) might be the next components that could help to further extend the forecast skill horizon. In this Special Issue, Zanna et al. (2018) discuss the state of the art in ocean modelling and recent advances in the design of ocean ensembles, and Xian et al. (2019) raise issues linked to the prediction of aerosols. Palmer (2018) also provides some very interesting views on how ensemble prediction will evolve in the next 25 years.
Two other key topics discussed in this Special Issue are ensemble verification and how best to communicate uncertainty to users. Siegert et al. (2018) illustrate how complex the verification of ensemble performance is. They discuss the sensitivity of the ignorance score, one of the scores used to assess the performance of ensemble systems, show that it depends on the ensemble size, and propose an ensemble adjustment. On the second topic, Fundel et al. (2019) discuss the difficulty of communicating probabilities to lay audiences and users' prevalent resistance to probability forecasts, and report the experience of the German weather service (DWD) in introducing probabilistic forecast products to three exemplary user groups.
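For readers unfamiliar with the ignorance score, the minimal Python sketch below computes it for a forecast issued as a normal distribution fitted to the ensemble (an assumption made for illustration) and shows how it can depend on ensemble size. It is the plain, unadjusted score, not the ensemble adjustment proposed by Siegert et al. (2018): IGN = -log2 f(y), where f is the forecast density and y the verifying observation, so lower values are better. The function names and the Monte Carlo set-up, in which the members and the observation are all drawn from the same standard normal distribution, are hypothetical.

```python
import math
import random


def _mean_and_sd(values):
    """Sample mean and sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)


def ignorance_gaussian(ensemble, obs):
    """Ignorance score in bits, -log2 f(obs), where f is a normal density
    with the ensemble mean and sample standard deviation (plain score,
    no ensemble-size adjustment). Lower is better."""
    mu, sigma = _mean_and_sd(ensemble)
    z = (obs - mu) / sigma
    # Work with the log-density directly to avoid underflow for large |z|.
    neg_log_density = 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z
    return neg_log_density / math.log(2.0)


def mean_ignorance(n_members, n_trials=5000, seed=1):
    """Average ignorance score when every member and the observation are
    independent draws from the same standard normal distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        ensemble = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        obs = rng.gauss(0.0, 1.0)
        total += ignorance_gaussian(ensemble, obs)
    return total / n_trials


if __name__ == "__main__":
    for n in (10, 100):
        print(n, round(mean_ignorance(n), 3))
```

Although both ensembles in this experiment are statistically consistent with the observation by construction, the smaller ensemble receives a worse (higher) average score, because its mean and standard deviation carry more sampling error. This is the kind of ensemble-size dependence that an ensemble-adjusted score aims to remove.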
Let me conclude by saying that, in the past 25 years, ensembles have proven to be extremely valuable and essential sources of information. In weather prediction, ensemble performance has been improving by about 2 days per decade in the medium range (Buizza and Leutbecher, 2015), and by up to 1 week per decade at the monthly time-scale for large-scale phenomena such as the Madden-Julian Oscillation (Vitart, 2013; Vitart et al., 2014). Their performance will continue to improve, provided that we can make progress in the areas discussed above (modelling, including model error simulation; data assimilation and ensemble initialization; ensemble design and membership), and their use will continue to increase, provided that we can help users to make decisions using probabilistic information.
This latter point is, in my view, another very hard challenge that we are facing. One possible way forward is to interact more with experts working in other fields where ensembles are used to estimate a range of possible outcomes, such as insurance and economics (see, e.g., Ravazzolo and Vahey, 2013; Sun and Jin, 2016). By working together with them, and with social scientists who have been studying decision-making under severe uncertainty (see, e.g., Busemeyer et al., 1993; Comes et al., 2011), we could identify the best ways to promote the use of ensemble-based, probabilistic forecasts in our field, and of probabilistic information more generally. More work along these lines should be encouraged.