Spatial ensemble post‐processing with standardized anomalies

To post-process ensemble predictions for a particular location, statistical methods are often used, especially in complex terrain such as the Alps. When extended to several stations, the post-processing has to be repeated at every station individually, thus losing information about spatial coherence and increasing the computational cost. Therefore, the ensemble post-processing is modified and applied simultaneously at multiple locations. We transform observations and predictions to standardized anomalies: seasonal and site-specific characteristics are eliminated by subtracting a climatological mean from both observations and numerical forecasts and dividing by the climatological standard deviation. This method allows us to forecast even at locations where no observations are available. The skill of these forecasts is comparable to that of forecasts post-processed individually at every station, and is even better on average.


Introduction
Numerical weather prediction (NWP) models provide spatial forecasts of different meteorological parameters. Due to limited computational power, the grid of NWP models is too coarse to resolve some small-scale processes sufficiently, especially in complex terrain such as the Alps. Therefore, parametrizations together with uncertainties in the initial state and the chaotic nature of the atmosphere lead to errors. Statistical post-processing, introduced by Glahn and Lowry (1972), is one way to correct these systematic errors.
To capture forecast uncertainties, many weather centres provide an ensemble of numerical forecasts (Lorenz, 1982; Leutbecher and Palmer, 2008) with slightly different initial conditions or parametrizations. Since ensemble forecasts do not usually capture all error sources, they should also be post-processed with statistical models, as in Gneiting et al. (2005) or Raftery and Gneiting (2005).
However, these statistical models are usually fitted separately for separate locations for which a set of historical observations and NWP forecasts is available. Several techniques exist to overcome this limitation and to produce statistical forecasts for an arbitrary point in a region. Statistical forecasts from the observation sites can be interpolated to arbitrary locations (Hacker and Rife, 2007; Glahn et al., 2009), observations can be interpolated on to the NWP grid to post-process every grid point (Schefzik et al., 2013), or statistical forecasts representative of a whole region (Scheuerer and Büermann, 2014; Scheuerer and König, 2014) can be produced. The latter forecasts cover several stations simultaneously by using anomalies. The subtraction of a climatological mean from observations and NWP forecasts removes site-specific characteristics, so that multiple stations can be forecast with a single model. Additionally, if a spatial climatology of the observations exists, even points in between stations can be predicted.
We modify this method slightly and standardize these anomalies by dividing the difference from a mean value by the standard deviation (Wilks, 2011). Thus, different variabilities of different stations are also considered. Furthermore, we fit full seasonal climatologies instead of monthly means, to remove season-specific characteristics as well. Therefore, all data can be used for training, so that the post-processing model does not have to be refitted every day.
As a result, this method has three important advantages: all available data can be used instead of only a subset, all stations are forecast simultaneously, and forecasts are available for locations without observations. The rest of the article is structured as follows. The data are described in the next section, followed by the method. Afterwards, the different post-processing models are described and compared. The article closes with the discussion and conclusion. All work has been performed with the statistical software R (R Core Team, 2016).

Data
Statistical models will be applied to the province of South Tyrol in northern Italy. South Tyrol is an alpine region with a large variation in altitude. Thirty-nine stations, distributed over altitudes from 208 to 1900 m above mean sea level (amsl), provide the observations (Figure 1). The numerical ensemble forecasts are produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).
The forecast variable is the temperature at 2 m above ground (T2m); the input variables for the statistical models are the T2m ensemble mean (m) and standard deviation (s) of the 51 members, interpolated bilinearly to the station locations. We use a lead time of 18 h for an ensemble forecast initialized at 0000 UTC for the period 1 February 2010 to 30 June 2015.
To calculate spatial climatologies, a digital elevation model with a spatial resolution of 90 m from the Shuttle Radar Topography Mission (SRTM: Reuter et al., 2007) is used.

Non-homogeneous Gaussian regression
To post-process ensemble forecasts, Gneiting et al. (2005) introduced non-homogeneous Gaussian regression (NGR). In NGR, a variable y is assumed to follow a normal distribution with mean μ and standard deviation σ:

y ~ N(μ, σ²). (1)

However, σ is no longer constant, as in the seminal work of Glahn and Lowry (1972), but depends on the ensemble spread:

μ = b_0 + b_1 m, (2)
log(σ) = c_0 + c_1 log(s), (3)

with b_0, b_1, c_0 and c_1 as regression coefficients. In this case, μ depends on the ensemble mean and log(σ) on the logarithm of the ensemble standard deviation. The logarithmic link function in Eq. (3) is used to ensure positive values for σ. The coefficients b_0, b_1, c_0 and c_1 are estimated with maximum-likelihood estimation as implemented in the R package 'crch' of Messner et al. (2016).
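Since the coefficient estimation is a Gaussian maximum-likelihood fit, it can also be sketched outside R. The Python snippet below is an illustrative stand-in for the 'crch' fit (the function name fit_ngr and the synthetic data are assumptions, not part of the study); it minimizes the negative log-likelihood implied by Eqs (1)-(3):

```python
import numpy as np
from scipy.optimize import minimize

def fit_ngr(y, m, s):
    """Estimate b0, b1, c0, c1 by minimizing the Gaussian negative
    log-likelihood with mu = b0 + b1*m and log(sigma) = c0 + c1*log(s)."""
    def nll(theta):
        b0, b1, c0, c1 = theta
        mu = b0 + b1 * m
        log_sigma = c0 + c1 * np.log(s)
        # Gaussian negative log-likelihood (constant terms dropped)
        return np.sum(log_sigma + 0.5 * ((y - mu) / np.exp(log_sigma)) ** 2)
    res = minimize(nll, x0=[0.0, 1.0, 0.0, 0.0], method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return res.x

# Synthetic check: recover known coefficients from simulated data
rng = np.random.default_rng(1)
m = rng.normal(10.0, 5.0, 5000)                    # ensemble means
s = np.exp(rng.normal(0.0, 0.3, 5000))             # ensemble spreads
sigma_true = np.exp(0.1 + 0.8 * np.log(s))
y = -1.0 + 1.05 * m + rng.normal(0.0, sigma_true)  # pseudo-observations
b0, b1, c0, c1 = fit_ngr(y, m, s)                  # close to (-1, 1.05, 0.1, 0.8)
```

With enough training data, the estimated coefficients approach the values used to simulate the pseudo-observations.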

Standardized anomalies
Usually, NGR has to be repeated for every station and season individually, since different measurement sites have different temperature ranges and different annual cycles, as shown in the top panel of Figure 2. To overcome this problem, we use standardized anomalies, which remove seasonal and site-specific characteristics. In addition to subtracting the climatological mean as in Scheuerer and Büermann (2014), we also divide by the climatological standard deviation to account for differences in the climatological variance:

ỹ = (y − μ_y) / σ_y, (4)

where μ_y is the mean value of the observations (thick line in Figure 2) and σ_y is its standard deviation (thin lines in Figure 2). Standardized anomalies of the ensemble mean and ensemble standard deviation are computed analogously, with m and log(s) respectively, i.e. m̃ = (m − μ_m)/σ_m and s̃ = (log(s) − μ_log(s))/σ_log(s).
The central idea behind these standardized anomalies is presented in the bottom panel of Figure 2. The standardized anomalies have no annual cycle and are centred around zero for both stations, which enables simultaneous forecasts for all stations and even locations without observations.
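The standardization in Eq. (4) is a one-line operation once the climatological moments are known. A minimal Python sketch (the station values below are hypothetical, chosen only for illustration):

```python
import numpy as np

def standardize(x, clim_mean, clim_sd):
    """Standardized anomaly as in Eq. (4): (x - mu) / sigma."""
    return (x - clim_mean) / clim_sd

# Two hypothetical stations with different levels and variabilities
y       = np.array([21.0, 4.0])   # observed temperatures (degC)
mu_y    = np.array([18.0, 2.0])   # climatological means
sigma_y = np.array([3.0, 4.0])    # climatological standard deviations
anom = standardize(y, mu_y, sigma_y)   # -> [1.0, 0.5]
```

Although the two stations differ by 17 degC, their anomalies live on a common, dimensionless scale and can be pooled into one model.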

Standardized anomaly model output statistics (SAMOS)
When NGR is reformulated with standardized anomalies, seasonal and site-specific characteristics are largely removed and only a single model needs to be fitted for all stations and seasons, instead of fitting models for each location and date individually. We call this SAMOS. Therefore, y, m and log(s) in Eqs (1)-(3) are replaced by ỹ, m̃ and s̃.
To obtain temperature forecasts (μ) and the corresponding standard deviation (σ), Eqs (2) and (3) have to be restructured with Eq. (4) to

μ = μ_y + σ_y (b_0 + b_1 m̃), (5)
σ = σ_y exp(c_0 + c_1 s̃). (6)

Assuming that the forecast skill is similar at all stations and at any location in the region, the model coefficients (b_0, b_1, c_0, c_1) are representative of the whole region. The spatial resolution of the forecasts depends only on the spatial resolution of the underlying climatology.
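In code, the back-transformation of Eqs (5) and (6) amounts to rescaling a prediction made in anomaly space with the observation climatology. A minimal sketch (all numbers are hypothetical):

```python
def destandardize(mu_tilde, sigma_tilde, mu_y, sigma_y):
    """Map a predictive mean/sd from standardized-anomaly space back to
    physical units using the observation climatology (Eqs (5) and (6))."""
    mu = mu_y + sigma_y * mu_tilde   # Eq. (5): mu_tilde = b0 + b1*m_tilde
    sigma = sigma_y * sigma_tilde    # Eq. (6): sigma_tilde = exp(c0 + c1*s_tilde)
    return mu, sigma

mu, sigma = destandardize(mu_tilde=0.5, sigma_tilde=0.8, mu_y=15.0, sigma_y=4.0)
# mu = 17.0 degC, sigma = 3.2 degC
```

Because μ_y and σ_y are available anywhere on the climatology grid, this rescaling works at locations without observations as well.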

Climatology
There are several ways to calculate a spatial climatology. It could be produced at every observation site and then distributed on to the grid with kriging (Scheuerer and Büermann, 2014). A climatology with non-Euclidean distances as in Frei (2014) could be computed, or a spatial climatology for all stations could be fitted at once with a generalized additive model (GAM: Aalto et al., 2013). The approach in this article is to work with an extension of the latter: generalized additive models for location, scale and shape (GAMLSS: Rigby and Stasinopoulos, 2005; Stasinopoulos and Rigby, 2007). The advantage of GAMLSS is that a climatological mean (μ) and a variable standard deviation (σ) can be fitted simultaneously, similarly to NGR, but with the possibility of nonlinear effects. With GAMLSS, the climatology and its standard deviation can be fitted for all stations with

ξ ~ N(μ_ξ, σ_ξ²), (7)

where ξ can be y, m or log(s). The nonlinear model for μ_ξ is then

μ_ξ = β_0 + f_1(lon, lat) + β_1 alt + f_2(yd) + f_3(yd) · alt, (8)

with f_1 as a spatial smoothing function of the horizontal coordinates and f_2 and f_3 as smooth, cyclic functions over the day of the year (yd). The influence of the altitude (alt) on the climatology is captured as a linear effect and in its interaction with the seasonality. Therefore, the altitude effect can vary over the year and the seasonal effect can vary with altitude. This interaction is important in complex terrain (such as South Tyrol) for capturing differences in seasonality between valley and mountain stations. An equation similar to that for μ_ξ, with the same terms as in Eq. (8), is also used for log(σ_ξ).
For the observation climatology, station observations are used as input data, while the climatologies of the ensemble forecasts use past forecasts on the ECMWF forecast grid. Thin-plate splines are used as spatial smoothing functions, with 30 degrees of freedom for y and 20 for m and log(s). For m and log(s), the degrees of freedom are smaller, due to the coarse model grid of the ensemble forecasts (e.g. Figure 1). The climatology is calculated for log(s) instead of s to ensure positive values with the logarithmic link in Eq. (3). Analyses of the climatologies showed that the degrees of freedom do not have a large impact on the forecasts, especially for stations not contained in the training data, as long as they are not close to the maximum possible, which is either the number of stations or the number of grid points. Therefore, the degrees of freedom are set to three quarters of the maximum number as a trade-off between forecast performance and computational cost. The seasonal smooth functions have harmonic terms at annual and biannual frequencies (i.e. the four basis functions sin(2π yd/365), cos(2π yd/365), sin(4π yd/365) and cos(4π yd/365)).
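The cyclic seasonal part can be illustrated with ordinary least squares instead of the full GAMLSS machinery. Below, only the four harmonic basis functions (plus an intercept) are fitted to a synthetic annual cycle; this is a deliberate simplification of Eq. (8), without the spatial and altitude terms:

```python
import numpy as np

def harmonic_basis(yd):
    """Intercept plus harmonic terms at annual and biannual frequencies."""
    w = 2.0 * np.pi * yd / 365.0
    return np.column_stack([np.ones_like(w),
                            np.sin(w), np.cos(w),           # annual cycle
                            np.sin(2 * w), np.cos(2 * w)])  # biannual cycle

yd = np.arange(1, 366, dtype=float)                     # days of the year
truth = 10.0 + 8.0 * np.sin(2.0 * np.pi * yd / 365.0)   # synthetic annual cycle
X = harmonic_basis(yd)
coef, *_ = np.linalg.lstsq(X, truth, rcond=None)        # least-squares fit
clim = X @ coef                                         # fitted seasonal mean
```

Since the synthetic cycle lies exactly in the span of the basis, the fit recovers it; real observations would of course leave residual variability for σ_ξ to absorb.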
The different effects for μ_y are illustrated in Figure 3, where the seasonal effect on the left shows that μ_y has higher values in summer than in winter. Additionally, the different lines illustrate that the seasonal effect in valleys (brighter lines) has a higher amplitude than at stations located higher up (darker lines). The altitude effect in the middle shows that valley stations have a higher mean temperature than mountain stations. The lines indicate that the difference between stations in the valleys and on the mountains is smaller in winter (brighter lines) than in spring (darker lines). On the right-hand side of Figure 3, the spatial effect is plotted: the mean temperature in the southwest is warmer than in the northeast. With all these effects, the climatology for any day and location inside the region of South Tyrol can be provided. The three effects are combined in Figure 4(b). The effects for the ensemble mean and ensemble standard deviation (neither shown) are much smoother, due to the smooth model topography.

Forecast example
Spatial forecasts can be produced with these climatologies and Eqs (5) and (6). Figure 4 shows an example for 1 July 2015. The difference between the climatologies of the ensemble mean and the observations is large (top panel). Whereas the observation mean (μ_y) is strongly influenced by the altitude, with mapped valleys and ridges, the model mean (μ_m) only shows a smooth temperature gradient, since there are no valleys in the ECMWF model topography. The same result is shown by the climatological standard deviation (middle panel). Whereas σ_y maps the valleys, with an additional, slightly higher standard deviation in the northeast, σ_m only illustrates a smooth increase of the standard deviation from south to north. The bottom panel in Figure 4 exemplifies that a raw ensemble forecast does not produce realistic temperature forecasts in complex terrain, whereas the SAMOS forecast captures spatial and altitude effects.

Different models
To evaluate the performance of SAMOS, several models are compared. The models are chosen to evaluate the different parts of the SAMOS model listed in Table 1: different anomalies, climatologies, uncertainties and the calculation of the NGR. The climatologies can be a mean value over the last 30 days at every station individually, or fitted over the full time series with GAMLSS either at every station individually or at all stations simultaneously. Further variants are two-step models to capture the local forecast uncertainty (Scheuerer and Büermann, 2014), different lengths of training data, and separate or simultaneous calculations of the NGR. All models except SAMOS can only forecast at stations that are used for training. Being able to forecast at locations not contained in the training data is one of the main advantages of SAMOS.

Ensemble model output statistics (EMOS)
As a reference model, an NGR is calculated on the direct observations and ensemble forecasts at every station individually, which is known as 'ensemble model output statistics' (EMOS: Gneiting et al., 2005). The 30 days prior to the forecast day are used to fit the model. A forecast for the next day is then produced with the fitted coefficients. Consequently, to obtain forecasts for every day of the time series, a new model has to be fitted for each day. While this approach requires high computational effort, it has the advantage that almost no historical data are needed.

Ensemble model output statistics with anomalies (AMOS)
Next, difference anomalies (ỹ = y − μ_y) replace the direct values in the NGR (AMOS). Therefore, a GAMLSS climatology is calculated for every station individually, similarly to Eq. (8) but without the spatial and altitude effects (second, third and fifth terms). Differently from EMOS, all stations are forecast with one single model and local forecast uncertainties are not considered. To capture different local forecast uncertainties, a two-step model (AMOS 2 step) approach is used: first, an ordinary linear regression is fitted, and the site-specific mean of the squared residuals is then used as an additional predictor variable for σ (i.e. its logarithm is added to the right-hand side of Eq. (3)). This two-step model is similar to the one in Scheuerer and Büermann (2014).
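The two-step idea can be sketched as follows, with hypothetical station labels and data: a first-stage linear regression yields per-station mean squared residuals, whose logarithm would then enter the right-hand side of Eq. (3) as an extra predictor:

```python
import numpy as np

# Hypothetical data: two stations, three forecast/observation pairs each
station = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 1.0, 4.0, 7.0])   # observations
m = np.array([1.1, 2.0, 2.9, 0.0, 3.0, 6.0])   # ensemble means

# Step 1: ordinary linear regression y ~ m over all stations
X = np.column_stack([np.ones_like(m), m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: site-specific mean squared residuals; their logarithm would be
# added to the right-hand side of Eq. (3) as an extra predictor for sigma
msr = np.array([np.mean(resid[station == k] ** 2) for k in (0, 1)])
log_msr_predictor = np.log(msr)[station]   # aligned with the training rows
```

Stations whose first-stage fit is poor receive a larger predictor value, so the second-stage model can inflate their predictive standard deviation.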

SAMOS 30 days
SAMOS 30 days has standardized anomalies and one NGR at all stations simultaneously, but with a climatology over the previous 30 days instead of the full time series. As climatology, an average over the last 30 days preceding the forecast day is calculated for every station individually. Afterwards, one NGR is applied at all stations simultaneously, with the last 30 days as training data. This model is similar to the method of Scheuerer and Büermann (2014), except that the anomalies are standardized.

SAMOS stationwise
The second SAMOS uses a stationwise climatology as in AMOS, but with standardized anomalies. Additionally, separate NGR models are fitted for each station, but with the full time series as training data.

SAMOS stationwise-simultaneous
SAMOS stationwise-simultaneous has the same climatologies as SAMOS stationwise, fitted for every station individually on the full time series, but the NGR is calculated at all stations simultaneously, whereas all previous models forecast every station individually. This model was tested to see whether there is a considerable difference between calculating the climatology at all stations individually and simultaneously (SAMOS). Only fitting one climatology at all stations simultaneously enables us to forecast in between stations. Additionally, a two-step model (SAMOS stationwise-simultaneous two-step) to capture the local uncertainty is tested, similarly to Scheuerer and Büermann (2014). This model shows whether the standardized anomalies capture all the local uncertainty information that is captured by the two-step model of Scheuerer and Büermann (2014).

SAMOS
The climatologies and the NGR are fitted at all stations simultaneously for this model. Its advantage compared with the stationwise models is that locations without observations can also be forecast directly, without any additional interpolation. Errors can be calculated at station points ('same-station') and in between ('new-station'). For the same-station error, all available stations are used to fit the climatologies and the NGR. Afterwards, forecasts for the same stations are obtained with these fits. For the new-station forecasts, a leave-one-out error is calculated: the climatology and the NGR model are fitted for 38 stations and the forecast is then obtained for the 39th station with these fits. This simulates forecasts for new stations where no historical data are available, or for points in between stations. To obtain a new-station error for all stations, each station is left out once and then verified.
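The leave-one-out loop for the new-station error can be sketched schematically; the fit and predict functions below are hypothetical stand-ins for the full climatology-plus-NGR fit, reduced here to a single bias coefficient:

```python
import numpy as np

def fit_model(forecasts, observations):
    """Hypothetical stand-in for the climatology + NGR fit: a bias term."""
    return float(np.mean(observations - forecasts))

def predict(bias, forecast):
    return forecast + bias

stations = np.arange(5)
fc = np.array([10.0, 12.0, 8.0, 15.0, 11.0])   # forecasts at 5 stations
obs = fc + 1.0                                 # observations (1 degC bias)

errors = []
for k in stations:
    keep = stations != k                       # leave station k out
    bias = fit_model(fc[keep], obs[keep])      # fit on the remaining stations
    errors.append(abs(obs[k] - predict(bias, fc[k])))
mae_new_station = float(np.mean(errors))       # 0.0 for this toy example
```

Because the toy bias is identical at every station, each held-out station is predicted perfectly; with real data, the new-station error quantifies how well the regional fit transfers to unseen locations.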

Results
The time series are separated into training and test data to evaluate the forecasts. Tenfold cross-validation is applied to all models that use the full time series. For the other models, the previous 30 days are used for training and the following day for evaluation.
To evaluate the performance of the models, the mean absolute error (MAE) is computed as a deterministic score and the continuous ranked probability score (CRPS) as a probabilistic measure for each station and model. To see the relative change with respect to a reference model, a skill score (SS) for the MAE and the CRPS is computed:

SS = 1 − score_model / score_reference,

so that positive values indicate an improvement over the reference.
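These scores can be sketched in a few lines of Python; the closed-form CRPS of a Gaussian predictive distribution (as in Gneiting et al., 2005) and a skill score of the assumed form SS = 1 − score_model/score_reference are used here, with toy data:

```python
import numpy as np
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

def skill_score(score_model, score_reference):
    """Relative improvement over a reference; positive means better."""
    return 1.0 - score_model / score_reference

y = np.array([0.0, 1.0, -2.0])                 # toy observations
mae = np.mean(np.abs(y - 0.5))                 # deterministic forecast of 0.5
crps = np.mean(crps_normal(0.5, 1.0, y))       # probabilistic N(0.5, 1)
ss = skill_score(crps, 2.0 * crps)             # 0.5 vs a reference twice as bad
```

The CRPS rewards both accuracy and sharpness, which is why it is the natural probabilistic companion to the MAE here.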

Difference versus standardized-difference anomalies
The first results evaluate the benefit of using standardized differences instead of simple differences as anomalies. AMOS is the reference for the skill score. Figure 5 demonstrates that the standardized anomalies improve the forecasts by about 4% compared with the simple differences, for both the MAE and the CRPS. However, these gains can be realized only by using the full climatology; for the method with the 30-day mean as climatology, almost no differences occur (not shown). Scores do not improve when using a two-step model for the uncertainty.
Consequently, all following comparisons will be limited to SAMOS variants with standardized anomalies and without the two-step model for the local uncertainty.

Different models
To compare the different methods from Table 1, the forecast error at every station is calculated. Figure 6(a) shows that SAMOS stationwise performs best. It is on average about 4% better than forecasts with a conventional EMOS. If all stations are fitted and forecast simultaneously with SAMOS stationwise-simultaneous, the results are not significantly different.
SAMOS 30 days performs similarly to EMOS. Using all available data and a full climatology (SAMOS stationwise-simultaneous) instead of a 30-day mean not only saves computation time but also clearly improves the forecasts.
The proposed SAMOS (same-station) method performs on average slightly better than EMOS, but worse than SAMOS stationwise or SAMOS stationwise-simultaneous. The only difference between SAMOS stationwise-simultaneous and SAMOS is the climatology. Calculating a spatial climatology instead of separate station climatologies allows us to forecast directly in between stations, at the expense of somewhat reduced skill.

New stations
The reference model EMOS can only forecast where observations are available for training. In contrast, SAMOS allows us to forecast at any location in the region. When SAMOS is tested on new stations (Figure 6(b)), the forecasts are nearly as good as the EMOS forecasts for stations on which they were trained. Errors close to the border can be larger, especially without other stations nearby (circles in Figure 1); these four stations are left out of the verification. Additionally, probability integral transform (PIT) histograms are shown in Figure 7 to evaluate the calibration of the forecasts at new stations graphically. A perfectly calibrated forecast yields a uniform PIT histogram. The raw NWP ensemble (Figure 7(a)) is underdispersive: the verifying observations fall outside the forecast range too often, which is visible in the high density at the extremes. EMOS forecasts (Figure 7(b)) are better calibrated but still slightly underdispersive, as also shown by Möller and Groß (2016). The SAMOS same-station forecasts (Figure 7(c)) are almost perfectly calibrated, with a nearly uniform PIT histogram. This is similar for the new-station forecasts (Figure 7(d)).
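A PIT value is simply the predictive cumulative distribution function evaluated at the observation, i.e. Φ((y − μ)/σ) for Gaussian forecasts. A minimal sketch with synthetic, perfectly calibrated forecasts (the data are simulated, not from the study):

```python
import numpy as np
from scipy.stats import norm

# Synthetic, perfectly calibrated forecasts: the observations are drawn
# from the predictive distribution itself, so the PIT histogram is flat
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 5.0, 10000)          # predictive means
sigma = 2.0                               # predictive standard deviation
y = rng.normal(mu, sigma)                 # observations
pit = norm.cdf(y, loc=mu, scale=sigma)    # PIT values in [0, 1]
hist, _ = np.histogram(pit, bins=10, range=(0.0, 1.0), density=True)
# For calibrated forecasts all ten bars are close to density 1
```

An underdispersive forecast would instead pile PIT values near 0 and 1, producing the U-shape seen for the raw ensemble.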

Discussion and conclusion
To produce spatial forecasts, a statistical model is fitted to standardized anomalies, which remove seasonal and site-specific characteristics of the data. These standardized anomalies are the difference from a climatological mean divided by the climatological standard deviation for the observations and NWP forecasts, respectively.
Without season-specific characteristics, longer training data can be used. Using longer training data improves the forecasts significantly. Since operational NWP models are frequently updated and thus their systematic error characteristics might change, NWP reforecasts (Hagedorn et al., 2008) with the same model are needed to exploit the potential of long observational time series fully.
A large practical advantage is also that the coefficients from the forecast model need not be recomputed every day, as is the case in Scheuerer and Büermann (2014) and Gneiting et al. (2005). The SAMOS approach results in one simple NGR equation. Its complexity is contained in the climatologies, which must be updated only infrequently, e.g. annually.
Without site-specific characteristics, all stations can be forecast simultaneously. Furthermore, the fitted coefficients are valid for the whole region and, with a spatial climatology of the observations, even points in between stations can then be forecast. The spatial resolution of these forecasts is determined by the spatial resolution of the climatology.