Enhanced extended‐range predictability of the 2018 late‐winter Eurasian cold spell due to the stratosphere

A severe cold spell with surface temperatures reaching 10 K below its climatology hit Eurasia during late February/early March 2018. This cold spell was associated with a Scandinavian blocking pattern followed by an extreme negative North Atlantic Oscillation (NAO) phase. Here we explore the predictability of this cold spell/NAO event using ensemble forecasts from the Subseasonal‐to‐Seasonal (S2S) archive of the European Centre for Medium‐Range Weather Forecasts. We find that this event was predicted with the observed strength roughly 10 days in advance. However, the probability of the cold spell occurring doubled up to 25 days in advance, when a sudden stratospheric warming (SSW) occurred. Our results indicate that the amplitude of the cold spell was increased by a regime shift to the negative NAO phase at the end of February, which was likely favoured by the SSW. We quantify the contribution of the SSW to the enhanced extended‐range forecast skill for this particular event by running forecast ensembles in which the evolution of the stratosphere is nudged to (a) the observed evolution, and (b) a time‐invariant state. In the experiment with nudging to the observed stratospheric evolution, the probability of a strong cold spell occurring is enhanced to 45%, while it is at its climatological value of 5% when the stratosphere is nudged to a time‐invariant state. These results showing enhanced predictability of surface extremes following SSWs extend previous observational evidence, which is mostly based on composite analyses, to a single event. Our results suggest that it is the subsequent evolution throughout the lower stratosphere following the SSW, rather than the occurrence of the SSW itself, that is crucial in coupling to large‐scale tropospheric flow patterns. However, we caution that probabilistic gain in predictability alone is insufficient to conclude a causal link between the SSW and the cold spell event.


INTRODUCTION
The occurrence of cold extremes over Eurasia in winter and spring is typically associated with large-scale flow patterns with a strong meridional component that project onto the negative phase of the North Atlantic Oscillation (NAO) and Scandinavian blocking. The predictability of cold spells in extended-range forecasts (i.e., up to 6 weeks ahead) thus hinges on the predictive skill of the large-scale circulation pattern. It has recently been shown that predictive skill for NAO transitions beyond 10 days provides the potential for early warnings of cold spells.
While the deterministic predictability limit of the NAO is approximately 10-20 days (Scaife et al., 2014; Ferranti et al., 2015; Domeisen et al., 2018), there is statistical skill beyond that limit on subseasonal-to-seasonal time-scales due to remote forcers. For subseasonal-to-seasonal predictions of the NAO, there is strong evidence from various types of composite analyses over many events that the state of the stratosphere is one important source of enhanced probabilistic predictability (Sigmond et al., 2013; Scaife et al., 2016; Charlton-Perez et al., 2018; Hansen et al., 2019). There is robust observational evidence that stratospheric extreme events may impact the tropospheric circulation all the way down to the surface (see Baldwin and Dunkerton, 2001 and many other studies since this pioneering work). In particular, the occurrence of sudden stratospheric warming (SSW) events favours negative NAO phases that are often associated with cold extremes over Eurasia. It has been shown that during periods with a weak stratospheric polar vortex, extremely low temperatures over northern Europe are more likely (Kidston et al., 2015) and longer lasting (Garfinkel et al., 2017). These studies reinforce the hypothesis that the state of the stratospheric polar vortex may be important for the evolution, intensity and, in particular, the persistence of surface cold anomalies. Notably, past work has focused primarily on establishing statistically robust links based on composite analysis featuring several SSWs and NAO phases. In a recent overview article, Butler et al. (2019) stated that "challenges remain in arriving at a set of general unifying principles that can provide a quantitative description of the role of stratosphere-troposphere coupling on an event-by-event basis." This calls for a detailed analysis of the role of the stratosphere for single events.
It should, however, be kept in mind that a direct link between a negative NAO phase and the stratosphere is not always present. Using data from the European Centre for Medium-Range Weather Forecasts (ECMWF) monthly forecasting system, Jung et al. (2011) showed that the onset and persistence of the extreme negative NAO phase in winter 2009/2010 was triggered by internal tropospheric dynamics, and not by the state of the stratosphere. Moreover, Riviere and Orlanski (2007) found that even individual storms can change the NAO phase.
In late February to early March 2018, a strong cold spell was observed over large parts of northern Eurasia. The cold spell was associated with a shift of the NAO towards its negative phase at the end of February (see Figure 1). Preceding the negative NAO (henceforth NAO−) phase, persistent Scandinavian blocking was observed during the second half of February (Figure 1a; see also Ayarzagüena et al., 2018; Ferranti et al., 2019). Moreover, in mid-February 2018, a major SSW was observed (as shown by the reversal of the 10-hPa zonal wind at 60°N in Figure 1a), as the polar vortex split into two separate sub-vortices (known as a splitting event, in contrast to a displacement event). This SSW event persisted for about half a month with some intensity fluctuations (three record-breaking minima in the zonally averaged 10-hPa zonal wind could be identified on February 15, 20 and 26; see Figure 1a). Ayarzagüena et al. (2018) showed that there was a detectable downward Northern Annular Mode signal of this SSW to the troposphere in the reanalysis data.
The SSW was predicted about 10 days in advance, with some dependence on the modeling system, in agreement with the typical predictability limit of SSWs (Taguchi, 2014; Tripathi et al., 2015; Karpechko, 2018). Ferranti et al. (2019) argued that the regime shift that led to the cold spell in late February/early March 2018 was predicted with higher accuracy than average predictability. They suggested that the higher prediction skill was likely due to remote forcers, namely the SSW in mid-February or the strong Madden-Julian Oscillation (MJO) event. Also, Karpechko et al. (2018) showed that forecasts initialized on February 8 (whose ensemble members all predicted the SSW) predicted an enhanced likelihood of low temperatures over northern Eurasia in late February/early March. The evolution of the NAO phase and the cold spell following the SSW are remarkably similar to the composite mean flow evolution following SSW events. Ayarzagüena et al. (2018) described the event as being close to the "canonical" behaviour expected after SSWs. While Ayarzagüena et al. (2018) did not specifically focus on the cold spell itself, but rather on the extreme precipitation event in the Iberian Peninsula, they concluded that the cold spell was likely triggered by the SSW. Due to the similarity to the composite mean behaviour, the cold spell was also indicated purely based on statistical models (Cohen et al., 2018).
However, it should be kept in mind that neither the presence of enhanced extended-range predictability (Ferranti et al., 2019) nor the fact that the event followed the composite mean evolution after SSWs (Ayarzagüena et al., 2018) allows for unambiguous conclusions to be drawn on a causal link between the SSW and the cold spell for individual cases such as that of February/March 2018. Arguably, an unambiguous quantification of the role of the stratosphere in extending the range of predictability of a single event is not possible with the use of ensemble forecasts alone. Nevertheless, ensemble forecasts do provide an opportunity to assess the stratospheric link in a probabilistic sense.

FIGURE 1 [...] (MSLP, in hPa, contours) and 2-m temperature (T2m, in K, shading) anomalies averaged over the period from February 25 to March 6. The magenta box in (b) marks the area that was used to calculate the T2m anomaly over Eurasia, and the green box indicates the area used to calculate the Scandinavian blocking strength.
In this study, we aim to provide insights into the potential of extended-range predictability of Eurasian cold spells based on a case study of the 2018 event. In particular, we attempt to quantify the role of the mid-February SSW in providing extended-range predictability of the cold spell by conducting additional sensitivity simulations with a nudged stratosphere. With this perspective, two research questions are posed:
1. Did the cold spell in northern Eurasia develop independently from the NAO− phase?
2. What role did the stratosphere play in triggering the NAO event?
This article is structured as follows. In section 2, the data and methods are described. Section 3 provides a synoptic overview. In section 4, the predictability of the cold spell and the NAO in S2S forecasts is discussed. Section 5 elucidates the influence of the stratosphere on the cold spell and the NAO predictability. In section 6, the tropospheric forecast variability is discussed with a focus on Scandinavian blocking and NAO−. In the final section, we conclude the article with a discussion.

DATA AND METHODS
Subseasonal-to-Seasonal (S2S) 51-member ensemble forecasts from the ECMWF are analysed (Vitart et al., 2017). To investigate the impact of stratospheric forecast uncertainty on the northern Eurasian cold spell, we cluster members into groups with different stratospheric states. We select members that accurately predicted the central date of the SSW (within +/− 3 days) to form an "SSW" cluster, and members that failed altogether to predict an SSW event up until March 6 to form a "no SSW" cluster. In forecasts initialized after February 1, all members capture the SSW. Therefore, forecasts initialized on February 1 are used to investigate uncertainties in the stratospheric state and their impact on the troposphere. For the February 1 initialization, 18 ensemble members capture the SSW (within +/− 3 days) and 23 members completely fail to predict an SSW up until March 6. As the remaining 10 members predict the SSW at a markedly different date, these members are not included in either the "SSW" or the "no SSW" cluster.

TABLE 1 Description of indices and predictability measures used in this study: northern Eurasian 2-m temperature anomaly (Eurasian T2m anomaly); North Atlantic Oscillation (NAO); Scandinavian blocking strength (SB); sudden stratospheric warming (SSW) index; probability of extreme event; anomaly correlation coefficient (ACC); and Northern Annular Mode (NAM). All fields have been interpolated onto an N128 Gaussian grid from a native forecast model/reanalysis grid.

Index: Description
Eurasian T2m anomaly: Difference of daily 2-m temperature from climatological levels averaged over northern Eurasia (10°W-130°E and 50°N-65°N; see the magenta box in Figure 1b) as in Karpechko et al. (2018).
NAO: For each forecast ensemble member, the NAO index is computed by projecting the geopotential height anomaly at 500 hPa (Z500) over the Euro-Atlantic sector (80°W-40°E and 30°-90°N) onto a reference leading empirical orthogonal function (EOF1) (Weisheimer et al., 2017). The leading EOF1 is calculated from 5-day running mean DJFM Z500 ERA-Interim anomalies for the period 1979-2018. The observed NAO is the principal component of the EOF1. The NAO index is in units of standard deviation, where one standard deviation is from the daily DJFM NAO index from ERA-Interim for the period 1979-2018.
SB: Blocking is defined in terms of a southern and a northern Z500 gradient (Tibaldi and Molteni, 1990): A given longitude is defined to be blocked at a specific time if GHGS > 0 and if GHGN < −10 at least for one value of Δ. The strength of a blocking system is given by the GHGS value. To investigate the strength of Scandinavian blocking we average the GHGS between 15°E and 40°E (see the green box in Figure 1b).
SSW: Daily mean zonal-mean zonal wind at 10 hPa and 60°N. The central date of the SSW corresponds to the SSW index transition from a positive to a negative value.
Probability of extreme event: An extreme event occurs when anomalies below the fifth percentile are observed. For the T2m anomaly, the fifth percentile is taken from the 11-member 20-year hindcast climatology for each initialization date. For the NAO, the fifth percentile is taken from the daily DJFM NAO index distribution in the ERA-Interim reanalysis for the period 1979-2018.
ACC: Linear (Pearson) correlation coefficient between the ensemble mean and ERA-Interim. The ACC is computed for the Eurasian T2m anomaly and Z500 anomaly over the Euro-Atlantic sector as a proxy for the NAO. Time averages for February 25 to March 6 are shown.
NAM: A daily anomaly of the zonal-mean daily mean geopotential height is first calculated by removing a daily 90-day low-pass filtered model climatology for the forecast NAM and ERA-Interim climatology for ERA-Interim NAM. An area-weighted polar-cap averaged (60-90°N) geopotential height anomaly is then constructed. The NAM index is in units of standard deviation, where one standard deviation is from the ERA-Interim daily NAM index for the period 1979-2018.
Notes: DJFM = December, January, February, March. GHGS = geopotential height gradient (south). GHGN = geopotential height gradient (north).
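The SSW index from Table 1 and the member clustering described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the day-index representation of dates, and the "other" label for members with a markedly different central date are assumptions made here, and the wind series is assumed to be truncated at March 6 before classification.

```python
from typing import Optional, Sequence


def central_date_index(u10_60n: Sequence[float]) -> Optional[int]:
    """Return the first day index at which the 10-hPa, 60N zonal-mean zonal
    wind changes from a positive to a negative value, or None if it never does."""
    for day in range(1, len(u10_60n)):
        if u10_60n[day - 1] > 0 and u10_60n[day] < 0:
            return day
    return None


def classify_member(u10_60n: Sequence[float], observed_day: int,
                    tolerance: int = 3) -> str:
    """Assign a member to the "SSW" cluster (central date within +/- tolerance
    days of the observed one), the "no SSW" cluster (no wind reversal in the
    series), or "other" (hypothetical label for an SSW at a different date)."""
    day = central_date_index(u10_60n)
    if day is None:
        return "no SSW"
    if abs(day - observed_day) <= tolerance:
        return "SSW"
    return "other"
```

Applied to the 51 members of the February 1 forecast, a classification of this kind would yield the three groups described in the text (18, 23 and 10 members).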
To investigate the impact of tropospheric forecast uncertainties on the northern Eurasian cold spell, we also cluster members into groups with different tropospheric states. This clustering is done on the Scandinavian blocking strength and the NAO phase in forecasts initialized on February 15. The statistical significance of the differences between clusters is checked with a two-sided Student's t-test at the 95% confidence level.
The tropospheric response also depends on the amplitude and duration of the lower stratospheric anomalies following an SSW central date (Hitchcock et al., 2013; Kodera et al., 2016; Runde et al., 2016; Karpechko et al., 2017; Polichtchouk et al., 2018a). Given that the subsequent stratospheric evolution following the SSW differs amongst ensemble members, despite them accurately predicting the SSW central date, it is difficult to precisely determine whether the cold spell was triggered by the stratosphere based on S2S forecasts alone. For example, not all SSWs in the ensemble extend down into the lower stratosphere/upper troposphere. To better answer this question, we perform nudged forecasts (50 members) in which vorticity, divergence and temperature fields above 70 hPa are relaxed on a 6-hr time-scale to ERA-Interim reanalysis (up to a total wave number of 21), similarly to previously published studies (Douville, 2009; Jung et al., 2010; Greatbatch et al., 2012). To minimize wave reflection, the nudging strength is gradually ramped up over six model levels. We also perform another set of nudged forecasts (50 members), where the stratosphere does not experience the SSW and remains close to its climatological state. This is achieved by perpetually nudging the stratosphere to February 1, 2018 throughout the forecast. On February 1, 2018, the polar night jet was close to its climatology (see fig. 3 in Karpechko et al., 2018). Both sets of nudged forecasts are performed for the February 1 initialization.
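The relaxation described above can be illustrated with a simple Newtonian nudging step, dx/dt = −w (x − x_ref)/τ with τ = 6 hr. The sketch below is an assumption-laden illustration rather than the forecast model's implementation: it acts on a single scalar value instead of spectral coefficients, uses an explicit time step, and assumes a linear ramp of the weight w over the six transition levels (the text states only that the strength is ramped up gradually).

```python
from typing import List


def nudging_weights(n_levels: int, n_ramp: int = 6) -> List[float]:
    """Weights for levels ordered upward from the lower edge of the nudged
    region: a linear ramp (assumed shape) over n_ramp levels, then full strength."""
    return [min(1.0, (k + 1) / n_ramp) for k in range(n_levels)]


def nudge(value: float, reference: float, weight: float,
          dt_hours: float, tau_hours: float = 6.0) -> float:
    """One explicit relaxation step towards the reference state."""
    return value + dt_hours * weight * (reference - value) / tau_hours
```

With full weight and a 1-hr step, a value of 10 relaxing towards a reference of 4 moves to 9, that is, one sixth of the way per hour; in experiment (b) the reference would simply be held fixed at the February 1, 2018 state.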

SYNOPTIC DEVELOPMENT OF THE BLOCKING HIGH AND THE NAO− PHASE
In this section, an overview based on reanalysis of the surface circulation and the conditions at the dynamical tropopause is given. In mid-February, cyclonic Rossby wave breaking (RWB) events are observed over the North Atlantic (Figure 2a,b). Thus, the circulation in the Euro-Atlantic sector in mid-February is influenced by cyclonic RWB close to Greenland. On February 24, the breaking direction of the amplified Atlantic ridge is less clear (Figure 2c). However, the ridge undergoes anticyclonic wave breaking close to western Europe on February 28, pumping warm air towards Scandinavia (Figure 2d). This anticyclonic RWB coincides with the onset of NAO− (see Figure 1). In addition, Eurasia is under the influence of a high-surface-pressure system with its centre over Scandinavia associated with atmospheric blocking (Figure 2d). The blocking system over Scandinavia can be identified with the help of the blocking index of Tibaldi and Molteni (1990) (see Figure 1a). As the blocking is only defined by positive values of the southern gradient of the 500-hPa geopotential height (i.e., the GHGS described in Table 1), there are many zero values in the time series indicating non-blocking situations (see the yellow line in Figure 1a). In early March, the persistent high-pressure system over Scandinavia weakens, and at 60°W there is a ridge tilted to the northeast (Figure 2e). This structure of high-potential-temperature air originates from a cut-off anticyclone which merged with a low-latitude and less amplified ridge (not shown). On March 8, the wave pattern over the Atlantic is less amplified compared to the second half of February (Figure 2f). In mid-March, the waves over the Atlantic again tend to break cyclonically (Figure 2g,h): two cyclonic RWB events occur simultaneously and both are associated with surface cyclones at 10 and 60°W. A high frequency of cyclonic RWB events is typical for NAO− phases (Benedict et al., 2004).
While cyclonic RWB events can trigger a shift of the NAO into its negative phase, NAO− conditions also favour the occurrence of cyclonic RWB. Thus, this relation can be understood as a positive feedback between the NAO pattern and RWB (Kunz et al., 2009).
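The blocking criterion referred to in this section (GHGS > 0 and GHGN < −10 for at least one Δ; see Table 1) can be sketched for a single longitude as below. The reference latitudes (80°N, 60°N, 40°N) and the offsets Δ = −5°, 0°, +5° follow a commonly used formulation of the Tibaldi and Molteni (1990) index and are assumptions here, since the text does not spell them out; the function names and the dictionary input are likewise hypothetical.

```python
from typing import Mapping, Tuple

# Assumed latitude offsets (degrees); the paper states only "one value of Delta".
DELTAS = (-5.0, 0.0, 5.0)


def gradients(z500: Mapping[float, float], delta: float) -> Tuple[float, float]:
    """Southern (GHGS) and northern (GHGN) Z500 gradients in m per degree of
    latitude for one longitude; z500 maps latitude (degrees N) to height (m)."""
    phi_n, phi_0, phi_s = 80.0 + delta, 60.0 + delta, 40.0 + delta
    ghgs = (z500[phi_0] - z500[phi_s]) / (phi_0 - phi_s)
    ghgn = (z500[phi_n] - z500[phi_0]) / (phi_n - phi_0)
    return ghgs, ghgn


def is_blocked(z500: Mapping[float, float]) -> bool:
    """True if GHGS > 0 and GHGN < -10 for at least one value of delta."""
    for delta in DELTAS:
        ghgs, ghgn = gradients(z500, delta)
        if ghgs > 0 and ghgn < -10:
            return True
    return False
```

A profile with a height maximum near 60°N (a reversed meridional gradient to the south and a strong decrease to the north) is flagged as blocked, whereas a profile that decreases monotonically with latitude is not.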

HOW PREDICTABLE WAS THE COLD SPELL AND THE NAO?
First we examine the predictability of the late February/early March cold spell and the NAO− phase in S2S forecasts. Figure 3 shows boxplots of (a) the northern Eurasian cold spell and (b) the NAO as a function of the forecast initialization date (all boxes to the right of the second vertical line). It is clear from the figure that the observed strength of the cold spell and the NAO− phase is only predicted in forecasts initialized on February 19 and 22, with still considerable spread for forecasts initialized on February 15. This result is consistent with the deterministic predictability limit of the NAO (Scaife et al., 2014; Ferranti et al., 2015; Domeisen et al., 2018). However, even for the February 5 initialization, when the SSW is in the forecast, the ensemble is already beginning to show probabilistic skill, as both the ensemble mean and the median shift towards cold anomalies and the NAO− phase. This is remarkable and implies probabilistic predictability >20 days before the extreme event. Interestingly, for the February 12 initialization (that is, the SSW central date) there is a drop in skill for the T2m anomaly and an increase in spread for the NAO index. Whether this represents a random fluctuation or a more systematic deterioration of forecasts initialized at SSW central dates is at present unclear and beyond the scope of this paper to investigate.
FIGURE 2 Potential temperature on the 2-PVU surface (in K, shading) and MSLP (in hPa, contours) from ERA-Interim on specific dates in February and March, 2018

As discussed in the introduction, the 2018 Eurasian cold spell and the NAO− phase were extreme events (approximately 2-sigma events). It is therefore instructive to examine the probability of these two extreme events (see Table 1 for definitions) in forecasts with different initialization dates. Findings are shown in Figure 4a. By definition, the probability of observing an extreme event for long lead times is <5% (assuming a Gaussian distribution; a 2-sigma threshold roughly corresponds to a 5% threshold in the probability density function). As can be seen from the figure, this is indeed the case for forecasts initialized at the end of January, and, for the NAO only, on February 1. For later initializations from February 5 onward, the probability of the occurrence of an extreme event exceeds 7%, and on February 15 it exceeds 20%, with a rapid increase thereafter. As seen in Figure 3a, there is a drop in the probability of the T2m anomaly for the February 12 initialization. It should be noted that the NAO and T2m distributions used to compute the fifth percentile are not strictly Gaussian. For example, the NAO distribution is negatively skewed such that the fifth percentile corresponds to −1.8 and the 95th percentile to 1.5 standard deviations. This explains why, for example, none of the members for the February 1 forecast fall below −2 sigma in Figure 3a, whereas some members fall below the fifth percentile in Figure 4a.
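The probability of an extreme event discussed here (Table 1) is the fraction of ensemble members falling below the fifth percentile of a reference climatology. A minimal sketch follows; the linear-interpolation percentile and the function names are choices made for this illustration and need not match the definition used in the study.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def extreme_probability(members: Sequence[float],
                        climatology: Sequence[float],
                        q: float = 5.0) -> float:
    """Fraction of ensemble members strictly below the q-th percentile of the
    climatological sample (hindcasts for T2m, reanalysis for the NAO)."""
    threshold = percentile(climatology, q)
    return sum(1 for m in members if m < threshold) / len(members)
```

Because the threshold is taken from the empirical distribution rather than from −2 sigma, a negatively skewed climatology (as noted for the NAO) yields a threshold that differs from the Gaussian value.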
FIGURE 3 (a) Boxplots of the northern Eurasian T2m anomaly (in K), averaged between February 25 and March 6, 2018, for different initialization dates of the Subseasonal-to-Seasonal (S2S) ensemble forecasts. The boxes show the lower and upper quartiles, the band inside each box is the median and the ensemble means are indicated by asterisks. The whiskers show the 10th and 90th percentiles, while the outliers are grey crosses. The boxes labelled "1:N-F1" and "1:N-O" represent the nudged forecasts initialized on February 1, where the stratosphere above 70 hPa is nudged to February 1, 2018 and to the reanalysis, respectively. The box labelled "1:ssw" shows only those members that predicted SSW (within +/− 3 days from the central date) in the February 1 forecasts and "1:no ssw" shows those that did not (18 and [...]

To complement the predictability analysis, Figure 4b shows the anomaly correlation coefficient (ACC; see Table 1) for the February 25 to March 6 mean T2m anomaly (in red) and the NAO (in blue) as a function of the forecast lead time. Interestingly, there are two drop-offs in the ACC: one at lead times of 1-2 weeks, and another beyond 3 weeks.[1] A possible interpretation is that the first drop-off corresponds to the natural predictability limit of the NAO (Domeisen et al., 2018; Ferranti et al., 2019) and the second drop-off corresponds to the time after which the SSW was predicted, and could thus be linked to the predictability gain from the stratospheric evolution. However, the coinciding time alone does not allow for the conclusion of a causal link, and we proceed to examine the role of the stratosphere for the cold spell and the NAO− predictability.

[1] We note that while statistically the skill is a monotonic function of the lead time, this is not always true for a single event. See the ECMWF severe event catalogue for examples, such as the August 2018 windstorm in Denmark or the July 2019 heat wave: https://confluence.ecmwf.int/display/FCST/Severe+Event+Catalogue.
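The ACC used in this section is defined in Table 1 as the linear (Pearson) correlation between the ensemble mean and ERA-Interim. A minimal sketch is given below, assuming the anomaly fields have been flattened to lists of grid-point values; area weighting and the time averaging over February 25 to March 6 are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def ensemble_mean(members: Sequence[Sequence[float]]) -> List[float]:
    """Grid-point-wise mean over members; each member is a flattened anomaly field."""
    return [sum(column) / len(column) for column in zip(*members)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation coefficient between two equally long fields."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def acc(members: Sequence[Sequence[float]], reanalysis: Sequence[float]) -> float:
    """ACC between the ensemble-mean anomaly field and the reanalysis anomaly field."""
    return pearson(ensemble_mean(members), reanalysis)
```

In a real application the grid points would be weighted by the cosine of latitude before correlating, which this sketch leaves out for brevity.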

INFLUENCE OF THE STRATOSPHERE ON PREDICTABILITY
Forecasts initialized on February 1 only partially predicted the mid-February SSW, and can thus be used to assess the impact of the SSW on predictability by clustering members that did or did not predict the SSW. Figure 3 shows boxplots for the T2m anomaly and the NAO index in late February/early March for the "SSW" cluster (18 members, "1:ssw") and the "no SSW" cluster (23 members, "1:no ssw"). The distributions show a shift towards the cold anomaly and the NAO− phase in "SSW" members, although the difference between the "no SSW" and the "SSW" cluster is not statistically significant according to either the Kolmogorov-Smirnov two-sample test or the Student's t-test on the sample means. The large spread in both clusters shows that some "SSW" cluster members predict positive T2m anomalies and/or NAO+, and some "no SSW" cluster members predict negative T2m anomalies and/or NAO−. Therefore, the occurrence of SSW in the forecast is neither a necessary nor sufficient condition to predict anomalously low T2m and/or NAO−.

FIGURE 4 [...] Table 1) for different initialization dates of the S2S ensemble forecasts (J = January; F = February): northern Eurasian T2m anomaly (in red), and NAO (in blue) averaged from February 25 to March 6, 2018. (b) ACC (see Table 1) for the northern Eurasian T2m anomaly (in red) and NAO (in blue) from February 26 to March 6 as a function of the forecast range (in days) of the S2S forecasts. Blue and red circles show the probability of an extreme event and ACC for nudged forecasts with observed stratospheric evolution, and triangles for nudged forecasts with February 1, 2018 stratosphere

Is the spread in T2m in both clusters (SSW/no SSW) due to a spread in how well the NAO− phase is predicted? To answer this question, we further divide the coldest Eurasian anomaly members and the warmest Eurasian anomaly members in the "SSW" and "no SSW" clusters into sub-clusters.
To quantify the variability within the ensemble, the differences between the cold and warm "SSW" sub-clusters and the cold and warm "no SSW" sub-clusters are illustrated in Figure 5. In the "SSW" cluster, the lowest T2m anomalies over northern Eurasia are associated with an NAO− phase, indicating that the cold spell did not develop independently from the NAO (Figure 5a). In the "no SSW" cluster, the northern Eurasia cold spell is associated with a blocking anticyclone over the Ural Mountains with its centre over Scandinavia (Figure 5b). Moreover, the area in Europe covered by the surface cold anomaly is larger than in members with SSW. Thus, in members without the mid-February SSW but with a strong northern Eurasia cold spell, the cold spell did not evolve from an NAO− circulation pattern. For the members with SSW, the observed tropospheric circulation pattern (NAO−) is found for members that predicted the cold spell, leading to the hypothesis that the SSW triggered the observed regime change from a Scandinavian blocking pattern (dominating the second half of February) to a shift towards NAO− (dominating the first half of March) (Ferranti et al., 2019). To test whether the coldest Eurasian anomaly "SSW" sub-cluster has a higher forecast skill than the coldest Eurasian anomaly "no SSW" sub-cluster, we calculate the T2m ACC for the two sub-clusters. The T2m ACC for the coldest "SSW" sub-cluster is 0.67 and for the coldest "no SSW" sub-cluster it is 0.71. The similarity of the ACC values for the two sub-clusters suggests that the forecast skill is not enhanced by the occurrence of the SSW.
In addition to clustering by occurrence of SSW, we also perform a clustering of all ensemble members by the amplitude of the T2m anomaly over northern Eurasia between February 25 and March 6; that is, we group the ensemble members that captured the cold anomaly and those that predicted a warm anomaly. We then calculate the percentage of members that predicted the SSW in the "cold cluster" and in the "warm cluster." While the proportion of SSW members in the "cold cluster" (58%) is marginally larger than in the "warm cluster" (52%), this is not sufficient to conclude that the cold spell could be related to the occurrence of SSW. Instead, this result is consistent with Figure 5b, as well as the above-discussed observation that the occurrence of SSW in the forecast is neither a necessary nor sufficient condition to predict anomalously low T2m.
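The comparison in this paragraph amounts to counting, within a "cold" and a "warm" group of members, the share of members that predicted the SSW. The sketch below assumes the groups are split by the sign of the February 25 to March 6 mean T2m anomaly, which is an interpretation made here (the text does not state the exact threshold); the member identifiers and function name are hypothetical.

```python
from typing import Dict, Tuple


def ssw_share_by_cluster(t2m_anomaly: Dict[str, float],
                         has_ssw: Dict[str, bool]) -> Tuple[float, float]:
    """Return the fraction of SSW members in the cold cluster (anomaly < 0)
    and in the warm cluster (anomaly > 0); the sign split is an assumption."""
    cold = [m for m, t in t2m_anomaly.items() if t < 0]
    warm = [m for m, t in t2m_anomaly.items() if t > 0]

    def share(group):
        # Empty groups have no defined share.
        if not group:
            return float("nan")
        return sum(1 for m in group if has_ssw[m]) / len(group)

    return share(cold), share(warm)
```

Shares that are close to each other, as reported in the text (58% versus 52%), indicate that SSW occurrence alone does not separate cold from warm outcomes.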
The stratospheric evolution following the central date is deemed to be important for the tropospheric response (Hitchcock et al., 2013; Kodera et al., 2016; Runde et al., 2016; Karpechko et al., 2017; Polichtchouk et al., 2018a). Therefore it is hard to quantify the role of the stratosphere in the cold spell and/or the NAO predictability in the February 1 forecasts, as there is a large spread in the subsequent stratospheric evolution in the 18 ensemble members that do capture the SSW. Moreover, most members that did not capture the SSW still experienced a deceleration of the polar night jet (not shown), which may still trigger downward coupling (Martineau and Son, 2015).
The failure of the "SSW" cluster to predict lower stratospheric evolution following the central date can be seen in Figure 6, which shows a time series of the 100-hPa Northern Annular Mode index (NAM; see Table 1 for definition). By comparing the NAM index in ERA-Interim (thick solid black line) with that for the ensemble mean of the "SSW" cluster (dashed black line), it is clear that the persistence of the positive NAM index is too short and the amplitude too weak in the "SSW" members. To test whether "SSW" members that better predict the cold spell also better predict the lower-stratospheric NAM evolution, we also show cluster mean NAM evolution for the coldest Eurasian anomaly "SSW" sub-cluster and the warmest Eurasian anomaly "SSW" sub-cluster in Figure 6 (dotted blue line and dot-dashed red line, respectively). Indeed, the better agreement of the NAM index for the coldest "SSW" sub-cluster with ERA-Interim suggests that the lower-stratospheric signal is important for the surface response.

FIGURE 5 (a) MSLP and T2m differences between the mean of members with the coldest anomaly over Eurasia (below the 25th percentile, five members) and the mean of members with the warmest anomaly (above the 75th percentile, four members) averaged over the period from February 25 to March 6, based on members with SSW from the forecast initialized on February 1. (b) The same as (a), but for members without SSW (the "warm" cluster consisting of five members and the "cold" cluster consisting of six members). Dotted areas indicate significant differences in the T2m field between the two clusters
To further explore the impact of correct stratospheric evolution on the surface response, we employ the nudged forecast ensembles. The nudged experiments are initialized on February 1 and the stratosphere is nudged to (a) the observed evolution and (b) February 1. Recall that on February 1, 2018, the polar night jet was close to its climatological state. The lower-stratospheric NAM index in the nudged ensemble (a) is close to the observed (dot-dot-dashed green line in Figure 6). Note that exact agreement is not expected due to the nudging region being located above 70 hPa.
The boxplots to the left of the first vertical line in Figure 3 show the results from the nudged forecasts ("1:N-O" and "1:N-F1"). Remarkably, the "correct" stratospheric evolution leads to a significant enhancement of predictability, similarly to the February 15 initialization. This can be seen by comparing the boxplot for the February 15 initialization with the boxplot for the nudged forecast "1:N-O." Moreover, the skill enhancement essentially disappears in forecasts that have the "incorrect" stratosphere (see the boxplots "1:N-F1"). This is also reflected in the predicted probability of the occurrence of an extreme event, as shown in Figure 4: in the ensemble with the stratosphere nudged to the observed evolution, the probability of the occurrence of an extreme NAO event rises to 25%, and to as much as 45% for the occurrence of an extreme cold anomaly. In contrast, in the ensemble with the "incorrect" stratosphere the probability is equal to the climatological value. This quantification of the increased likelihood of the extreme cold spell occurring due to nudging of the stratosphere to the observed evolution clearly demonstrates the importance of the stratosphere in enhancing the predictive skill. Similarly, the ACC is strongly enhanced in the ensemble with the stratosphere nudged to the observed evolution.

FIGURE 6 Time series of the 100-hPa NAM index (in standard deviation) for ERA-Interim (thick solid black line), the mean of the 18 "SSW" members in the February 1 forecasts (dashed black line), the mean of the "SSW" members in February 1 forecasts with the coldest anomaly over Eurasia between February 25 and March 6 (below the 25th percentile, five members, dotted blue line), the mean of the "SSW" members in the February 1 forecasts with the warmest anomaly over Eurasia between February 25 and March 6 (above the 75th percentile, four members, dot-dashed red line), the ensemble mean for the forecasts with the stratosphere above 70 hPa nudged to ERA-Interim (dot-dot-dashed green line) and the ensemble mean for the forecast initialized on February 15 (short-dashed orange line). The two vertical lines indicate the central date of the SSW and March 1, respectively
To link the above results from the nudged ensemble with the "correct" stratospheric evolution to the forecasts started at a later time, we further compare the evolution of the 100-hPa NAM index in those members that better predicted the cold Eurasian T2m anomaly (i.e., the cold sub-cluster) with those that did not (i.e., the warm sub-cluster). As for the analyses in Figures 5 and 6, the cold sub-cluster is defined as being below the 25th percentile, while the warm sub-cluster is above the 75th percentile of the Eurasian T2m anomaly averaged between February 25 and March 6. For the February 5, 8 and 12 initializations, the 100-hPa large positive NAM persistence (quantified by the number of days the 100-hPa NAM index stays above 1-sigma between February 12 and March 15) is longer in the cold sub-cluster and closer to the observed value (28 days) than in the warm sub-cluster. The average NAM persistence values for the cold and warm sub-clusters are 13.4 and 10.1 days (February 5 initialization); 20.3 and 17.8 days (February 8 initialization); and 22.8 and 18.1 days (February 12 initialization), respectively.
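The persistence measure described above is a count of days on which a standardized index exceeds a threshold within a fixed window. A minimal Python sketch of such a count follows. The function name `days_beyond_threshold` and the synthetic index series are hypothetical. The sketch also assumes that all qualifying days in the window are counted, whether or not they are consecutive, which the text does not specify.

```python
from datetime import date, timedelta


def days_beyond_threshold(index_by_day, start, end, threshold=1.0):
    """Count days in [start, end] on which the index exceeds `threshold`.

    Assumption: all qualifying days are counted, consecutive or not.
    Days missing from `index_by_day` are skipped.
    """
    n_days = (end - start).days + 1
    count = 0
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        value = index_by_day.get(day)
        if value is not None and value > threshold:
            count += 1
    return count


# Synthetic standardized index: above 1 sigma from February 14 to February 23 only.
start, end = date(2018, 2, 12), date(2018, 3, 15)
synthetic_index = {}
for offset in range((end - start).days + 1):
    day = start + timedelta(days=offset)
    in_episode = date(2018, 2, 14) <= day <= date(2018, 2, 23)
    synthetic_index[day] = 1.5 if in_episode else 0.3

print(days_beyond_threshold(synthetic_index, start, end))  # 10 for this synthetic series
```

Applying the same count to each ensemble member and averaging within the cold and warm sub-clusters would give sub-cluster persistence values of the kind quoted above.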
The fact that the ensemble with the stratosphere nudged to the observed evolution predicts the cold spell with a higher probability than the subsequent initialized forecasts until February 15 (see Figures 3 and 4) suggests that tropospheric initial conditions become important only from February 19 onwards and that the slightly enhanced skill of the February 15 forecasts is likely due to the "correct" stratospheric evolution (compare the thick solid black line with the short-dashed orange line in Figure 6). Further support for the limited relevance of initial conditions on February 15 comes from the fact that "1:N-O" members that better predict the cold spell do not have smaller tropospheric error on February 15 (as quantified by the Northern Hemisphere 500-hPa geopotential height ACC and root mean square error).
It should be emphasized that despite having the "correct" stratosphere, there is still significant spread in the nudged ensemble, with some members predicting warm anomalies and NAO+. Moreover, the ensemble mean cold spell/NAO− strength is smaller than observed by approximately a factor of two (compare the blue dots to the black asterisk for the "1:N-O" cluster in Figure 3). This reconfirms that internal tropospheric dynamics also played an important role in the development of the cold spell. Since forecasts initialized on February 15 show similar predictability to those with the "correct" stratosphere and still experience a large spread compared to forecasts initialized on February 19 and 22, we now examine the role of internal tropospheric dynamics, in particular the Scandinavian blocking, in the cold spell evolution in February 15 forecasts.

VARIABILITY OF THE TROPOSPHERIC CIRCULATION
We now discuss possible reasons for the cold spell predictability spread in the forecast initialized on February 15 with a focus on NAO− and on the persistent high-pressure anomaly over Scandinavia, that is, the Scandinavian blocking.
From mid-February to mid-March, atmospheric blocking could be identified over the Euro-Atlantic sector (Figure 7) based on the index of Tibaldi and Molteni (1990) (see Table 1). At the end of February, blocking affects a region from 80°W to 40°E. At the beginning of March, there is a retrograde (westward) movement of the block. As NAO− and blocking are not completely separate dynamical features (Shabbar et al., 2001; Croci-Maspoli et al., 2007), this signal could be related to the high-latitude, high-pressure system over the northern Atlantic, corresponding to the northern part of the NAO− pattern (with a simultaneous low-pressure system over the southern North Atlantic). In the ensemble forecast initialized on February 15, the ensemble spread in the Z500 field increases more strongly in the area dominated by the blocking system (compare the shading to the yellow contours in Figure 7) than in the surrounding areas, indicating that the Z500 forecast variability is related to the development of the block.
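For readers unfamiliar with the blocking index, the sketch below shows the commonly cited form of the Tibaldi and Molteni (1990) criterion at a single longitude. It is an illustration under assumptions and not the configuration used in this study: the reference latitudes (40°N, 60°N, 80°N), the offsets of −4°, 0° and 4°, and the −10 m per degree threshold are the values usually quoted for the original index, and the settings listed in Table 1 may differ. The function names and the two synthetic Z500 profiles are hypothetical.

```python
def tm90_gradients(z500_by_lat, delta=0.0):
    """Return (GHGS, GHGN) in m per degree latitude at one longitude.

    `z500_by_lat` maps latitude (degrees north) to 500-hPa geopotential height (m).
    """
    lat_n, lat_0, lat_s = 80.0 + delta, 60.0 + delta, 40.0 + delta
    ghgs = (z500_by_lat[lat_0] - z500_by_lat[lat_s]) / (lat_0 - lat_s)
    ghgn = (z500_by_lat[lat_n] - z500_by_lat[lat_0]) / (lat_n - lat_0)
    return ghgs, ghgn


def is_blocked(z500_by_lat, deltas=(-4.0, 0.0, 4.0)):
    """Blocked if GHGS > 0 and GHGN < -10 m/deg for at least one latitude offset."""
    for delta in deltas:
        ghgs, ghgn = tm90_gradients(z500_by_lat, delta)
        if ghgs > 0.0 and ghgn < -10.0:
            return True
    return False


# Synthetic profiles (m): a reversed meridional gradient versus a zonal flow.
blocked_profile = {36.0: 5480.0, 40.0: 5500.0, 44.0: 5520.0,
                   56.0: 5590.0, 60.0: 5600.0, 64.0: 5590.0,
                   76.0: 5350.0, 80.0: 5300.0, 84.0: 5250.0}
zonal_profile = {36.0: 5750.0, 40.0: 5700.0, 44.0: 5650.0,
                 56.0: 5450.0, 60.0: 5400.0, 64.0: 5350.0,
                 76.0: 5200.0, 80.0: 5150.0, 84.0: 5100.0}

print(is_blocked(blocked_profile), is_blocked(zonal_profile))
```

In the blocked profile, heights increase from 40°N to 60°N (GHGS > 0) and fall sharply poleward of 60°N (GHGN < −10 m per degree), which is the reversed gradient the index is designed to detect.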
Due to the observed ensemble spread, we first investigate the impact of the Scandinavian block (see the area indicated by the vertical red lines in Figure 7) on the evolution of the cold spell in the ensemble forecast initialized on February 15. In particular, we are interested in answering the question of whether the ensemble members with strong blocking between February 25 and March 1 are already indicating a cold anomaly over Eurasia at this time. To do this, we cluster the ensemble members into groups with the strongest (above the 75th percentile) and weakest (below the 25th percentile) Scandinavian blocking values averaged between 15°E and 40°E. We do this for the time period from February 25 to March 1, as the block is identified over this time period in the reanalysis (see Figures 1a, 2 and 7). The differences in mean sea level pressure (MSLP) and T2m between these two clusters averaged over the period from February 25 to March 1 are shown in Figure 8a. As a result of the clustering on the strength of the Scandinavian block, a Scandinavian blocking anticyclone is identified in the MSLP field (as expected). Over the North Atlantic an NAO+ pattern (with a negative MSLP anomaly over the polar region and a positive anomaly southward) is visible. A cold anomaly can be observed in parts of Eurasia, showing that members with strong Scandinavian blocking have lower temperatures over Eurasia. However, this cold anomaly is quite weak and not significant. This indicates that there is no significant influence of the Scandinavian block on the surface temperatures over Eurasia from February 25 to March 1.
To investigate the role of the NAO− phase, we also cluster members based on the value of the NAO index (Figure 8b) between February 25 and March 6. This time period includes the minimum in the NAO index found in the reanalysis (see Figure 1a). The first cluster consists of members with strong NAO− and the second cluster consists of members with weak NAO− or NAO+. The MSLP field resulting from this clustering is dominated by the NAO− pattern. In the strong NAO− cluster, a strong and significant cold anomaly is found, with the lowest values over central Europe. This means, as expected, that members with strong NAO− predict a stronger cold spell over western and central Europe.
The two clustering approaches above do not answer the question of how important the regime change from the Scandinavian blocking pattern (dominating the end of February) to NAO− (dominating the beginning of March) was for the evolution of the Eurasian cold spell. To address this, we cluster the ensemble members using a different method. First, we select the members with an NAO− index value (averaged between March 2 and 6) below −0.5. Second, we cluster these members according to their Scandinavian blocking values (averaged between February 25 and March 1). The differences in MSLP and T2m averaged over the time period from February 25 to March 6 between the cluster with strong blocking (above the 75th percentile) and the cluster with weak or no blocking (below the 25th percentile) are illustrated in Figure 8c. In the MSLP field, the blocking anticyclone is visible as a result of the clustering approach. Over Eurasia, a significant cold anomaly can be seen, showing that the ensemble members with Scandinavian blocking at the end of February have lower temperatures over Eurasia than members without the blocking. This suggests that the Scandinavian block acted as a precursor to the cold spell as it favoured the flow of continental cold air from the northeast.
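The two-step selection described above (first condition on the NAO index, then split the remaining members by blocking strength) can be sketched in Python as follows. This is a simplified illustration and not the analysis code of this study: it uses one scalar per member instead of gridded MSLP and T2m fields, it omits the significance test, and the function names, the synthetic values and the choice of the "inclusive" quantile method are assumptions.

```python
from statistics import mean, quantiles


def quartile_clusters(values):
    """Return (low, high): positions below the 25th / above the 75th percentile."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    low = [i for i, v in enumerate(values) if v < q25]
    high = [i for i, v in enumerate(values) if v > q75]
    return low, high


def conditional_cluster_difference(nao, blocking, t2m, nao_max=-0.5):
    """Mean T2m of strong-blocking minus weak-blocking members, among NAO- members."""
    selected = [i for i, value in enumerate(nao) if value < nao_max]
    low, high = quartile_clusters([blocking[i] for i in selected])
    strong = [selected[k] for k in high]
    weak = [selected[k] for k in low]
    return mean(t2m[i] for i in strong) - mean(t2m[i] for i in weak)


# Synthetic 10-member ensemble (illustration only); the last two members are
# NAO+ and are therefore excluded before the blocking quartiles are computed.
nao = [-1.2, -0.9, -0.8, -0.7, -0.6, -1.0, -1.5, -0.55, 0.3, 0.8]
blocking = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
t2m = [-1.0, -2.0, -3.0, -3.5, -4.0, -4.5, -6.0, -7.0, 2.0, 3.0]

difference = conditional_cluster_difference(nao, blocking, t2m)
print(f"strong minus weak blocking T2m difference: {difference:.1f} K")
```

Computing the quartiles only after the NAO condition has been applied is the design choice that matters here: members with strong blocking but no subsequent NAO− do not enter either cluster.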
We further analyse the dependence of the tropospheric jet displacement on the cold spell amplitude. To do this, we cluster members with the coldest (below the 25th percentile) and the warmest (above the 75th percentile) anomalies over Eurasia between February 25 and March 6 and take the difference between the cluster means. The difference in the zonal wind at 300 hPa is shown in Figure 9. Members with the coldest anomalies over Eurasia are associated with an equatorward jet shift over the Euro-Atlantic sector (associated with NAO−), while members with the warmest anomalies are associated with a poleward jet shift (associated with NAO+). As the position of the jet stream is related to RWB, we additionally compare the 2-PVU isolines (where 1 potential vorticity unit (PVU) = 1.0 × 10⁻⁶ m² s⁻¹ K kg⁻¹) at 320 K on March 6 for two representative ensemble members. On March 6, the ensemble member that predicted the coldest anomaly over Eurasia shows cyclonic RWB events at 60°W and 20°E (blue contours in Figure 9). In contrast, the member with the warmest anomaly shows anticyclonic RWB over the North Atlantic (red contours in Figure 9). Moreover, the wave pattern over the Euro-Atlantic sector is more amplified in the member with the warmest anomaly over Eurasia. The relation of the cold spell to the NAO−, associated with the southward jet shift and the RWB event, as well as the blocking system as a potential precursor, demonstrates that the prediction of the actual strength of the cold spell depends on the synoptic development in the ∼2 weeks preceding the cold spell event. Thus, while the stratosphere was found to favour the development of the NAO− phase, the actual synoptic development determines the severity of the event, and is thus predictable only on synoptic time-scales.

FIGURE 8
(a) MSLP and T2m differences between the mean of members with the strongest Scandinavian blocking (SB) between 15°E and 40°E (above the 75th percentile, 12 members) and the mean of members with the lowest SB (below the 25th percentile, 13 members) averaged over a time period from February 25 to March 1 from the forecast initialized on February 15. (b) The same as (a), but for the difference between the mean of members with strong NAO− (12 members) and the mean of members with NAO+ or weak NAO− (12 members) averaged over a time period from February 25 to March 6. (c) The same as (b) but for differences between the mean of members with the strongest SB between 15°E and 40°E (above the 75th percentile, 10 members) and the mean of members with the lowest SB (below the 25th percentile, 11 members) based on a time period from February 25 to March 1, conditioned on members with NAO− below −0.5 between March 2 and 6. Dotted areas indicate significant differences in the T2m field between the two clusters.

DISCUSSION AND CONCLUSIONS
In this study, we investigated the predictability of the Eurasian cold spell in late February/early March 2018 and the role of the extremely negative NAO. Our results confirm that, ultimately, extreme midlatitude surface events such as this cold spell are largely a result of internal tropospheric synoptic-scale dynamics. However, the probability of such extreme events occurring can be enhanced by forcers remote from the midlatitude troposphere, such as SSWs. In the case considered here, we found clear evidence for such enhanced probabilistic forecast skill at lead times of up to 25 days in ensemble members of extended-range forecasts from the S2S database (see Figure 10).

FIGURE 9
Zonal wind (in m⋅s⁻¹, shading) differences at 300 hPa between the mean of members with the lowest T2m anomaly over Eurasia (below the 25th percentile, 13 members) and the mean of members with the highest T2m anomaly (above the 75th percentile, 12 members) averaged over a time period from February 25 to March 6 based on members from the forecast initialized on February 15. Dotted areas indicate significant differences in the zonal wind field between the two clusters. Red contours show the 2-PVU isoline at 320 K for a member with the highest T2m anomaly on March 6, while the blue contours show the member with the lowest T2m anomaly.

FIGURE 10
Schematic showing the predictability of a surface extreme event for different forecast ranges (from long-range forecasts with lead times of 1 month to synoptic-range forecasts with lead times of only a few days) under the influence of remote forcing (e.g., SSW) occurring during the extended range. Light grey shading indicates the ensemble spread (5th to 95th percentile), while dark grey shading indicates the ensemble mean.

To summarize our results, we consider the two questions posed in the introduction:
1. Did the cold spell in Northern Eurasia develop independently from the NAO− phase?
Persistent weather patterns were essential for the development of the late-winter Eurasian cold spell. At the end of February 2018, the tropospheric circulation over Eurasia was dominated by strong Scandinavian blocking and associated continental polar air outbreaks. A regime shift from the Scandinavian block to NAO− occurred at the beginning of March 2018 and favoured the westward advection of cold air leading to an intensification of the cold spell over western and central Europe. The ensemble forecasts revealed that the strongest cold anomalies were predicted for members with strong NAO− phases, so the cold spell did not develop independently of the NAO− phase.
2. What role did the stratosphere play in triggering the NAO event?
The occurrence of the NAO− phase appears to be favoured by the stratospheric evolution associated with the SSW, as expected based on the canonical tropospheric response to SSWs. However, the NAO− phase in early March 2018 exceeded the expected strength based on the SSW and may have occurred even without the SSW. Nevertheless, based on the nudged experiment, we were able to quantify that the probability of an extreme NAO− phase was strongly enhanced (to 25%) by the stratospheric evolution, compared to 5% in the climatology. Our results suggest that it is the subsequent evolution throughout the lower stratosphere following the SSW, rather than the occurrence of the SSW itself, that is crucial in coupling to large-scale flow patterns in the troposphere. In particular, enhanced probability of occurrence appears to arise directly following the date of the SSW event, which effectively improves the deterministic forecast skill of the NAO/cold spell event.
Based on the analysis of the ensemble forecast data alone, it is generally impossible to draw precise conclusions about the role of remote forcers (such as SSWs) in triggering midlatitude tropospheric extreme events, such as the 2018 cold spell analysed here. In order to allow for a more unambiguous conclusion on the role of the SSW in the predictability of the 2018 cold spell, nudged experiments, in which the stratosphere is relaxed to the observed evolution, were necessary. In particular, a considerable enhancement of probabilistic skill is observed when the "correct" stratosphere is prescribed (see Figures 3 and 4). This reaffirms the attribution of the cold spell to the SSW, as suggested by Ayarzagüena et al. (2018) and Karpechko et al. (2018). The important role of stratospheric variability for probabilistic predictability in the extended range, as demonstrated in the current example, underlines the need for continued efforts to improve the representation of the stratospheric dynamics in S2S models.
However, there is considerable ensemble spread in the prediction of the cold spell and/or the NAO phase even in the nudged ensemble with the observed stratosphere. This highlights the capability of remote forcers such as the stratosphere to shift the probability of certain events occurring, but also shows the importance of tropospheric variability in determining the full strength of the event. Indeed, the observed cold spell was close to the 10th percentile of the nudged ensemble, which is to say that a moderate cold anomaly would have been just as likely (see Figure 3). We estimated that the likelihood of the occurrence of an extreme cold spell was enhanced from 5 to 45% when nudging to the observed stratosphere (see Figure 4).
We conclude that the evolution of the Eurasian cold spell was related to the shift of the NAO to its negative phase. NAO− was associated with cyclonic RWB over the North Atlantic as well as an equatorward displacement of the jet stream. In addition, the blocking system over Scandinavia seemed to act as a precursor for the onset of the cold spell as it favoured the advection of cold air from the northeast over a continental path towards western Europe at the end of February. This means that members that failed to predict the blocking and the NAO− pattern also failed to predict the amplitude of the cold spell correctly.
It is known from the literature that the frequency of winter blocking over the North Atlantic is significantly higher during NAO− events due to changes in thermal forcing. In particular, the "warm ocean/cold land" pattern associated with NAO− is favourable for winter blocking over the North Atlantic (Shabbar et al., 2001). In contrast, significantly higher blocking frequencies over northern Europe are observed during NAO+ phases, and are associated with a tendency for anticyclonic RWB over the North Atlantic (Croci-Maspoli et al., 2007). However, interactions between the NAO phase and the blocking are possible in both directions. This means that the blocking, depending on the location, can also sustain the current NAO phase (Croci-Maspoli et al., 2007). Moreover, Kunz et al. (2009) showed that these tropospheric dynamical features can also be influenced by the stratosphere. In particular, an enhanced number of cyclonic RWB events are observed in the troposphere during weak stratospheric vortex episodes. Thus, the relation between the NAO phase and the blocking (at different locations) is complex and can additionally be influenced by the stratospheric state. Our analysis elucidates that the evolution of the 2018 Eurasian cold spell might have depended on several internal processes (e.g., a shift from NAO+ to NAO−) and feedbacks (e.g., between RWB and NAO) within the troposphere.
However, it cannot be ruled out that these differences may have resulted (in part) from differences in the impact of the SSW. The dependence of the predictability of the tropospheric flow pattern on the stratospheric state is also indicated by results based on the ensemble forecast initialized on February 1, 2018 (see section 5 and Figure 5).
Apart from the stratosphere, another potential remote forcer influencing the predictability of the NAO and/or cold spell is the MJO (see Vitart and Robertson, 2018 for an example of the MJO influence on the 2013 European cold spell). A strong MJO over the Western Pacific was observed throughout the first three weeks of February 2018 (Barrett, 2019). An MJO over the Western Pacific (phases 6-8) can act as a precursor for NAO− on medium-range time-scales (Lin et al., 2009). However, in the 2018 case, clustering of ensemble members by MJO phase was not possible, as all ensemble members exhibited monthly mean Pacific 10-m zonal wind patterns typical of MJO phase 7 (not shown, but see fig. 2 in Marshall et al., 2015). Given this, it is unlikely that uncertainties in the MJO predictability led to uncertainties in the cold spell/NAO− predictability.
Implicit in our discussion is the assumption that S2S models faithfully represent stratosphere-troposphere coupling. Thus, our conclusions on the impact of SSW on tropospheric flow might suffer from an imperfect representation of stratosphere-troposphere coupling in models (Stockdale et al., 2015). As shown recently, the representation of gravity wave parameterization schemes can have a major impact on the stratospheric dynamics and stratosphere-troposphere coupling (Polichtchouk et al., 2018a; 2018b). Resolution, particularly vertical resolution, can also affect the ability of stratospheric anomalies to influence the troposphere, especially in the tropopause communication layer (Birner and Albers, 2017).
Although ensemble forecasts such as those collected as part of the S2S dataset offer important new insights compared to observational analyses, particularly for single events, their ability to infer causal relationships is nevertheless limited. While our results confirm the potential for enhanced predictability of surface extreme events following SSWs, we caution that such probabilistic gains in predictability are insufficient to draw precise conclusions about a mechanistic link between SSW and cold spell events. In fact, our analyses show that it cannot be ruled out that the cold spell event could have occurred without the preceding SSW. Attribution of tropospheric extreme events, such as cold spells, to dynamical events in the stratosphere requires dedicated experimentation, using, for example, targeted perturbation and sensitivity experiments. Such numerical experiments are currently planned as part of the German Research Foundation's Collaborative Research Center "Waves to Weather."