Representation of the Scandinavia–Greenland pattern and its relationship with the polar vortex in S2S forecast models

The strength of the stratospheric polar vortex is a key contributor to subseasonal prediction during boreal winter. Anomalously weak polar vortex events can be induced by enhanced vertically propagating Rossby waves from the troposphere, driven by blocking and wave breaking. Here, we analyse a tropospheric pattern—the Scandinavia–Greenland (S–G) pattern—associated with both processes. The S–G pattern is defined as the second empirical orthogonal function (EOF) of mean sea‐level pressure in the northeast Atlantic. The first EOF is a zonal pattern resembling the North Atlantic Oscillation. We show that the S–G pattern is associated with a transient amplification of planetary wavenumber 2 and meridional eddy heat flux, followed by the onset of a weakened polar vortex, which persists for the next two months. We then analyse 10 different models from the S2S database, finding that, while all models represent the structure of the S–G pattern well, some models have a zonal bias with more than the observed variability in their first EOF, and accordingly less in their second EOF. This bias is largest in the models with the lowest resolution. Skill in predicting the S–G pattern is not high beyond week 2 in any model, in contrast to the zonal pattern. We find that the relationship between the S–G pattern and enhanced eddy heat flux and a weakened polar vortex is initially well represented, but decays significantly with lead time in most S2S models. Our results motivate improved representation of the S–G pattern and its stratospheric response at longer lead times for improved subseasonal prediction of the stratospheric polar vortex.


INTRODUCTION
Variability in the strength of the wintertime Arctic stratospheric polar vortex (SPV) significantly impacts both the behaviour and the predictability of Northern Hemisphere (NH) tropospheric weather patterns on subseasonal to seasonal (S2S) timescales (∼2 weeks to 2 months: Kidston et al., 2015;Tripathi et al., 2015;Domeisen et al., 2020b). Of particular importance is the development of a weakened SPV, which includes major sudden stratospheric warming (SSW) events-when the mean westerly circulation of the SPV entirely reverses to easterlies during midwinter (e.g., Charlton and Polvani, 2007;Butler et al., 2015). In the subsequent weeks to months following the onset of a weakened SPV, there is an increased likelihood of NH cold air outbreaks (Kolstad et al., 2010;Kretschmer et al., 2018a;2018b;Kautz et al., 2018b), through the development of a negative tropospheric Northern Annular Mode (NAM) and associated equatorward eddy-driven jet shift (Baldwin and Dunkerton, 2001) or by favouring particular regional weather regimes Lee et al., 2019b). Weak or disrupted SPV states are driven by the vertical propagation and subsequent breaking of large-scale (planetary wavenumbers 1-3) Rossby waves from the troposphere to the stratosphere (Charney and Drazin, 1961;Matsuno, 1971;McIntyre and Palmer, 1983), although the tropospheric wave activity need not be anomalously large, with preconditioning of the SPV playing an important role (Birner and Albers, 2017;De La Cámara et al., 2019;Lawrence and Manney, 2020). Despite substantially longer predictability and persistence timescales in the stratosphere than in the troposphere (e.g., Baldwin et al., 2003;Son et al., 2020), predicting the onset of significant SPV circulation anomalies on S2S timescales remains challenging and is likely to be a contributing factor to poor skill in wintertime NH subseasonal forecasts. Several recent studies have analysed the performance of prediction systems contributing to the World Climate Research Program and World Weather Research Program S2S Prediction Project database (Vitart et al., 2017), comprising operational forecasts and hindcasts from various modelling centres around the world. Domeisen et al. (2020a) determined that S2S model skill in predicting both weak and strong SPV states was generally limited to the medium-range (1-2 weeks). They found particularly poor predictability for major SSWs classified as "split" vortex events, usually associated with amplification of a wavenumber-2 disturbance. Models with the greatest vertical resolution in the stratosphere and highest lid ("high-top" models) performed best. However, models with the highest vertical resolution also generally have higher horizontal resolution, making disentangling the respective influences difficult. These results are consistent with those of Taguchi (2018) and Karpechko (2018), who assessed the predictability of major SSWs in S2S model and European Centre for Medium-Range Weather Forecasts (ECMWF) hindcasts, respectively. Additionally, Butler et al. (2019), who also used S2S hindcasts, found that dynamically driven final stratospheric warmings (akin to major SSWs) were typically poorly predicted beyond week 2.
A component limiting the skill in predicting these stratospheric events may arise from the predictability of tropospheric wave activity or its interaction with the stratospheric mean state in the model. On subseasonal timescales, sources of enhanced tropospheric wave activity linked to SPV variability are typically associated with processes that interfere constructively with the mean planetary wave pattern. These include the Madden-Julian Oscillation (MJO: Garfinkel et al., 2012;Barnes et al., 2019;Green and Furtado, 2019), tropospheric blocking (Quiroz, 1986;Martius et al., 2009;Attard and Lang, 2019;Peings, 2019), anticyclonic Rossby-wave breaking (Lee et al., 2019a), and extratropical cyclones (Coy et al., 2009;Coy and Pawson, 2014;Attard and Lang, 2019). The latter three are dynamically interconnected; Rossby-wave breaking is a key driver of blocking (Masato et al., 2012), while warm-sector processes associated with extratropical cyclones play a significant role in block onset (e.g., Maddison et al., 2019). Moreover, these are also tropospheric processes known to have particularly poor predictability. Quinting and Vitart (2019) assessed S2S model representation of Rossby-wave packets, finding that all models generally overestimated the propagation distance of wave packets in the eastern North Atlantic, with attendant negative blocking biases. The models with the coarsest resolution (both horizontal and vertical) exhibited the largest biases. Block-onset biases in the northeast Atlantic are in agreement with the results of Ferranti et al. (2015). They found that the ECMWF ensemble forecasts exhibited the worst skill when transitioning from a positive North Atlantic Oscillation (NAO+) to a Scandinavian blocking regime (Cassou, 2008), overestimating the persistence of the prior zonal state.
Various studies have explored the impact of model resolution (including vertical resolution) on the development and maintenance of blocking (e.g., Matsueda, 2009;Anstey et al., 2013;Berckmans et al., 2013;Davini et al., 2017;Schiemann et al., 2017). There is good agreement of significant improvements, especially in the Euro-Atlantic region, at higher resolution, owing to better representation of several of the processes involved directly and indirectly in blocking. These processes include, but are not limited to, orography and its impact on the generation of planetary waves (including the climatological tilt of the North Atlantic jet), the strength of the North Atlantic jet and associated wave breaking in its exit region, the representation of the trimodal variability of the North Atlantic jet, warm-sector ascent and tropopause outflow, and the overall model mean state. Given the link between Scandinavian/Ural blocking and SSWs (Martius et al., 2009;Bao et al., 2017;Peings, 2019), model errors in this region may impact subseasonal skill through SPV variability.
In a case study of the February 2018 SSW, Lee et al. (2019a) (hereafter, L19) related the relatively short predictability onset (∼12 days) to a poorly predicted anticyclonic wave break in the North Atlantic (diagnosed as the "Scandinavia-Greenland (S-G) dipole"), which drove enhanced tropospheric wave activity and led to the onset of Ural blocking . While L19 found that similar S-G dipoles were evident prior to previously observed SSWs, they did not assess its forecast predictability beyond the 2018 event. Motivated by this, in the present study we define a similar, but more generalised, pattern, and assess its representation and predictability in 10 extended-range models from the S2S database. The ultimate aim is to determine whether lead-time or model-dependent biases in the S-G pattern and its influence on the SPV exist, which may then contribute toward limiting subseasonal stratospheric skill.
The remainder of the article is thus laid out as follows. In Section 2, we introduce the datasets and methods used. Section 3 defines the S-G pattern and its relationship with the SPV in reanalysis. We then analyse the representation and predictability of the S-G pattern in S2S model hindcasts in Section 4. Section 5 analyses the modelled relationship of the S-G pattern with the SPV in S2S hindcasts. Conclusions of our work follow.

DATA AND METHODS
We use hindcasts (i.e., reforecasts for dates in the past) launched between November and March (NDJFM) from 10 models from the S2S database, the details of which are shown in  (Hersbach et al., 2020). All data are sampled once daily at 0000 UTC and are regridded to 2.5 • horizontal resolution for intercomparison purposes (hindcasts are stored in the S2S database at 1.5 • resolution, except for BoM, which is stored at 2.5 • ). 40-80 • N averaged zonal-mean meridional eddy heat flux (denoted [v * T * ], where the star denotes a departure from the zonal mean, and the square brackets dictate a zonally averaged quantity) is used as a proxy for vertically propagating wave activity flux, since it is directly proportional to the vertical component of the Eliassen-Palm flux vector (e.g., Edmon et al., 1980). 60-90 • N area-averaged (polar cap) geopotential height anomalies are used as a diagnostic for the strength of the SPV: positive anomalies indicate an anomalously weak SPV (i.e., a negative NAM). Both [v * T * ] and polar cap heights are weighted by cosine latitude. Anomalies are computed with respect to the daily 0000 UTC climatology; this is initialisation-date, lead-time dependent in the case of the S2S hindcasts. Standardised anomalies are computed in the hindcasts by dividing by the initialisation-date, lead-time dependent standard deviation of all ensemble members. In the BoM model, the 33 ensemble members comprise three different configurations of the model with 11 members each (Hudson et al., 2013); anomalies are thus computed with respect to the climatology for each version before forming the grand ensemble.
Empirical orthogonal function (EOF) analysis is performed on mean sea-level pressure (MSLP) anomalies in a grid box in the North Atlantic covering parts of Scandinavia and Greenland, bounded by 60-85 • N, 60 • W-50 • E. These bounds are chosen to cover approximately a similar region to the S-G dipole of L19, as well as with consideration of the climatological exit region of the eddy-driven jet and the stationary wave pattern in the North Atlantic around 60 • N. All data are weighted by the square root of the cosine of latitude to provide equal-area weighting in the computation of the EOFs, and in all cases, the EOFs are scaled by the square root of the eigenvalue to give unit standard deviation of the principal component (PC) time series. The resultant patterns are largely insensitive to the changes in the domain boundaries within ∼10 • . In the hindcasts, EOFs are either analysed as (a) the "model" EOFs, or (b) the projection on to the ERA5 "observed" EOF over the same period. In the case of (a), MSLP anomalies are computed with respect to the model climate for each initialisation date over the November 1999-March 2010 period, and then EOF analysis is performed across all ensemble members at each forecast day for initialisations through the NDJFM period. There is negligible change to the variance or pattern correlation statistics of TA B L E 1 Details of the S2S model hindcasts used in this study. These model versions were chosen as they were all used to produce real-time forecasts during November-March 2018-2019 the ERA5 EOFs over the equivalent time period as a 4 week model forecast. However, beyond 4 weeks, there are some changes due to the seasonal cycle, but the results are qualitatively similar to week 3-4. For brevity, we do not show analysis of forecasts beyond week 4. Statistical significance of the regression analyses is assessed by bootstrap resampling, performed 5,000 times on random paired samples (with replacement) selected from the data for each point in space/time. If zero lies outside the 2.5th-97.5th percentiles of these resampled slopes, then the regression is deemed statistically significant at the 95% confidence level.

EOF structure and characteristics
The first two EOFs, shown in Figure 1 as a regression with MSLP anomalies, together explain the majority (62%) of the variance in the analysis domain and are well separated according to the criterion of North et al. (1982) ( Figure  S1 in the Supporting Information). There is little difference between the EOFs for the full ERA5 dataset and the much smaller S2S common period ( Figure S2); for the sake of a larger sample size, we primarily analyse the full ERA5 dataset. The leading EOF ( Figure 1a) explains 36% of the variance; we define it such that a positive loading is characterised by a cyclonic MSLP anomaly centred over Iceland extending across most of the analysis domain.
There is also an associated remote anticyclonic MSLP anomaly across the central North Atlantic resembling the Azores high. Overall, the structure of this first EOF is similar to the NAO, despite the much smaller region over which it was computed (for example, the NAO of Hurrell (1995) is computed over 20-80 The pattern is persistent; the PC time series has an autocorrelation e-folding timescale of 7 days. Hereafter, we refer to this first EOF as the "zonal pattern". The second EOF (Figure 1b), explaining 26% of the variance, consists of a zonal dipole structure, defined here such that a positive loading has an anticyclonic anomaly over Scandinavia and a cyclonic anomaly extending over most of Greenland. This pattern shares strong similarities with the S-G dipole of L19, the Greenland-Scandinavia cluster of Cassou et al. (2004), and to a lesser extent the Scandinavian blocking regime (Cassou, 2008) and the Ural blocking anomaly of Peings (2019). It also closely resembles the MSLP anomalies associated with anticyclonic wave breaking and block onset near 20 • E in Masato et al. (2012) (their Figure 5f). Hereafter, we refer to the principal component time series of the second EOF as the "S-G index" and the positive loading of this second EOF as the "S-G pattern". The S-G index is more transient than the zonal pattern with an autocorrelation e-folding timescale of 4 days, consistent with the relatively short timescale of anticyclonic wave breaking identified in L19.

Stratospheric relationship
To establish the observed relationship between the S-G pattern and the SPV, we perform lagged linear regression between the S-G index and zonal-mean eddy heat flux and polar cap geopotential height anomalies ( Figure 2) and the amplitude anomalies of wavenumbers 1 and 2 at 60 • N ( Figure 3). Composite-based analyses (not shown) yield very similar results, confirming the suitability of the linear regression approach. At negative lags of 10-20 days, the S-G pattern is associated with a strong SPV precursor, consistent with the Greenland trough (i.e., the absence of Greenland blocking) present in the S-G pattern, which is more likely during a strengthened SPV (e.g., Charlton-Perez et al., 2018). The significantly enhanced heat flux in the stratosphere during this time may be explained by sharpened potential vorticity gradients on the edge of the strengthened SPV acting as a waveguide (e.g., Scott et al., 2004). In agreement with the threshold event-based composite results of L19, on short lags, the S-G pattern is associated with a transient period of anomalously enhanced heat flux (Figure 2a), forming a coherent pulse from the troposphere to the upper stratosphere on a timescale of ∼5 days. Wavenumber 2 is significantly amplified throughout the column during this time (Figure 3b), in agreement with the coherence between the S-G pattern and the climatological-mean eddy height field (cf. Figure 4c). There is a concomitant abrupt development of a weak vortex anomaly, which persists and descends through the stratosphere over the following ∼2 months. These results further motivate analysis of the S-G pattern as a significant contributor to SPV variability across the S2S timescale, beyond just the major SSWs considered in L19.
The relationship with eddy heat flux (Figure 2a) remains significant and positive for lags of several weeks, with evidence of a secondary peak in the upper troposphere-lower stratosphere at lags of ∼25-30 days during a period of wavenumber-1 amplification (wavenumber 2 is anomalously suppressed during this time). The magnitude of the weak vortex anomaly intensifies in the middle and lower stratosphere following this second peak (accordant with the relationship between cumulative heat flux and SPV strength: Polvani and Waugh, 2004). There is little evidence of downward coupling of the weak vortex anomaly into the troposphere, although it is possible this is due to the intrinsically larger tropospheric variability which may not be captured in a F I G U R E 3 Lagged linear regression between the S-G index and standardised anomalies of the amplitude anomaly of (a) wavenumber 1 and (b) wavenumber 2 at 60 • N for NDJFM 1980-2019 in ERA5. Stippling indicates significance at the 95% confidence level according to a bootstrap resampling procedure lagged linear regression approach. However, downward coupling is apparent when only the S2S common period is used, despite otherwise similar results ( Figure S3). Fully determining a cause of these differences is beyond the scope of the study.
To provide further insight into the dynamics introduced in the cross-section analysis, polar-stereographic maps of lagged regressions with standardised anomalies of (a) 100-hPa meridional eddy heat flux and (b) 50-hPa and (c) 500-hPa geopotential height are shown in Figure 4, alongside the mean 500-hPa eddy height field as a diagnostic of the climatological stationary waves. Eddy heat flux is anomalously amplified at 100 hPa in a sector north and east of Scandinavia (days 0 and 5), and subsequently downstream over eastern Asia (day 5), consistent with the constructive interference between the S-G pattern and the mean eddy height field in the North Atlantic sector. The 50-hPa geopotential height anomalies form a clear wavenumber-2 pattern, reminiscent of an SPV split event, on days 0 and 5 with positive height anomalies intersecting negative height anomalies across the pole. Subsequently, the wave field decays, while positive geopotential height anomalies remain over the central Arctic, indicative of a  Figures 2a and 3a) can be seen to emanate from the northeast Pacific, downstream of an amplified tropospheric Aleutian low.

Representation of EOFs
In order to assess the ability of S2S models to replicate the observed patterns of variability, we repeat the same EOF analysis as performed in ERA5, but in the 10 S2S model hindcasts, where the EOF is computed across all ensemble members at each lead time (to assess lead-time dependent biases). The first two EOFs of all S2S models have very similar structures to the equivalent observed EOFs; weekly mean pattern correlations do not drop below 0.7 for either EOF (see Figure S4). The lower pattern correlations are for EOF1; visual inspection reveals this is likely due to slight shifts in the centre of the Icelandic cyclonic anomaly. For EOF2, the pattern correlations exceed 0.95. Thus, we can be confident that the S2S models replicate the structure of the variability well and that the model EOFs can be compared directly with those from ERA5. For brevity, the week 4 mean EOF patterns for all models are shown in Figure 5. We quantify variability biases in the S2S models by computing the ratio between the explained variance fraction of the model EOFs and the equivalent ERA5 EOF ( Figure 6). Generally, these S2S models have more than the observed variance fraction in EOF1, and correspondingly less in EOF2, in common with generic model biases in wave breaking and blocking. No model has statistically more than the observed variance fraction in EOF2. The biases are especially large in weeks 3 and 4, and are largest for HMCR, CMA, and BoM, with 20-30% more than the observed variance in EOF1 and correspondingly less in EOF2. HMCR exhibits large variance biases even in the first forecast week. These three models have the lowest horizontal resolution (>1 • ), as well as the lowest vertical resolution and lowest lid height. The statistics here indicate no dependence on ensemble size or initialisation frequency. Note that biases are small in NCEP, despite it having the fourth lowest horizontal resolution and being the oldest model version used in our study, suggesting that multiple factors likely contribute to these biases. We also considered the ratio of the total variance in the models to that in ERA5 (see Figure S5). In CMA, the total variance is close to that in ERA5, while in BoM and HMCR the total variability declines with lead time to ∼75% of ERA5. Interpreting these statistics in a physically meaningful sense is more challenging, but suggests that the characteristics of the biases in HMCR and BoM are different from those in CMA. We also note that other models, in particular ECCC, UKMO, and CNRM, have slightly more total variability than ERA5. Despite these variance biases, the persistence of the patterns (as measured by the autocorrelation e-folding timescale of the PC time series) in all models is not significantly different from that in ERA5 for either EOF on all forecast days.

Predictability
Next, we consider the ability of the models to predict the evolution of the observed EOFs accurately. We assess this by first projecting the NDJFM 2000-2010 EOFs computed from ERA5 on to the model MSLP anomalies for each ensemble member, to generate a forecast PC time series. As an initial analysis of deterministic skill, Figure 7 shows ensemble-mean correlation skill, defined as the first day when correlation drops below 0.6, for both the zonal and S-G patterns. For all models and for both patterns, the limit of ensemble-mean correlation skill lies within 15 days. There is an indication of 1-2 days of additional skill for the zonal pattern, in agreement with its greater decorrelation timescale, though this difference is only significant for CMA and NCEP. There is again evidence of resolution dependence (although minimal for models with higher resolution than CMA) and the impact of the variability biases discussed in the preceding subsection.

F I G U R E 7
The first day when the ensemble-mean correlation with ERA5 drops below 0.6, for the zonal pattern (EOF1, circle markers) and S-G pattern (EOF2, square markers). Error bars indicate 95% confidence intervals from 5,000 bootstrap resamples, where these correspond to the first day of skill below 0.6 in the 2.5th and 97.5th percentiles of the resampled correlations. The models are sorted left-to-right by increasing horizontal resolution. A persistence forecast is also shown for reference BoM exhibits particularly poor performance, with only 5-6 days of skill for the S-G pattern and 6-8 days for the zonal pattern. In contrast, ECMWF has 9-10 days of skill in the S-G pattern and 10-13 days in the zonal pattern.
To assess the general performance of the ensemble systems on subseasonal timescales, Figure 8 shows the receiver operating characteristic skill score (ROCSS: e.g., Wilks, 2019) for a 1 threshold (where the sigma threshold is set by the ERA5 EOF) for both the zonal and S-G patterns. Similar results are obtained for various positive thresholds. The ROCSS is the additional area under the ROC curve (i.e., a plot of true positive rate (TPR) versus false positive rate (FPR)) versus a non-skilled forecast (TPR = FPR). Here, a true positive is counted as m ensemble members correctly predicting the S-G/zonal pattern index exceeding the threshold in the target week (and vice versa), where m is varied from 0 to the full ensemble size. The ROCSS for all models at all lead times is higher for the zonal pattern than for the S-G pattern, increasing to 50-100% larger in weeks 3 and 4, consistent with the biases introduced in Figure 6. Skill in the S-G pattern for BoM and CMA is appreciably lower than for other models by week 2, but in weeks 3 and 4 all models have only small skill. BoM, CMA, KMA, and ECCC have negligible skill in the S-G pattern in week 4, despite comparable skill to other models in the zonal pattern. We also performed the same analysis but for the lead-time dependent 1 threshold in each model, obtaining similar results.

S-G STRATOSPHERE RELATIONSHIP IN S2S MODELS
In this section, we assess the ability of S2S models to capture the observed relationship (established in Section 3.2) between the S-G pattern and both enhanced heat flux and a weakened SPV in the following weeks to months. First, the ERA5 EOF is projected on to the MSLP anomalies in each ensemble member, and then lagged linear regression is performed across all ensemble members and initialisation dates. Figure 9 shows the lagged regression between the S-G pattern on the first day of the forecasts and the subsequent 40-80 • N averaged eddy heat flux anomalies, and Figure 10 shows the equivalent for polar cap geopotential height anomalies. All 10 models capture the immediate relationship with enhanced eddy heat flux from the troposphere to the stratosphere within the first week, and all but the low-topped HMCR model capture the subsequently weakened SPV (cf. Figure 2) for the remainder of the forecast period. However, there are subtle intermodel differences and differences in the observed relationship, though we emphasise that the differences in sample sizes make direct intercomparison of magnitudes more challenging.
BoM captures the initial pulse of enhanced heat flux (Figure 9a), but there is no evidence of significantly enhanced heat flux from the troposphere to the stratosphere beyond 15 days-the second pulse (wavenumber 1) at 20-30 days is not present. Accordingly, the weak SPV anomaly (Figure 10a) peaks shortly after 10 days and decays slowly without evidence of amplification and downward propagation. Nevertheless, the signal for a weakened SPV over the next 60 days remains, in agreement with observations (cf. Figure 2a). Overall, the structure of the heat flux evolution is closer to observations for the other models, though there is no significantly raised heat flux beyond 10 days in JMA (Figure 9h) and the pattern is rather diffuse at longer lead times in ECMWF (Figure 9j). Notably, the second period of significantly enhanced heat flux is captured even in the low-top HMCR model (Figure 9c), supportive of a mostly tropospheric-led mechanism.
Aside from BoM and HMCR, the significantly weakened SPV maximises in all models at a similar time to the observations, though with varying magnitude. The initially significantly weakened SPV is missing from ECCC (Figure 10i), while the lower stratospheric anomalies in JMA (Figure 10h) around day 30 are insignificant, in contrast to other models and ERA5. The response in the troposphere is particularly varied between the models; the development of negative polar cap height anomalies seen in ERA5 is significantly present in all except KMA and ECCC, again with varying magnitudes and timings. In ERA5, this occurs most strongly at 30-40 days, while it is 10-15 days earlier in these models-consistent with a zonally biased state or reduced blocking persistence. In UKMO, and to a lesser extent ECMWF and CMA, there is significant downward coupling into the troposphere from the stratosphere after 30 days. While ERA5 indicates some coupling (especially in the S2S common period; see Figure S3), the maximum in these models is both larger and earlier-occurring when ERA5 shows negative polar cap anomalies. We also performed the same analysis as shown in Figures 9 and 10 but at forecast day 15, as a measure of the ability of the model to capture the relationship internally once it has drifted toward its own climatological state (while retaining sufficient subsequent forecast days to assess the lagged response). The results are much weaker (shown in Figures S6 and S7); the heat-flux pulse remains present but is systematically weaker, and there is an accordingly weaker signal in the SPV strength, which is not significant in KMA, ECCC, or JMA (though the latter two are limited by their shorter forecast ranges).
To further this analysis, Figure 11 shows the weekly mean regression coefficients across all ensemble members for eddy heat flux at (a) 300 hPa on the same day (i.e., the tropospheric wave activity associated with the S-G pattern, cf. Figure 2), (b) 100 hPa with a three-day lag, and (c) 50 hPa with a four-day lag (where the lags correspond to maximum correlation with the S-G pattern in ERA5). At 300 hPa, there is no significant difference between the regression coefficients for the models or ERA5 for any forecast week. The (statistically insignificant) increase from week 1 to week 4 may be attributable to the seasonal cycle. The agreement between the models and observations at all lead times indicates that, in the troposphere, the influence of the S-G pattern on zonal-mean wave activity is well represented. However, the representation of the relationship with vertically propagating wave activity in the stratosphere at 100 and 50 hPa is largely lead-time dependent, with much weaker regression coefficients, particularly in weeks 3 and 4 (up to 50% smaller for HMCR, UKMO, and ECMWF)-though CMA is significantly weaker at all lead times and does not decay over time. These results agree with the weaker magnitude of the SPV anomalies in Figure  S7, providing evidence that the communication of wave activity from the troposphere to the stratosphere produced by the S-G pattern is weaker at longer lead times, thereby generating weaker circulation anomalies. This bias may therefore preclude subseasonal stratospheric predictability further, even in the case of a well-forecast troposphere. F I G U R E 8 ROC skill score for ≥ 1 zonal pattern (solid colours) and S-G pattern (hatched) events. The models are sorted left-to-right by increasing horizontal resolution Note that, in weeks 3 and 4, the bias is systematic across the S2S hindcasts analysed here and does not seem dependent on the resolution of the models.

SUMMARY AND CONCLUSIONS
In this article, we have performed an analysis of observed (ERA5) and modelled (S2S hindcasts) wintertime tropospheric variability in a region of the northeast Atlantic associated with the exit region of the eddy-driven jet, and its relationship with vertically propagating wave activity and the strength of the SPV. We find that the first two EOFs of MSLP anomalies describe the majority of the variability within the analysis domain. The leading mode represents a zonal pattern akin to the NAO, while the second mode depicts the S-G pattern-characterised by an anticyclonic anomaly over Scandinavia and an anomalous trough over Greenland (Figure 1). The S-G pattern resembles various patterns previously identified to be associated with anticyclonic wave breaking, blocking, and influences on SPV variability. We find that the S-G pattern is associated with transient amplification of wavenumber 2 and anomalously enhanced eddy heat flux into the stratosphere, with a weakened SPV following and persisting for the next two months (Figure 2). The long timescale of the relationship with the SPV strength supports the importance of representing the S-G pattern-and more generally, tropospheric blocking and Rossby-wave breaking-for S2S prediction.
In the 10 models from the S2S database analysed here, the structure of the two EOFs is represented well at all lead times ( Figure 5). However, the lowest resolution models (namely BoM, CMA, and HMCR) exhibit a large zonal variability bias that grows with lead time (Figure 6). These models have more than the observed variance fraction in the first EOF (the zonal pattern) and a proportional reduction in the second EOF (the S-G pattern). Although the aforementioned three models have the largest biases, all 10 models have slightly less than the observed variance fraction in the second EOF, consistent with extensive literature on the underrepresentation of blocking and wave breaking in most forecast models. Our finding of the largest variability biases in BoM, CMA, and HMCR agrees well with the relative magnitudes of the biases in Rossby-wave packet decay in the northeast Atlantic in Quinting and Vitart (2019) (their Figure 2), supporting our physical interpretation of the EOFs further.
We find that all models have more skill in predicting the zonal pattern versus the S-G pattern, especially in weeks 3 and 4 (Figures 7 and 8). Ensemble-mean correlation skill is limited to well within two weeks (with a maximum of 10 days in the S-G pattern in several models, but as low as five days in BoM), and there are also indications of resolution dependence. Considering the link between the S-G pattern and the SPV, the timescale of correlation skill in these models (and intermodel differences) is similar to the timescale of SSW prediction (e.g., Domeisen et al., 2020a), while the very limited ROC skill in the subseasonal range is further supportive of a limitation on subseasonal stratospheric skill arising from poorly predicted tropospheric processes. Moreover, our results indicate that the poor longer-term predictability of the S-G event preceding the February 2018 SSW (as described in L19) is not unique to that case.
The S2S models represent the stratospheric influence of the S-G pattern well when considering the initial conditions (Figures 9 and 10), though the amplitude and persistence of the SPV anomalies are weaker in lower-topped models. However, we find that the relationship between the S-G pattern and enhanced eddy heat flux into the stratosphere (and a subsequently weakened SPV) is much weaker at longer lead times, with no clear dependence on resolution or lid height. We find evidence ( Figure 11) that the weakening of the relationship is due to reduced attendant wave activity flux into the stratosphere (up to 50% weaker in week 4 versus ERA5 in F I G U R E 9 Linear regression between the S-G index in the first day of the forecast and standardised anomalies of 40-80 • N eddy heat flux for the subsequent 30 forecast days in hindcasts from 10 S2S models in NDJFM 2000-2010. The linear regression is carried out across all ensemble members. Stippling indicates significance at the 95% confidence level according to a bootstrap resampling procedure. The contour scale is chosen to match that in Figure 2 HMCR, ECMWF, and UKMO), as the tropospheric wave activity remains similar to that observed. It is possible that lead-time dependent biases in the modelled stratospheric mean state, such as biases in the zonal winds, alter the vertical propagation or subsequent amplification of anomalous wave activity from the troposphere (perhaps arising from the interaction with the mean stationary waves: Nishii et al., 2009). The significantly weaker regression coefficients in week 4 in the higher resolution ECMWF and UKMO models are particularly notable, given the cold SPV bias in those models (Son et al., 2020).
Our results, while limited by a multimodel approach, support the importance of higher model resolution in the representation of wave breaking and blocking within the exit region of the North Atlantic eddy-driven jet. We suggest that a contribution to these biases, especially their lead-time dependent nature, arises from the decline in tropopause sharpness seen in numerical weather prediction models at longer lead times and its impact on Rossby-wave propagation (Gray et al., 2014;Saffin et al., 2017). Future work to ascertain the relative importance of resolution and the representation of other processes (such as diabatic effects) in modelling the variability in the northeast Atlantic accurately may help address this hypothesis. Furthermore, these biases may subsequently be manifested in the downward high-latitude blocking response of the troposphere to a weakened SPV (i.e., the onset of a negative NAM), which is poorly represented in S2S models (e.g., Figure 7 in Domeisen et al., 2020b). We F I G U R E 10 Linear regression between the S-G index on the first day of the forecast and standardised anomalies of 60-90 • N geopotential height for the remaining forecast days in hindcasts from 10 S2S models in NDJFM 2000-2010. The linear regression is carried out across all ensemble members. Stippling indicates significance at the 95% confidence level according to a bootstrap resampling procedure. The contour scale is chosen to match that in Figure 2 note that the negative loading of the S-G pattern corresponds to Greenland blocking and a Scandinavian trough anomaly, and the negative loading of the zonal pattern corresponds to a negative NAO-both of which are broadly consistent with surface responses to a significantly weakened SPV or major SSW (e.g., Butler et al., 2017). Thus, while the focus of this study has been on the positive S-G pattern as a source of SPV weakening, the S2S model biases in variability and predictability apply to anomalies of both signs, and therefore are likely partly related to poor S2S skill in the response to SSWs.
In conclusion, the combination of zonal biases, limited sub-seasonal skill in the S-G pattern, and poor representation at longer lead times of its subsequent impact on the SPV is likely to be a contributing factor to the still limited skill in predicting stratosphere-troposphere coupling on S2S timescales. A targeted approach to determining the representation within S2S models of further key tropospheric processes known to influence the SPV, such as western Pacific bomb cyclones (Attard and Lang, 2019), may illuminate additional regions where stratospheric S2S skill could be gained. F I G U R E 11 Weekly mean regression coefficients between the S-G index in each ensemble member and the corresponding eddy heat flux anomalies at (a) 300 hPa on the same day, (b) 100 hPa three days later, and (c) 50 hPa four days later. The lags correspond to days with maximum correlation in ERA5. Equivalent ERA5 statistics are also shown. The models are sorted left-to-right by increasing horizontal resolution. Error bars indicate 95% confidence intervals according to a bootstrap resampling procedure; stippled bars indicate where these do not overlap with those of ERA5 R8/H12/83/001. EOF analysis was performed in Python using the package eofs (Dawson, 2016). This work is based on S2S data, a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Programme (WCRP). The S2S prediction project database is available online at https://apps. ecmwf.int/datasets/s2s. The ERA5 reanalysis is available from the Copernicus Climate Data Store at https://cds. climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5pressure-levels. We thank two anonymous reviewers for their helpful suggestions.