Attributing the forced components of observed stratospheric temperature variability to external drivers

One of the largest anthropogenic fingerprints on climate is observed in stratospheric temperatures, but measurements in this region are uncertain. Here, regularised optimal fingerprinting techniques are used to attribute annual temperature variability in the mid‐upper stratosphere to external forcing factors over the period 1979–2005. Specifically, the solar, volcanic, ozone and greenhouse gas (GHG) forced components are characterised. The analysis compares the two most recent reconstructions of the Stratospheric Sounding Unit (SSU) with each other and with six historically forced simulations taken from the Coupled Model Intercomparison Project, phase 5. In the uppermost stratospheric SSU channel, all individual forcings are detected. Solar and volcanic forcings are also detected in the middle and lower SSU channels, but at these levels the GHG and ozone signals are not detected separately from each other. The uncertainty in the global temperature response due to individual forcings is found to be dominated by observational uncertainty in the upper stratosphere, and by the signal‐to‐noise ratio in the middle stratosphere. Estimates of the 11‐year solar cycle amplitude are broadly consistent with reanalysis studies. The temperature response to volcanic eruptions is found to be larger than previously thought in the upper stratosphere (0.4–0.6 K for Mount Pinatubo), although the volcanic response is still dominated by the lower‐stratospheric signal. Finally, the anthropogenic response in the upper stratosphere gives rise to a cooling of ∼2–3 K over the 27‐year period, with two thirds of this attributed to GHGs, and one third to ozone depletion.


Introduction
Over the past half century, the stratospheric mean state has cooled (e.g. Randel et al., 2009). The exact rate of cooling is hard to determine, in part because observations are sparse in both space and time. For instance, different reconstructions of the Stratospheric Sounding Unit (SSU) data reveal widely different temperature evolutions in the stratosphere (Thompson et al., 2012), the reasons for which are still being discussed (Nash and Saunders, 2015). Reanalysis products are equally diverse, and also show a range of possible temperature evolutions, especially near the stratopause (Mitchell et al., 2015).
A primary factor in the stratospheric cooling is the presence of greenhouse gases (GHGs) in the atmosphere, which cool the stratosphere and warm the troposphere. This cooling is further exacerbated by destruction of ozone from ozone-depleting substances, which results in less ultraviolet (UV) absorption, and so less heat uptake. Superimposed on this cooling trend are distinct warming peaks due to the injection of volcanic aerosol into the stratosphere, and an 11-year solar cycle caused by changes in the total solar irradiance (Robock, 2000; Gray et al., 2010). Other forcings are suggested to impact on the temperature variability, but most likely play a more minor role (e.g. water vapour; Hegglin et al., 2014).
The atmospheric response to these external climate forcings provides a distinct 'fingerprint' of change in the stratosphere (Schwarzkopf and Ramaswamy, 2008). Many studies have attempted to characterise the natural and anthropogenic signals in the stratosphere through regression techniques, but they almost exclusively use regression models which assume noise-free predictors. They also regress time series of the actual forcing onto observations, rather than the response pattern to the forcing. Both refinements are important for accurate detection and attribution (Hegerl and Zwiers, 2011), because the response pattern may not be linearly related to the forcing pattern. The primary reason these studies have not used the most up-to-date techniques is that response pattern simulations from stratosphere-resolving climate models have not been widely available until recently, owing to their high computational demands, although some such simulations were available as part of the Chemistry Climate Model Validation (CCMVal) activity. Had more relevant climate simulations been available, the techniques which have been extensively employed in the troposphere could have been applied (e.g. Allen and Tett, 1999; Stott et al., 2003).
Recently, more studies have started to employ advanced detection and attribution techniques on stratospheric fields. Gillett et al. (2011) assessed the CCMVal simulations to show that anthropogenic change could be observed in the UK Met Office SSU reconstruction which covered the whole stratosphere, but could not detect separately the cooling components due to ozone-depleting substances or GHG forcings. Mitchell et al. (2013) showed that the attribution to GHG and natural forcings of vertical temperature profile changes, spanning the troposphere and lower stratosphere, was more easily detected using a stratosphere-resolving than a non-stratosphere-resolving version of the same model (see also Santer et al., 2013).
The purpose of this study is two-fold: first to understand what components of the various SSU temperature reconstructions are different, and second to characterise, for the first time, the separate solar, volcanic, ozone and GHG components of temperature variability in the mid-upper stratosphere using a modern detection and attribution analysis.

Data and analysis methods
To perform the detection and attribution, regularised optimal fingerprinting (ROF) is employed with a total least-squares regression model (Ribes et al., 2009, 2013). The linear regression model takes the form:
$$ y = x\beta + \epsilon \qquad (1) $$
where y is a vector of observations, x is a matrix of i model response patterns (to a particular forcing), β is a vector of scaling factors and ε is the internal climate variability (noise). A brief comparison with other regression-based techniques, and an explanation as to what this specific technique adds to previously employed techniques, is given in Mitchell et al. (2015) (their section 4.5).
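As a minimal sketch of the total least-squares idea behind the regression model above, the single-signal case has a closed-form solution. The snippet is illustrative only: it assumes one response pattern, no intercept, and equal noise variance in x and y (that is, data that have already been pre-whitened and scaled). The function name `tls_scaling_factor` and the input values are hypothetical and are not taken from any ROF implementation.

```python
import math


def tls_scaling_factor(x, y):
    """Total least-squares slope through the origin for one signal.

    Minimises the sum of squared perpendicular distances from the points
    (x_i, y_i) to the line y = beta * x. Equal noise variance in x and y
    is an assumption made for this sketch.
    """
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are orthogonal; the slope is undefined")
    diff = syy - sxx
    # Root of sxy*beta**2 + (sxx - syy)*beta - sxy = 0 with the sign of sxy.
    return (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Hypothetical modelled response (x) and observations (y), in K.
model_response = [0.0, -0.2, -0.4, -0.6, -0.8]
observations = [0.0, -0.4, -0.8, -1.2, -1.6]
beta = tls_scaling_factor(model_response, observations)
```

Unlike ordinary least squares, this estimate treats x and y symmetrically, which is the property that matters when the model response pattern is itself noisy. The multi-signal case used in practice requires a matrix decomposition and is not shown here.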
As is common practice in detection and attribution analyses, estimates of the model noise are calculated from pre-industrial control simulations, and used both for estimating the noise, ε, and for the optimisation (Allen and Tett, 1999; Stott et al., 2003). Because of the high dimensionality of most climate data, the noise covariance matrix, COV(ε), is normally estimated by a dimension reduction method, such as projection onto empirical orthogonal functions (EOFs). Problems arise here, because only a finite number of EOFs can be used, and the choice of cut-off can be somewhat arbitrary (Hegerl and Zwiers, 2011). Ribes et al. (2009) showed that by regularising the estimate of COV(ε), projection onto EOFs was no longer needed. In addition, they showed that the ROF method can be more accurate than EOF-based methods. For this reason, the ROF method is used here.
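The text does not specify the form of the regularisation, so the following is only a sketch of one common approach: shrinking the sample covariance toward a scaled identity matrix. The shrinkage weight `alpha` is fixed by hand here, which is an assumption; in practice it would need a data-driven choice. All names and numbers are hypothetical.

```python
def sample_covariance(segments):
    """Sample covariance (p x p) from n control segments of length p."""
    n = len(segments)
    p = len(segments[0])
    means = [sum(seg[j] for seg in segments) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for seg in segments:
        for i in range(p):
            for j in range(p):
                cov[i][j] += (seg[i] - means[i]) * (seg[j] - means[j])
    return [[c / (n - 1) for c in row] for row in cov]


def shrink_to_identity(cov, alpha):
    """Blend cov with a scaled identity: (1 - alpha) * cov + alpha * mu * I."""
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p  # mean variance on the diagonal
    return [
        [(1.0 - alpha) * cov[i][j] + (alpha * mu if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


# Two hypothetical control segments of three years each. With fewer
# segments than dimensions the sample covariance is singular, whereas
# the shrunk estimate is positive definite.
control_segments = [[0.1, -0.2, 0.0], [-0.1, 0.2, 0.1]]
cov = sample_covariance(control_segments)
cov_reg = shrink_to_identity(cov, alpha=0.5)
```

Because the blend preserves the total variance (the trace) while lifting the smallest eigenvalues away from zero, the regularised matrix can be inverted without first truncating to a subset of EOFs.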
To assess the accuracy and suitability of the methods and data, the following tests are used: (i) tests for signal degeneracy (e.g. Tett et al., 2002; Mitchell et al., 2013); (ii) the residual consistency test (Ribes et al., 2013); (iii) comparison of the power spectra for unforced variability (noise) (e.g. Stott et al., 2004).
Unless stated otherwise in the text, all tests are passed.
Figure 1. Lines give a measure of the observed noise spectra, obtained by removing an independent estimate of the external forcings (the best-guess multi-model mean of all the CMIP5 simulations) from the observations. The light region shows the 5-95% range of the noise spectra obtained from many realisations of the pre-industrial control simulations from CMIP5. The dark region is as the light, but for the inflated noise.
Simulations taken from the Coupled Model Intercomparison Project, phase 5 (CMIP5; Taylor et al., 2012) are used to assess the response patterns of stratospheric temperature to individual forcings (x_i in Eq. (1)). In addition to the 'historical' forcing simulations (hereafter 'all-forcings') that represent past climate, pre-industrial control and single-forced simulations are also required. The pre-industrial control scenario represents climate in the absence of forced variability. It is found here that internal climate variability is poorly reproduced in the stratosphere by models; an example from the upper stratosphere, where this discrepancy is largest, is given in Figure 1. The lines are power spectra of the unforced component of annually averaged, global average temperature from the two SSU reconstructions (i.e. the observations minus a rescaled estimate from the all-forcing simulations). Note that this approach was employed first by Stott et al. (2004), who applied it in the same way as here, but to surface temperatures rather than stratospheric temperatures. As there is such a strong thermodynamical constraint in the stratosphere, the signal-to-noise ratio is far higher than in the troposphere. Therefore, if models inaccurately simulate the forced component of stratospheric variability, there may be more power in the estimate of observational noise (Figure 1) than there otherwise would be. The light region gives the 5-95% range of the same quantity but for individual control segments of the same length, used as the noise estimates for the ROF method described earlier. There is a clear underestimation of the noise, with the modelled and observed noise power differing by two orders of magnitude. Tett et al. (2002) also found the noise was underestimated in their study of the lower stratosphere, and here the noise is inflated in the same way (their section 4.1). The dark region represents the inflated noise, which is now in good agreement with observational estimates (lines). This lends support to the view that the confidence intervals from the ROF analysis can be calculated accurately. It also indicates that the discrepancy in the modelled noise is the same across all time-scales, so inflating the noise is a reasonable approach.
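The noise inflation described above can be sketched as a single variance ratio applied uniformly to the control segments. This is an illustrative assumption rather than the exact procedure of Tett et al. (2002), which is not reproduced in the text. The function names and the input series are hypothetical.

```python
import statistics


def inflation_factor(observed_residual, control_segments):
    """Ratio of observed residual variance to mean control-segment variance."""
    obs_var = statistics.pvariance(observed_residual)
    ctl_var = statistics.mean(
        [statistics.pvariance(seg) for seg in control_segments]
    )
    return obs_var / ctl_var


def inflate(control_segments, factor):
    """Scale each segment about its own mean so its variance grows by factor."""
    scale = factor ** 0.5
    inflated_segments = []
    for seg in control_segments:
        m = statistics.mean(seg)
        inflated_segments.append([m + scale * (v - m) for v in seg])
    return inflated_segments


# Hypothetical observed residual (observations minus scaled forced response)
# and hypothetical control segments of the same length, in K.
observed_residual = [0.3, -0.3, 0.3, -0.3]
control_segments = [[0.03, -0.03, 0.03, -0.03], [0.03, -0.03, -0.03, 0.03]]
factor = inflation_factor(observed_residual, control_segments)
inflated = inflate(control_segments, factor)
```

A single factor is only defensible when the model-observation discrepancy is similar at all time-scales, which is the condition the power spectra are used to check. In this toy example the amplitude differs by a factor of ten, so the variance ratio is about one hundred.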
However, as mentioned before, the 'observed' noise may be overestimated and therefore the modelled noise will have been overinflated. This would mean that confidence intervals in the following analysis are also overestimated, and hence results are conservative. Indeed, if the later analysis is performed without inflating the modelled noise, the confidence intervals are implausibly small. Inflating the noise would also mean that the residual consistency test is easier to pass. However, this test is less relevant for studies of stratospheric temperature, where the forced component of variability is so much larger than the noise component (as errors in the noise will be small compared to the signal); it was designed more for low signal-to-noise situations such as surface temperature fields (see Ingram, 2006).
Table 1. Details of CMIP5 models: model lid height, number of available ensemble members for different forcing scenarios, SSU channels covered and whether or not they included ODS feedbacks onto ozone in their GHG-only scenarios. Where certain simulations were not explicitly provided by the modelling centres, they were estimated from the simulations that were available, if that was possible (section 2). Coupled chemistry models are in bold. The two GISS models are the 'physics version 3 (p3)' simulations submitted to CMIP5.
The required single-forcing simulations used here are, individually, an experiment forced only with GHGs, an experiment forced only with solar irradiance, and an experiment forced only with volcanic forcing. Detailed documentation of the imposed forcings is listed in Taylor et al. (2012) and references therein. However, different modelling groups have interpreted the GHG forced-only scenario in different ways: all models considered here used forcings recommended by the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5) for most of the well-mixed GHGs (i.e. CO2, CH4, N2O, etc.), but some models included the feedback on stratospheric ozone from ozone-depleting substances (ODSs), e.g. chlorofluorocarbons (CFCs) and hydrochlorofluorocarbons (HCFCs). These are principally the coupled chemistry models, which had the functionality to do this (although the earth system model MIROC-ESM also employs this method), but crucially this has led to some ambiguity as to what the GHG-forced simulations are showing in the stratosphere. To avoid confusion here, the GHG scenarios with the ODS feedback onto ozone are referred to as 'GHG+O3', whereas those scenarios without the ODS feedback are referred to as the more traditional 'GHG'. (Note that it would not be appropriate to refer to the GHG+O3 simulations as 'anthropogenic', because they do not include the effects of aerosols.) For the modelling groups which did not include the feedback from ODSs, an additional simulation is inferred by subtracting the natural and GHG simulations from the all-forcing simulations. The resulting fields are then used as an additional simulation referred to as 'other anthropogenic' (OA) forcings, which is predominantly ODSs, and hence this scenario can be considered as the cooling due to ozone depletion. This inference is needed because these modelling groups did not run an ODS-only simulation.
The validity of such an inference for use in detection and attribution analyses has been rigorously tested in the literature (e.g. Hegerl and Zwiers, 2011), and is especially appropriate in the stratosphere because the global mean, annual mean temperatures are known to respond in a linear way.
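The residual inference of the OA response amounts to a pointwise subtraction, and it is meaningful only under the linear-additivity assumption stated above. The sketch below uses hypothetical global-mean, annual-mean anomaly series, and the function name is illustrative.

```python
def infer_other_anthropogenic(all_forcing, natural, ghg):
    """OA = all-forcing minus natural minus GHG, element by element.

    Assumes the responses to individual forcings add linearly.
    """
    if not (len(all_forcing) == len(natural) == len(ghg)):
        raise ValueError("time series must share the same length")
    return [a - n - g for a, n, g in zip(all_forcing, natural, ghg)]


# Hypothetical temperature anomalies in K for four consecutive years.
all_forcing = [0.0, -0.5, -1.0, -1.5]
natural = [0.0, 0.1, 0.0, -0.1]
ghg = [0.0, -0.4, -0.7, -1.0]
oa = infer_other_anthropogenic(all_forcing, natural, ghg)
```

Any internal variability present in the three input simulations accumulates in the residual, so in practice the subtraction would be applied to ensemble means rather than to single realisations.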
The CMIP5 models are chosen such that the required individual scenarios can be obtained, as well as the model having sufficient vertical resolution to resolve the stratosphere. Through sensitivity studies, a lid height of 1 hPa is deemed sufficient to cover the lower two SSU weighting functions, and a lid height of 0.1 hPa for the uppermost weighting function. Table 1 gives details of the six models which fit the criteria (but note that CanESM2 is not suitable for analysis with the upper SSU channel).
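To illustrate why lid height matters for a given channel, the following minimal sketch computes a channel-equivalent temperature as a weighted mean over model levels, together with the fraction of a weighting function that lies below a given model lid. The weights and pressure levels are hypothetical placeholders, not the actual SSU weighting functions, and the discretisation onto five levels is an assumption made for brevity.

```python
def channel_temperature(temps, weights):
    """Weighted mean of level temperatures, normalising the weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(t * w for t, w in zip(temps, weights)) / total


def weight_fraction_below_lid(weights, pressures_hpa, lid_hpa):
    """Fraction of the weighting function at pressures >= the model lid."""
    total = sum(weights)
    covered = sum(w for w, p in zip(weights, pressures_hpa) if p >= lid_hpa)
    return covered / total


# Hypothetical levels (hPa), weights and temperatures (K).
pressures_hpa = [10.0, 3.0, 1.0, 0.3, 0.1]
weights = [0.10, 0.30, 0.40, 0.15, 0.05]
temps = [230.0, 245.0, 260.0, 255.0, 240.0]

t_channel = channel_temperature(temps, weights)
covered_lid_1hpa = weight_fraction_below_lid(weights, pressures_hpa, 1.0)
covered_lid_01hpa = weight_fraction_below_lid(weights, pressures_hpa, 0.1)
```

With these placeholder weights, a 1 hPa lid captures only part of the weighting function while a 0.1 hPa lid captures all of it, which is the kind of shortfall a lid-height criterion is meant to exclude.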
Observational measurements of stratospheric temperature are taken from two reconstructions of the SSU, the UK Met Office (MO) updated version (Nash and Saunders, 2015) and the National Oceanic and Atmospheric Administration (NOAA) version 2.0 (Zou et al., 2014). By using more than one observational dataset, a measure of the observational uncertainty can be obtained. The SSU samples over three channels covering the mid-to-upper stratosphere (weighting functions are given in [...]). [...] mean temperatures used in the ROF analysis. The regression analysis is applied to each SSU channel and reconstruction independently. Figure 3 shows globally averaged temperature anomalies for the observed MO and NOAA SSU reconstructions (black dashed and solid, respectively), and different modelled scenarios. The left and right panels are identical, other than that different model scenarios are plotted. Thompson et al. (2012) show a similar comparison, but using older versions of the NOAA and MO SSU reconstructions, and without considering the individually forced CMIP5 simulations. Here, the two SSU reconstructions are in good agreement only in the lowermost SSU channel (black dashed and solid lines; Figure 3(e,f)). In the upper and middle SSU channels (Figure 3(a,b) and (c,d)) the two SSU reconstructions differ far more in their time evolution. The NOAA reconstruction cools faster than the MO reconstruction for channel 2, but slower for channel 3. These differences will be due to the retrieval algorithm used, and to assumptions made by the MO and NOAA groups when reconstructing the temperatures; identifying the exact reasons is not the primary focus of this study. A discussion of potential sources of the differences in the reconstructions is given in Nash and Saunders (2015). In this study, using detection and attribution methods, it is possible to say how the forced components of temperature variability differ between the two reconstructions.

Analysis
The 'CMIP5 all' lines in Figure 3(a,c,e) show the all-forcing historical simulations taken from the CMIP5 models detailed in Table 1. There are only six, because of the required availability of separate forcing simulations and coverage of the stratosphere. There is a tendency for these simulations to be in better agreement with the MO reconstruction for channel 1 (Figure 3(e)), and with the NOAA reconstruction for channels 2 and 3 (Figure 3(c,a)), although similarities vary across models and much of the disparity is introduced following Mount Pinatubo (second peak). The remaining coloured lines show the individual forcing simulations (see legend of Figure 3). Note that the GHG+O3 simulations are marked with solid lines, and the GHG simulations with dashed lines. For models with the GHG simulations, the corresponding OA simulations are also plotted. It is clear that over all three channels (covering the mid-upper stratosphere), the models predict that most of the observed cooling trend is from the GHG+O3 forcing (solid lines), rather than either of the natural forcings. Some of the overall observed trend does come from the solar irradiance, but this is an artefact of the starting and finishing phases of the cycle over this period, rather than any longer-term variability. When the GHG signal (without ODS feedbacks; dashed lines) and OA signal (which is predominantly cooling from ozone loss) are compared, it is clear that the GHG dominates the global-mean cooling. The magnitude of the GHG temperature response is about two thirds larger than the ozone temperature response in SSU channels 1 and 2, and about twice as large in SSU channel 3 when the entire 1979-2005 period is considered. However, over the initial period of 1979-1995, the GHG and ozone responses are very similar, and it is only after this time that the OA signal flattens due to the levelling of ozone depletion.
The ratio of cooling between the GHG and OA scenarios for the CMIP5 models presented here is consistent with coupled chemistry models presented in Gillett et al. (2011).
The individual natural forcings (Figure 3(a,c,e)) show a particularly distinct response, with the 11-year solar cycle amplitude being larger in the upper SSU channel, but the two volcanic eruption amplitudes being larger in the lower SSU channels. The vertical structure of these responses is in good agreement with previous studies (e.g. Crooks and Gray, 2005, who performed the analysis on reanalysis data from the European Centre for Medium-Range Weather Forecasts), although Mitchell et al. (2015) note there is much disagreement between different reanalyses in the upper stratosphere.
To draw any meaningful conclusions from the separate forcing model simulations, it must be determined if they are consistent with observations. As such, the ROF analysis is applied to the multi-model average temperature response pattern of each channel, and each SSU reconstruction individually (section 2). The response patterns initially considered are from the solar, volcanic and the GHG+O3 simulations. Figure 4 shows the corresponding scaling factors (β in Eq. (1)), with the 5-95% uncertainty estimate from the regression (Ribes et al., 2009). The individual GHG+O3, solar and volcanic forcings are all detected. This expands on the results of Gillett et al. (2011), whose analysis did not consider the individual solar and volcanic responses.
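The way a scaling-factor range is read can be sketched as follows, using the conventions in this study: a forcing is detected when the 5-95% range on β is inconsistent with 0, and the modelled response agrees in amplitude with observations when the range is consistent with 1. The function name and the example ranges are hypothetical. Requiring the lower bound to be positive, rather than merely a range that excludes zero, is an assumption of this sketch.

```python
def classify_scaling_factor(lower, upper):
    """Interpret a 5-95% range on a scaling factor beta.

    detected:              the range lies entirely above 0
    consistent_with_model: the range includes 1 (no rescaling needed)
    """
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return {
        "detected": lower > 0.0,
        "consistent_with_model": lower <= 1.0 <= upper,
    }


# Hypothetical ranges, for illustration only.
agrees = classify_scaling_factor(0.8, 1.3)        # detected, amplitude agrees
undetected = classify_scaling_factor(-0.2, 0.9)   # range includes 0
underestimated = classify_scaling_factor(1.4, 2.1)  # detected, model too weak
```

A range lying entirely above 1 indicates that the modelled response must be scaled up to match observations, and one lying between 0 and 1 indicates that it must be scaled down.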
In channels 1 and 2, the solar and volcanic scaling factors are consistent between the MO and NOAA SSU reconstructions, suggesting that these components are very similar in both observational datasets. This is also true for the anthropogenic scaling factor in channel 1, which is perhaps not surprising considering both SSU reconstructions agree with each other at this level (Figure 3). For channel 2, the anthropogenic scaling factor can account for the difference between the two datasets. For channel 3, the anthropogenic, solar and volcanic scaling factors are all inconsistent between the two reconstructions. This suggests that it is more than just the long-term trend that is different between these observational datasets in the upper stratosphere-lower mesosphere.
In the uppermost channel (channel 3), the ROF analysis shows that the volcanic eruptions are underestimated in models, and the amplitude of their response therefore needs to be scaled up to agree with observations. Volcanic aerosol is predominantly prescribed in the CMIP5 models, but the distribution of aerosol and radiative properties are often poorly constrained, and may be the source of the discrepancy. Figure 5 shows the multi-model mean of the CMIP5 models (i.e. from Figure 3), multiplied by the scaling factors (Figure 4) to reveal the best-guess estimate of temperature trend components in the stratosphere (i.e. the model-predicted temperature variability, consistent with observations). The lighter shades show the best guess and 5-95% confidence estimates from the MO reconstruction; the darker shades show the same but for the NOAA reconstruction.
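The construction of the best-guess response and its confidence envelope can be sketched as a simple rescaling of the multi-model mean by the best-estimate and bounding scaling factors. The values below are hypothetical and the function name is illustrative. Taking the minimum and maximum of the two scaled bounds is a choice made here so that the envelope stays ordered for negative anomalies.

```python
def scaled_response(multi_model_mean, beta_best, beta_low, beta_high):
    """Scale a modelled response by beta and return (best, lower, upper)."""
    best = [beta_best * v for v in multi_model_mean]
    lower = [min(beta_low * v, beta_high * v) for v in multi_model_mean]
    upper = [max(beta_low * v, beta_high * v) for v in multi_model_mean]
    return best, lower, upper


# Hypothetical multi-model-mean anomalies (K) and scaling-factor estimates.
multi_model_mean = [0.0, -0.5, -1.0]
best, lower, upper = scaled_response(
    multi_model_mean, beta_best=1.2, beta_low=1.0, beta_high=1.5
)
```

For a cooling signal the larger scaling factor gives the lower (more negative) edge of the envelope, which is why the bounds are sorted per time step rather than assigned by factor.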
The largest discrepancy in the observed estimates of the solar and volcanic forcings is in channel 3, with the 11-year solar cycle ranging between 0.3 and 0.8 K per cycle depending on which SSU reconstruction is used in the analysis, and the peak amplitude of Mount Pinatubo ranging between 0.4 and 0.6 K (note that all anomalies are relative to the first four years). The volcanic response is, however, similar in magnitude at all levels, but still much lower than estimates for the lower-stratospheric response (e.g. Mitchell et al., 2015, who report a response of ∼1.5 K for the globally averaged lower stratosphere). The largest discrepancy in the observed estimate of the GHG+O3 component of temperature is in channels 2 and 3, with the cooling trend maximising at −3 K (27 years)⁻¹ when the MO SSU reconstruction is used in the ROF analysis. The uncertainty between the different observational estimates decreases in the lower SSU channels for all individual forcings; however, the 5-95% confidence intervals estimated from the regression increase. This reflects the increasing signal-to-noise ratio as altitude increases in the stratosphere. While GHG+O3 represents a meaningful scenario of anthropogenic change, it is noted that much of this cooling comes from ozone depletion. As mitigation is in play to reduce emissions of ODSs, it is also meaningful to ask by how much the stratosphere has cooled due to anthropogenic influences other than ODSs (i.e. predominantly due to CO2). To address this, the ROF analysis is performed in exactly the same way, but using the GHG, OA and Natural scenarios, instead of the GHG+O3 scenario. This essentially allows for the cooling due to ODSs to be quantified separately from the cooling due to CO2. Over the lowermost channels of the SSU, the GHG and OA fingerprints are not detected separately in either the NOAA or the MO reconstruction (β is consistent with 0). This result was also found in Gillett et al. (2011).
However, in the uppermost channel both fingerprints are detected, and are consistent with the MO observations (i.e. β is consistent with 1, and inconsistent with 0; Figure 6), although they are not detected in the NOAA observations. This is because the long-term reconstructed trend at this level is much smaller in the NOAA SSU than in the MO SSU, and so the signal-to-noise level is lower. This suggests that the modelled temperature evolutions of GHG and OA in Figure 3 are in agreement with at least one set of observations. However, because CanESM2 does not cover SSU channel 3, the only model in this ROF analysis of the upper channel is HadGEM2-CC (Table 1). As a sensitivity study, the single-model attribution was also applied to the two lower SSU channels, but both the GHG and OA signals remain undetected.

Discussion
In this study a comprehensive detection and attribution analysis has been undertaken to understand the GHG, ozone, solar and volcanic components of mid-upper stratospheric temperature variability over the period 1979-2005 (Figure 3). Two different SSU reconstructions of observed stratosphere temperature are used to sample observational uncertainty, and six different CMIP5 models are used to sample model uncertainty. In a similar analysis, Gillett et al. (2011) performed a detection and attribution study using coupled chemistry models and the MO SSU reconstruction (the NOAA one was not available at the time). They detected total anthropogenic forcings, but were unable to detect GHG and ozone forcing separately. They also used combined 'natural' forcings, rather than individually considering the solar and volcanic responses (which were unavailable for the models they used). This study therefore complements and expands on Gillett et al. (2011) by (i) detecting the separate GHG forcing in the upper stratosphere, (ii) identifying the solar and volcanic forcings separately, (iii) using a different detection and attribution technique (ROF analysis) and different models and (iv) testing the sensitivity of the analysis to different SSU reconstructions.
It was found that noise estimates from models were lower than noise estimates inferred from observations, with the standard deviations differing by an order of magnitude. The modelled noise estimates were therefore scaled up (e.g. Tett et al., 2002). The estimated observed noise might have been too high (discussion in section 2), in which case the confidence intervals on our analysis are conservative.
The dominant contribution to globally averaged stratospheric cooling is from ODSs and CO2. It was found that the cooling trend from the latter was about twice that of the former in the upper stratosphere over the 1979-2005 period. However, they could not be detected individually in the lower two SSU channels. The magnitude of the volcanic response was found to be similar throughout the middle and upper stratosphere, and this was detected in the observations. Most models show a strong decrease in the volcanic response as the SSU channel increases with height, and their responses therefore had to be scaled to be in agreement with observations. Previously, studies have shown low significance in their volcanic responses near the stratopause, but these have predominantly been from reanalysis datasets, which can differ widely (Mitchell et al., 2015). The magnitude of the solar-induced response was found to increase with height, which is consistent with previous results using ordinary least-squares regression techniques (e.g. Gray et al., 2010).
Detection and attribution studies of upper-stratospheric temperatures are uncommon, which is disappointing considering the size of the anthropogenic fingerprint in this region. The results presented here provide an interesting insight into just how large the direct response of the climate system to external forcings can be.