Assessing the threat of future megadrought in Iberia

Water scarcity is a critical issue in Iberia but few studies have investigated future drought over the region. The few that do are continental or global studies and do not report the range of possible future projections. In this paper, historical drought and future projections of drought were examined, using observed and downscaled rainfall data from climate models, for the main Iberian international basins (Douro, Tagus and Guadiana). Two drought indices were used, the standardized precipitation index (SPI) and the drought severity index (DSI). However, problems were found in the distribution fitting required for the calculation of SPI which render this index inadequate for assessing droughts in Iberia. The skill of CMIP5 climate models in representing historical drought for the Douro, Tagus and Guadiana is very variable with some not showing enough persistence of dry conditions, and others simulating droughts that are too long and too severe. Nevertheless, the observed range of drought conditions for these basins was encompassed by the ensemble members. Furthermore, we found no relationship between performance in simulating mean large‐scale circulation and model ability to reproduce the historical drought characteristics for the region. All models project an intensification of drought conditions for the three basins. However, some only project small increases, while most project extreme multi‐year droughts by the end of the century. We found that climate model future projections are neither related to their performance in simulating historical drought nor to their large‐scale circulation patterns. This result emphasises the need for climate change impact studies to take into account the whole range of climate model projections in order to provide the best information for robust adaptation. Choosing one or a subset of climate models based on historical performance would not lead to a credible and justified reduction in uncertainty.


Introduction
Water scarcity is a critical issue in Iberia due to the region's large interannual variability of rainfall, combined with an uneven spatial and seasonal distribution (Trigo and DaCamara, 2000;Goodess and Jones, 2002;Rodrigo and Trigo, 2007;González-Hidalgo et al., 2010;Guerreiro et al., 2014). The main international Iberian rivers, the Douro, the Tagus and the Guadiana (see Figure 1), are vital water resources in Portugal and Spain, covering around 40% of the Iberian Peninsula. Their flows are regulated by an international convention between the two countries (Almeida et al., 2009).
Spatial patterns of drought in Iberia vary according to the time-scale studied (Vicente-Serrano, 2006b) and only the most extreme droughts affect the entire peninsula (Vicente-Serrano, 2006a). In Spain, 20th century droughts have exposed the divergence between water demand and water resources and led to an increase in reservoirs and water transfers (Hispagua, 2013). The worst droughts in Spain were during 1941Spain were during /1945Spain were during , 1979Spain were during /1983Spain were during , and 1990Spain were during /1995 and affected most of the country, with reductions in rainfall from 23 to 30% resulting in runoff reductions of more than 40%, and reaching 70% in the Guadiana and the Guadalquivir basins (JuntaDeAndalucia, 2013). Some of these droughts were also felt in Portugal where the most significant recent droughts were in 1943/46, 1965, 1976, 1980/1981, 1991/1992, 1994/1995, 1998/1999, and 2004/2006. The 2004/2006 drought had the largest spatial extent (all territory) and was the most intense when considering the number of consecutive months in severe or extreme drought. Regions south of the Tagus are more prone to drought and in the last decades of the 20th century there has been an increase in the frequency of drought, especially in the months from February to April (IPMA, 2013).
Most studies project decreases in rainfall in Iberia with climate change (Ekström et al., 2007;Hingray et al., 2007;Kilsby et al., 2007;Guerreiro et al., 2016), with geographical and seasonal variations, but few studies have examined potential changes to future drought. Droughts can be defined in terms of meteorological, hydrological, agricultural or socio-economic conditions and consequently a large number of different drought indices exist (Lloyd-Hughes and Saunders, 2002). Zargar et al. (2011) concluded that despite the popularity of some indices, the number of presently used indices (more than one hundred) reflects the different requirements of users. The most commonly used drought indices are the PDSI -Palmer drought severity index and the SPI -standardized precipitation index (Vicente-Serrano et al., 2010). PDSI is based on a soil water balance equation but requires large amounts of data and is strongly influenced by the calibration period (Vicente-Serrano et al., 2010). Difficulties in quantifying evapotranspiration have promoted the use of simple drought indices based solely on precipitation that compare well to complex hydrological indices (Lloyd-Hughes and Saunders, 2002). SPI is based on a probabilistic approach and is comparable in time and space to more complex indices, for example Lloyd-Hughes and Saunders (2002) showed that SPI-12 is well related to PDSI for Europe. However, SPI like any other rainfall-only-based index, assumes that droughts are controlled only by the temporal variability of precipitation and are therefore insensitive to future increases in drought conditions due to increased temperature, and consequently potential evapotranspiration (Vicente-Serrano et al., 2010). Furthermore, problems in applying SPI, in particular choosing which distribution to use, are widely known (Guttman, 1999;Lloyd-Hughes and Saunders, 2002;Sienz et al., 2012;Soľáková et al., 2014). Nonetheless, it is common to see studies where this is not taken into account, and most papers about future drought in Iberia (Jenkins and Warren, 2014;Pulquério et al., 2014) do not present information on the suitability of their chosen distribution to their data. Jenkins and Warren (2014) applied SPI (with a 6-and a 12-month rainfall aggregation; SPI-6 and SPI-12) to observed (1955-2003) and projected (2003-2050) precipitation for several countries around the world, including Portugal and Spain, using a simple climate model emulator to produce monthly precipitation. They suggested that drought magnitude (assessed using SPI-12) could increase from 96% to 341% in Spain, while in Portugal there was no agreement as to the direction of change.
Assessment of the validity of SPI to their data was not performed.
Heinrich and Gobiet (2012) used a subset of eight regional climate model (RCM) simulations driven by five different general circulation models (GCMs) to examine changes in European drought from 1961-1990 to 2021-2050. They performed a daily quantile mapping of RCM output on precipitation observations using the E-OBS dataset (Haylock et al., 2008). Using different time-scales (SPI-3, SPI-6, SPI-12, SPI-18 and SPI-24) they found that the most pronounced increases in drought conditions (frequency, length, distance, magnitude, and area) over Europe were for the Iberian Peninsula. Using SPI-18, mean dry event frequency was projected to increase by 19.8%, drought length by 26.5%, magnitude by 62.6%, and area affected by 51.1%. However, they presented only the multi-model mean results and gave no information on the range of changes projected by different models.
There is no standard method for the use of multi-model ensembles in impact studies. The most common way of presenting data is by using multi-model means; however, they are hard to interpret and defend  and the results can be physically implausible (Knutti, 2010). Selecting a subset (or sometimes even just one) of the 'best performing' models is another common practice despite the fact that there is no agreement on how to rank models and choosing different metrics leads to different models being selected . Selecting models can therefore lead to an unjustifiable reduction of uncertainty and consequently poor adaptation measures. Sometimes, a probability density function (PDF) of change is calculated from the multi-model ensemble. Although the appeal of being able to assign probabilities to different future outcomes is understandable, the models in present multi-model ensembles like the Coupled Model Intercomparison Project Phase 5 -CMIP5 (WCRP, 2012) are not independent, which is a requirement for the creation of a PDF. Therefore, models might agree due to shared process representation and/or calibration on the same datasets and not because that outcome is more likely . Furthermore, the sampling of models in this ensemble is not random or systematic (Knutti, 2010), which are also statistical assumptions necessary for the creation of a PDF.
A different paradigm is therefore necessary. We believe the whole range of climate model outputs should be taken into consideration and, if a subset of models has to be chosen for reasons of practicality, it should be representative of the full range of possible futures for the chosen scenario. This allows different adaptation strategies for different possible futures to be tested, which is a necessary requirement to robust adaptation to climate change. Therefore, in this study we assess all the possible futures projected by CMIP5 for their higher emission scenario (RCP8.5) and selected 15 model runs that cover the whole CMIP5 range of possible rainfall and temperature changes for the Iberian region. With these model outputs we studied future rainfall changes (Guerreiro et al., 2016), droughts and discharges for the three main international Iberian basins.
This paper presents the projections of drought for the Douro, the Tagus and the Guadiana. Historical drought conditions projected by the 15 CMIP5 models are compared with observations and transient projections  are assessed using drought indices and the percentage of basin in severe and extreme drought. The paper is organized as follows. In Section 2 the datasets used are presented, followed by the methodology in Section 3. Section 4 shows the results and discussion, both for historical and transient drought conditions. Conclusions are presented in Section 5.

Data
For rainfall observations, we used the IB02; a 0.2 ∘ resolution gridded daily rainfall dataset for Iberia. IB02 covers the time period from 1950 to 2003, and was calculated based on a dense network of quality-controlled gauges using ordinary kriging. It is available as two datasets: PT02 for Portugal (Belo-Pereira et al., 2011) available at http://www.ipma.pt/pt/produtoseservicos/index.jsp? page=dataset.pt02.xml and Spain02 v2 for Spain (Herrera et al., 2012) available at http://www.meteo.unican.es/en/ datasets/spain02.
Rainfall projections for 1961-2100 over the trans-national basins of the Douro, Tagus and Guadiana were obtained for 64 GCM runs from CMIP5 (the Coupled Model Intercomparison Project Phase 5 -http:// cmip-pcmdi.llnl.gov/cmip5/) for RCP8.5. Figure 2 shows the mean and coefficient of variation of annual rainfall for the two datasets (IB02 and CMIP5 runs). Most models show a wet bias for this area and a large majority does not show enough interannual variability; nonetheless the CMIP5 ensemble does cover the observed range.
As explained in the introduction, multi-model means can be physically unrealistic. They are not appropriate to the study of drought as, for example, averaging between a model showing a wetter future and another showing a drier future could indicate no change in drought conditions, which is a future not predicted by any of the models. Also projections of multi-model means hide the variability of possible futures and are therefore not appropriate for adaptation purposes. Therefore, a sub-set of model runs was chosen in order to fully represent the spread of CMIP5 (RCP8.5) projections for seasonal changes in rainfall and temperature for the region. Temperature and seasonal data were considered because this paper is part of a wider study that also included seasonal analysis of rainfall changes (Guerreiro et al., 2016) and hydrological modelling. The selection of representative runs was made by assessing changes in mean temperature and rainfall for different seasons and different GCM grid cells and selecting enough model runs that covered the full uncertainty space. This resulted in the selection of 15 model runs. Table 1 shows information about the selected GCM runs and Guerreiro et al. (2016) give more detail on climate model selection. Since the selection of projections was performed with the aim of including the whole range of possible futures and did not consider the source of the projection, some GCMs have two runs selected -same GCM but different initial conditions. It should be noted that despite the full range of mean temperature and mean rainfall changes being covered, this does not necessarily imply that the full uncertainty in drought projections is covered, since the timing of rainfall (i.e. the sequence of 12-month accumulated rainfall) was not taken into consideration. Nevertheless, our methodology should provide a good approximation of the full range of future drought projections. For the period 1961-2003, empirical cumulative distribution functions were calculated for rainfall from each of the 15 selected GCM runs and for IB02 (spatially aggregated to the GCM grid cell scale). For each GCM's simulated monthly rainfall output, a corresponding observed value was found by matching quantiles for corresponding months. This produced a dataset with the temporal and spatial resolution of IB02. The unique inter-annual variability represented by each model run was maintained, but the whole distribution of rainfall was corrected by the quantile mapping technique. Full details of the selection, downscaling and bias-correction of these projections using a variant of empirical quantile mapping can be found in Guerreiro et al. (2016).

Standardized precipitation index
SPI was initially defined by McKee et al. (1993) by fitting a Gamma distribution to monthly rainfall aggregated to the desired temporal scale (e.g. 12-month aggregated monthly rainfall for the calculation of SPI-12) and then transforming it into a standard normal. Drought was defined as a period of continuously negative SPI; where SPI reached a value equal to or less than −1.
In this paper, two distributions were fitted to gridded rainfall for the period 1961-1990 for the calculation of SPI-12: the gamma distribution (McKee et al., 1993) and the Pearson III distribution (Guttman, 1999;Vicente-Serrano, 2006b). To assess distributional fit, the modified mean square error (MSE), as defined by Papalexiou et al. (2012), was used. Overall, Pearson III showed smaller MSEs than those of Gamma; however, it also showed values of minus infinity. This occurred due to a poor fit that resulted in the Pearson III distribution having a probability of zero for the lowest observed value of rainfall. The same problem occurred when using the downscaled GCM rainfall and for future projections there were minus infinity SPI values on average in more than half the grid points for all three basins. Therefore, the Pearson III distribution was deemed not suitable for use.
When using the Gamma distribution to calculate SPI-12 for the 15 downscaled and bias-corrected climate models for the period 1961-2003, further problems arise with SPI values reaching −6 for one of the models (meaning a probability of occurrence of 9.8 × 10 −10 ). Even during the 1961-1990 period, which is the reference period used to fit the distribution, several models show SPI-12 below −3 (meaning less than 0.0013 probability of occurrence) in all basins. Furthermore, differences in drought severity between SPI-12 Gamma and SPI-12 Pearson III can reach up to 70%. These results are in contrast to those of Guttman (1999) who found very little difference in the characteristics of SPI in the United States when using different probability distributions. However, our results are similar to the differences of up to 54% in SPI-12 severity found by Soľáková et al. (2014) for Rome and differences between 20 and 50% found by Sienz et al. (2012) for different areas of the world.
Also, some models seem to have periods of negative and positive SPI-12 that are too long, meaning the dry or wet conditions are too persistent (compared to observations). Therefore, when using SPI-12, errors related to distribution fitting will be added to biases from the climate models and consequently SPI-12 was considered inadequate for the assessment of future drought conditions in Iberia and 5028 S. B. GUERREIRO et al.  was not used for the remaining analysis. Detailed information about the application of SPI to this data and its unsuitability are shown in the Appendix S1, Supporting information.

Drought severity index
The drought severity index (DSI) concept was originally proposed by Bryant et al. (1992) and is based on cumulative monthly precipitation anomalies. The index was defined by Phillips and McGregor (1998) and, like the SPI, can be calculated for different time scales. For example, DSI-3 is calculated using the following procedure: • If the rainfall anomaly in month t is negative (i.e. rainfall is below the mean for that month) and rainfall in the three previous months is lower than its three monthly mean, a drought sequence is initiated in month t; • DSI-3 for month t is then a positive value equal to the precipitation anomaly in month t. • The DSI for the following month (t + 1) is the rainfall anomaly of month t plus the rainfall anomaly of month t + 1, but only if the three monthly mean total for the months t − 1, t and t + 1 has not been exceeded. When this mean is exceeded the drought sequence terminates and DSI-3 is assigned a value of zero. • To standardize the index, the absolute deficit (in mm) is divided by mean annual rainfall and multiplied by 100. Therefore, the final index value expresses the accumulated precipitation deficit as a percentage of the mean annual rainfall.
DSI-6, DSI-12 (or any other DSI) is calculated in the same way but using six months, 12 months (or any other time scale required) instead of 3 months. DSI-3 is able to promptly detect rainfall deficits but tends to terminate the drought sequence too easily. Longer scale DSIs, like DSI-6 or DSI-12, show a greater resistance to the initiation and termination of a drought sequence (Phillips and McGregor, 1998).
Monthly basin rainfall was calculated by adding the monthly rainfall from the grid points inside each basin. For the historical period, DSI-12 plots were produced for all basins using the IB02 dataset and the 15 bias-corrected CMIP5 models in order to compare the models' behaviour with observations. For the 15 models, DSI-12 plots were also produced for the period 1961-2100 to assess future drought conditions. The percentage of basin in severe (DSI > 50%) and extreme (DSI > 100%) drought was also calculated for the three basins using the gridded rainfall from the 15 models. Plots with 30-year rolling means of the percentage of basin in severe and extreme drought are also shown in the results section.

Historical period
To investigate the performance of the CMIP5 models in representing historical drought we compared DSI-12 values calculated using the gridded observational dataset (IB02) and the 15 downscaled and bias-corrected climate  1960 1970 1980 1990 2000 1960 1970 1980 1990 2000 1960 1970 1980 1990 2000 1960 1970 1980 1990 2000 Time DSI−12 (%) Historical DSI−12 for the Guadiana Basin models, for the 1961-2003 period and for the three basins. Figure 3 provides a synthesis of the characteristics of severe drought (defined as DSI-12 = 50%) while Figure 4 shows the DSI-12 time series for the Guadiana (results for the Douro and Tagus are presented in the Figures S.5 and S.6 of Appendix S1). We found that all models underestimate the frequency of severe drought in the Douro (the most northern basin). For the Tagus this frequency is well represented, while for the Guadiana (the most southern basin) all but one model (model 15) overestimates the frequency of severe drought. Model 15 never reaches a DSI-12 of 50% for the Guadiana, and only does so once for the Douro and the Tagus showing a lack of persistence of dry conditions. Mean duration and maximum severity of severe drought is better represented by the models, with the observations being encompassed by the climate model ensemble for all three basins. Nevertheless, most models (14 out of 15) overestimate both the duration and severity of drought for the Guadiana. Models 3, 6, 7 and 8 show the worst representation of historical droughts in the three basins, as does model 5 for the Douro.
The interpretation of the differences between models and observations is hindered by the impossibility of separating model errors from simulated natural variability without having either several runs of the same model with different initial conditions simulating different low-frequency climate fluctuations (e.g. Deser et al. (2012)) or much longer observed records. Therefore, we cannot reliably say whether the lack of agreement between observations and model runs is due to errors within the climate models or due to a lack of synchronicity in the timing or the representation of cycles of natural variability. Rainfall in Iberia, and therefore drought, is heavily influenced by the North Atlantic extratropical storm track. Zappa et al. (2013) divided the CMIP5 models into three groups according to their ability to capture the positioning of North Atlantic extra-tropical cyclones in winter: the majority of models have a storm track which is 'too zonal', meaning it does not tilt north enough in its European end; other models produce a storm track which is too far south, while a third group of models show only small biases. This last group includes CMIP5 models 7, 8, 11, 12 and 15. Zappa et al. (2013) also noted that the CMIP3 models had the same problem in terms of capturing the meridional tilt of the North Atlantic storm track, but the mean bias of the CMIP5 models is roughly half that of the CMIP3 bias. McSweeney et al. (2015) assessed CMIP5 models' performance over Europe by looking at: the large-scale climatological flow at 850 hPa; the position and frequency of storm tracks; and the annual cycles of surface temperature and precipitation. They classified model performance into four classes: satisfactory, biases, significant biases and implausible. All models used in the present study were classified as satisfactory in terms of storm track frequency and position, except for models 10 and 13 which were not included in the McSweeney et al. (2015) study. For the other parameters, most were considered satisfactory with the exception of model 3, which had 'biases' for the annual cycles of surface temperature and precipitation, and models 1, 2 and 3, which had 'biases' for the large-scale climatological flow. This means that some of the models that showed the worst performance in terms of the reproduction of severity and duration of historical drought, were classified as the 'best performing' models by both studies (models 7 and 8), as was the model showing the worst performance for historical drought frequency (model 15). We did not find a relation between the mean large-scale performance of the models, as assessed by Zappa et al. (2013) and McSweeney et al. (2015), and their ability to reproduce severe drought characteristics in these three basins.

Transient analysis
To understand the range of future drought projections from the 15 models, we plotted each basin DSI-12 time-series (see Figure 5 for the Guadiana and for the Douro and Tagus see Figures S.7 and S.8 of Appendix S1). The results for the different basins are similar but the spread of projections from the different climate model runs is very wide. Nonetheless, all models show an increase Table 2. Summary of DSI-12 results: maximum values of DSI-12 and the maximum percentage of each basin in extreme drought (DSI-12 > 100%) for the historical period (1961-2003: hist)  in drought conditions over 1961-2100. Some project small increases (models 4, 5, 7, 10 and 15, for the Douro and Tagus and models 7, 10 and 15 for the Guadiana) while most models (1, 2, 3, 8, 9, 11, 13 and 14) project multi-year droughts reaching up to a DSI-12 of 800% (i.e. 8 years of mean annual rainfall missing). Plots of 30-year rolling means of the percentage of the basin in severe and in extreme drought, using DSI-12, are shown in Figure 6 for the Guadiana (Douro and Tagus are shown in Figures S.9 and S.10 of Appendix S1). Again, all models project increases in both severe and extreme drought, but the range of projections is wide. Models 4, 7, 10 and 15 show a small increase in the mean area of the basin that will be in drought. However, by the end of the century, seven of the fifteen models (1, 2, 8, 9, 11, 13 and 14) project that, on average, more than half the basin will experience severe or extreme drought. Model 8 projects the worst scenario with the mean of the area of each basin experiencing extreme drought conditions reaching 80% by the end of the century. The percentages of basin in drought for the Guadiana are slightly higher than in the northern two basins. Table 2 details the most extreme droughts projected by each model for each basin (maximum values of DSI-12) and the maximum 30-year rolling mean of the percentage of basin in extreme drought (DSI-12 > 100%) per basin and per climate model. These are shown for the period 1961-2100 (all) and for the period 1961-2003 (hist).
We did not find a relation between historical drought performance and future projections. The most extreme drought projections are from model 8, which showed droughts that were too severe in the historical period for all basins, and model 9 which showed droughts that were too severe in the historical period for the Tagus and the Guadiana. However, extreme future drought conditions are also projected by other models, including models 1 and 2 which were found to perform reasonably well in all basins during the historical period. These two models project extreme drought conditions reaching around 500% accumulated precipitation deficits when using DSI-12.
Furthermore, we also found no relation between model performance in simulating historical large-scale circulation and drought projections. Both the model projecting the biggest increases in drought (model 8) and the model projecting the smallest increases (model 15) had the smallest large-scale circulation biases for the historical period. The five models classified as best performing by Zappa et al. (2013) and McSweeney et al. (2015) actually span the whole range of drought projections.
Comparisons with previous studies are hindered by the use of different geographical areas and different indices. However, our results are similar to Jenkins and Warren (2014) for Spain (drought magnitude increases between 96 and 341% using SPI-12) and encompass the multi-model mean results of Heinrich and Gobiet (2012) for Iberia (63% increase in magnitude of drought and 51% increase in drought area, using SPI-18).

Conclusions
The skill of the chosen CMIP5 models in representing historical drought for the Douro, Tagus and Guadiana is very varied: while some do not show enough persistence of dry conditions, others show droughts that are too long and too severe. Nevertheless, the ensemble generally encompasses the range of observed drought conditions for these basins.
We found no relationship between performance of the models in reproducing the mean large-scale circulation (i.e. position and frequency of storm tracks, large scale climatological flow and European annual cycles of surface temperature and precipitation), as assessed by Zappa et al. (2013) and McSweeney et al. (2015), and their ability to reproduce historical drought characteristics for the region.
All models project an intensification of drought conditions for the Douro, Tagus and Guadiana. However, they strongly disagree on the magnitude of these changes; some project small increases in drought conditions but most project multi-year droughts reaching up to a DSI-12 of 800% (i.e. 8 years of mean annual rainfall missing), and an average of 80% of each basin area experiencing extreme drought, by the end of the century.
There is no relation between historical drought performance and future projections. Despite the fact that the two models projecting the most severe future drought conditions overestimated historical drought, extreme future droughts were also projected by models that simulated historical droughts with similar conditions to the observations.
There is also no relation between historical large-scale circulation performance and drought projections. The five models classified as best performing by Zappa et al. (2013) and McSweeney et al. (2015) are shown to span the whole range of drought projections and include the model that projected the most severe future droughts and the model that projected the smallest changes.
Despite the widespread use of SPI as a drought index, we found it was not appropriate to use it for the semi-arid rainfall regimes of the Douro, Tagus and Guadiana basins. This is because problems related to the fit of a chosen distribution to the data can lead to minus infinity values (Pearson III distribution) or unrealistic small values (Gamma distribution) and the differences between SPI-12 values calculated with different distributions can reach up to 70%.
In this paper we have presented a wide spread of possible future drought conditions for the three main international river basins in Iberia. We have shown that a climate model's future projections are neither related to their performance in simulating historical drought nor to their performance in simulating historical mean large-scale circulation. Therefore, discarding climate model projections based on historical performance could lead to an unjustifiable reduction of uncertainty. Instead, we have shown that climate change impact studies should take into account the whole range of model projections in order to provide the best possible information for robust adaptation.