High‐resolution gridded climate data for Europe based on bias‐corrected EURO‐CORDEX: The ECLIPS dataset

Climate is an important driver of many ecological and social processes; the availability of high‐resolution climate data is thus one of the key prerequisites for knowledge‐based decisions. We created a new climate dataset for Europe referred to as ECLIPS (European CLimate Index ProjectionS), which contains gridded data for 80 annual, seasonal and monthly climate variables for two past (1961–1990 and 1991–2010) and five future (2011–2020, 2021–2040, 2041–2060, 2061–2080 and 2081–2100) periods. The future data are based on five regional climate models (RCMs) driven by two greenhouse gas concentration scenarios, RCP 4.5 and RCP 8.5. Two ECLIPS versions were developed: ECLIPS 1.1 with a spatial resolution of 0.11° × 0.11°, which is the resolution of the underlying RCMs, and ECLIPS 2.0 downscaled to the resolution of 30 arcsec employing the delta approach. Both ECLIPS versions were tested against independent station data from the European Climate Assessment (ECA) dataset. Correlations of the ECA and ECLIPS 1.1 data ranged from 0.63 to 0.78, depending on the tested variable. The correlations increased to 0.78–0.93 for ECLIPS 2.0, suggesting substantial improvement of the match with station data due to the downscaling. A large number of climate projections, periods and indices as well as the availability of these data at two different spatial resolutions can support diverse studies across a range of disciplines and thus extend our understanding of climate‐sensitive dynamics of many social and ecological systems.


| INTRODUCTION
The availability of reliable climate data is one of the crucial prerequisites for understanding the effects of climate change on social and natural systems and developing efficient adaptation and mitigation strategies (Overpeck et al., 2011). Several datasets describing past and future climates have been developed and made publicly available. The most widely recognized observation-based gridded datasets for Europe include the daily E-OBS database (Haylock et al., 2008; Cornes et al., 2018) and the monthly Climate Research Unit (CRU) dataset (Harris et al., 2020). Station-based datasets include, for example, the European Climate Assessment and Dataset (ECA&D) (Klein Tank et al., 2002; Klok and Klein Tank, 2009).
General circulation models (GCMs) with a spatial resolution of 250-600 km have become versatile tools for describing the future climate, providing an important extension of the past climate datasets. Such a resolution, however, is often inadequate for regional and local studies; therefore, different downscaling methods were developed (Trzaska and Schnarr, 2014; Vaittinada Ayar et al., 2016). While statistical downscaling applies empirically derived functions between the coarse-resolution GCM output and local weather conditions (Benestad et al., 2008; Pielke and Wilby, 2012; Themeßl et al., 2012; Ahmed et al., 2013; Hewitson et al., 2014), dynamic downscaling employs high-resolution regional climate models (RCMs) nested within the GCMs. RCMs thus cover a limited spatial domain, accounting for mesoscale atmospheric processes and regional biophysical properties. Dynamic downscaling allows for a more realistic representation of the variation in climate attributed to the regional atmospheric physics and land relief features (Giorgi and Mearns, 1999; Wang et al., 2004; Torma et al., 2015).
The simplest approach, delta change (DC), addresses only the climate change signal between the control and future periods (Moreno and Hasenauer, 2016). The DC variables are then used to modify the observations to create a so-called perturbed time series representing the future climate (Hay et al., 2000). The DC method, however, assumes that the variance of the reference climate remains stable in time, an assumption that is not consistent with the RCM outputs. Therefore, more sophisticated methods were proposed, which evaluate the statistical association between simulated and observed data in the past and correct the future simulations accordingly (Christensen et al., 2008; Maraun, 2012). For example, the linear-scaling (LS) approach uses monthly correction values based on the average difference between the observed and corresponding simulated values (Lenderink et al., 2007). The distribution-scaling (DS) approach, also referred to as distribution mapping or quantile mapping, uses correction factors calculated by comparing the distributions of the RCM data and the corresponding observed weather data. The correction factors are thereafter used to adjust the distribution of the RCM outputs for future periods. DS methods have been found to outperform simpler bias correction methods, particularly by addressing the entire distribution functions (Teutschbein and Seibert, 2012) and preserving the projected changes in climate variability described by the RCMs, making them superior to the DC and LS approaches (Switanek et al., 2017).
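The contrast between the three families of methods can be illustrated with a minimal Python sketch. It is not the procedure used for the data described here: the function names are hypothetical, DC is shown in its additive form, LS uses a single mean correction rather than monthly values, and the DS variant is a plain empirical quantile mapping rather than a parametric one.

```python
from bisect import bisect_right
from statistics import mean


def delta_change(obs, sim_ctrl, sim_fut):
    """DC (additive form): shift the observations by the simulated mean change."""
    signal = mean(sim_fut) - mean(sim_ctrl)
    return [o + signal for o in obs]


def linear_scaling(obs, sim_ctrl, sim_fut):
    """LS: subtract the mean control-period bias from the future simulation."""
    bias = mean(sim_ctrl) - mean(obs)
    return [s - bias for s in sim_fut]


def quantile_mapping(obs, sim_ctrl, sim_values):
    """DS (empirical form): replace each simulated value by the observed value
    found at the same rank position of the control-period distribution."""
    obs_sorted = sorted(obs)
    ctrl_sorted = sorted(sim_ctrl)
    n_obs, n_ctrl = len(obs_sorted), len(ctrl_sorted)
    corrected = []
    for value in sim_values:
        # Number of control-period values not exceeding the simulated value
        rank = bisect_right(ctrl_sorted, value)
        # Mid-rank probability, clamped at zero for values below the range
        prob = max(rank - 0.5, 0.0) / n_ctrl
        index = min(int(prob * n_obs), n_obs - 1)
        corrected.append(obs_sorted[index])
    return corrected


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ctrl = [2.0, 4.0, 6.0, 8.0, 10.0]   # biased in both mean and spread
    fut = [3.0, 5.0, 7.0, 9.0, 11.0]
    print(delta_change(obs, ctrl, fut))     # keeps the observed spread
    print(linear_scaling(obs, ctrl, fut))   # keeps the simulated spread
    print(quantile_mapping(obs, ctrl, ctrl))
```

In this toy case DC returns a series with the spread of the observations, LS returns a series with the spread of the simulation, and the quantile mapping corrects the whole control-period distribution. Because this naive empirical mapping clamps values outside the control-period range, it should be read only as an illustration of the idea.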
Although recent studies suggesting the non-stationarity of model biases have raised concerns about the rationale of bias correction (Ehret et al., 2012; Maraun, 2016), bias correction remains an inseparable component facilitating the use of climate simulations in impact studies. The most recent advances in the field include DS methods such as de-trended quantile mapping (Cannon et al., 2015) and scaled distribution mapping (Switanek et al., 2017), which aim to preserve the complexity of the projected climate change.
A large number of RCM projections driven by multiple GCMs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) (Taylor et al., 2012) were released in the framework of the Coordinated Regional Climate Downscaling Experiment (CORDEX) (Giorgi et al., 2009). Subsequently, the EURO-CORDEX framework provided climate model results for Europe (Jacob et al., 2014) at a resolution of 12-50 km. Some of the model results were corrected for bias, focusing on temperature and precipitation variables.
Despite the recent surge in the availability of dynamically downscaled and bias-corrected RCM outputs, their spatial resolution (12-50 km) limits their use in a wide range of disciplines. For example, species distribution, forest growth and ecosystem modelling studies may require climate data at a higher resolution than is currently provided by the RCMs (Mcshea, 2014; Senay and Worner, 2019).
To date, Climate-EU (Wang et al., 2016) and WorldClim (Fick and Hijmans, 2017) are among the most frequently used fine-scale (30 arcsec, equivalent to 1 × 1 km) climate datasets for Europe, supporting diverse biological and social studies. Both datasets provide long-term averages for different climate variables based on (a) the interpolation of observed data for the past climate and (b) GCM data downscaled using the DC method for the future climate. Climate projections in these datasets, however, are based on a mere modification of the high-resolution maps based on the observed climate by the climate change signal from GCMs. The interpolated GCM-based climate change signal thus produces unrealistic patterns in topographically complex areas such as mountains and coastlines (Kanada et al., 2008; Rauscher et al., 2010) and has a limited ability to describe extreme weather events (Lorenz and Jacob, 2005; Inatsu and Kimoto, 2009; Flato et al., 2013). Superimposing the low-resolution climate change signal on high-resolution base maps can thus lead to erroneous conclusions concerning climate change patterns and impacts. Moreover, climate projections in Climate-EU and WorldClim do not consider any changes in the distribution of climate variables under the future climates, critically limiting the representation of climate extremes. Climate indices based on daily data (such as the number of frost-free days) are empirically estimated in these datasets based on monthly climatologies by using sigmoidal functions derived from past observations (Wang et al., 2016), not recognizing the limited transferability of past distributions to the future.
The benefits of using RCMs extend well beyond an improved spatial resolution. Whereas GCMs aim to secure a global consistency of the simulated climate, RCMs provide an optimal representation of the regional processes. For example, RCMs were found to systematically reduce the biases and modify global climate change signals, even on scales that are considered well resolved by the driving GCMs (Sørland et al., 2018).
To the best of our knowledge, there is currently no gridded dataset containing bias-corrected RCM results downscaled to the resolution of 30 arcsec. We used the state-of-the-art bias-corrected RCM results with a horizontal resolution of 0.11° × 0.11° (approximately 12 × 12 km over Europe) and downscaled them statistically to the resolution of 30 arcsec, roughly corresponding to a 1 × 1 km resolution. To inform about the key limitations and improvements of our dataset, we test the data against independent station-based observations. We aim to provide a ready-to-use high-resolution climate dataset containing basic climate variables and a suite of climate indices that can support a wide range of scientific disciplines.

| EURO-CORDEX dataset
We used five daily bias-corrected regional climate model results out of nine projections available in the EURO-CORDEX database at the time of this study (Table 1). The criteria used to select the five climate projections were as follows: (a) representation of all available RCMs and GCMs; (b) two RCMs being nested in the same driving GCM; and (c) one RCM being driven by two different GCMs. Such criteria were adopted to ensure the representativeness of all combinations of RCMs and GCMs available in the EURO-CORDEX database. The models were driven by two Representative Concentration Pathway scenarios, RCP4.5 and RCP8.5 (Moss et al., 2010). The simulations were run for the EUR-11 domain with a 0.11° × 0.11° horizontal resolution (Giorgi et al., 2009; Jacob et al., 2014). Precipitation, minimum and maximum temperature projections were corrected for bias using a distribution scaling (DS) method by the Swedish Meteorological and Hydrological Institute. The correction factors were calculated by comparing the distribution of daily RCM data with the distribution of corresponding daily observed weather data. Then, the correction factors were used to remove the bias from the RCM outputs for the future. This procedure corrects all quantiles of the data distribution and thus preserves the expected changes in climate variability projected by the RCMs (Yang et al., 2010).
In the case of precipitation, the DS approach consisted of two steps: (a) correcting the frequency of wet days and (b) correcting the precipitation amounts on wet days to match the observed distribution. First, simulated and observed daily precipitation were sorted in descending order. Then, a threshold was defined so that the proportion of wet days in the simulations was similar to that in the observations. Days with precipitation above the threshold were considered wet, and the remaining days dry. To better describe the properties of extreme values, the precipitation distribution was divided into two partitions at the 95th percentile and modelled by a double gamma distribution. Daily temperature data were modelled using a normal distribution with parameters conditioned by the wet or dry state of the day, considering the inherent dependence between precipitation and temperature. Finally, distribution scaling factors were derived by comparing the distributions of the simulated and observed variables (Yang et al., 2010).
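The wet-day frequency step can be sketched as follows. This is a simplified illustration only, not the operational implementation: the function name is hypothetical, and the 0.1 mm cutoff used to define an observed wet day is an assumption that does not come from the text.

```python
def wet_day_threshold(sim, obs, obs_wet_cutoff=0.1):
    """Return a threshold for simulated daily precipitation such that the share
    of simulated days above it approximates the observed wet-day share.

    obs_wet_cutoff (mm) is an assumed definition of an observed wet day.
    """
    obs_wet_share = sum(1 for p in obs if p > obs_wet_cutoff) / len(obs)
    n_wet = round(obs_wet_share * len(sim))
    sim_desc = sorted(sim, reverse=True)   # descending order, as in the text
    if n_wet >= len(sim_desc):
        return float("-inf")               # every simulated day counts as wet
    # The largest value that should be dry becomes the threshold; ties at
    # the threshold can make the resulting wet-day count slightly smaller.
    return sim_desc[n_wet]


if __name__ == "__main__":
    observed = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 5.0, 1.0, 3.0]
    # Simulations often contain too many days with very light rain
    simulated = [0.0, 0.2, 0.3, 0.1, 0.5, 0.4, 3.0, 6.0, 2.0, 4.0]
    threshold = wet_day_threshold(simulated, observed)
    wet_days = [p for p in simulated if p > threshold]
    print(threshold, len(wet_days))
```

Only the days above the returned threshold would then enter the second step, in which the wet-day amounts are adjusted to the observed distribution.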
The reference data for the bias correction covered the period 1989-2010 and were extracted from the MESAN downscaling database (Landelius et al., 2016). MESAN is an operational mesoscale analysis system of selected meteorological variables (Häggmark et al., 2000). It uses a regional reanalysis produced with the High-Resolution Limited Area Model (HIRLAM) (Källén, 1996) on a 5-km grid, assimilating precipitation observations from the ECA&D archive (Klok and Klein Tank, 2009) and the high-resolution rain-gauge networks from France and Sweden. Unique scaling sets were defined for each grid point and season. The variability of the corrected data was found to be consistent with the original RCM data.

| Climate-EU data
We used Climate-EU (Wang et al., 2016) as base data for the period 1961-1990 to downscale the underlying RCM data to a 30 arcsec resolution; this roughly equals a 1-km horizontal resolution, depending on the latitude. Climate-EU uses the parameter-elevation regressions on independent slopes model (PRISM) (Daly et al., 2008) for precipitation and ANUSplin interpolation (Hutchinson, 1989) for temperature data. While PRISM uses physiographic information to capture the geographic variation in precipitation patterns (Daly et al., 2008), ANUSplin supports the interpolation by thin-plate smoothing splines (Hutchinson, 1989). The Climate-EU dataset is available at: https://sites.ualberta.ca/~ahamann/data/climateeu.html

FIGURE 1 Flow diagram demonstrating the delta method used to downscale the ECLIPS 1.1 dataset to a fine resolution of 30 arcsec (roughly 1 × 1 km), yielding ECLIPS 2.0, with mean annual temperature (MAT) for the period 1961-1990 as an example. The panels are zoomed in to show the mountainous regions of the Alps
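The dependence of the 30 arcsec cell size on latitude, noted for the Climate-EU base data, can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation


def cell_size_km(arcsec, latitude_deg):
    """Return the (east-west, north-south) extent in km of a grid cell
    spanning `arcsec` arc seconds at the given latitude."""
    angle_rad = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_KM * angle_rad
    # Meridians converge towards the poles, shrinking the east-west extent
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


if __name__ == "__main__":
    for lat in (35.0, 50.0, 70.0):
        ew, ns = cell_size_km(30.0, lat)
        print(f"{lat:4.0f}N: {ew:.2f} km x {ns:.2f} km")
```

Under these assumptions, the north-south extent of a 30 arcsec cell stays close to 0.93 km, while the east-west extent shrinks with the cosine of the latitude.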

| Spatial downscaling
We developed here two climatic datasets: ECLIPS 1.1 with the spatial resolution of the underlying RCMs of 0.11° × 0.11° containing basic climate variables and a range of climate indices, and ECLIPS 2.0 derived from ECLIPS 1.1 by downscaling the data to the resolution of 30 arcsec. To downscale the ECLIPS 1.1 data, we used the delta (also called anomaly) downscaling approach (Fréjaville and Curt, 2015; Moreno and Hasenauer, 2016). Our approach takes advantage of the detailed spatial pattern of the high-resolution Climate-EU data, using this information to modify the medium-resolution ECLIPS 1.1.
The downscaling was performed in four steps (Figure 1). First, Climate-EU data were upscaled by a bilinear interpolation to a 396 arcsec (0.11° × 0.11°) resolution (X1) to match the resolution of ECLIPS 1.1. Second, the upscaled Climate-EU data (X1) and the ECLIPS 1.1 data (Y) were disaggregated using the bilinear interpolation (X2 and Y1). Third, delta values were calculated as the relative deviations between X2 and Y1. Fourth, ECLIPS 1.1 data were multiplied by the delta grid to produce the high-resolution (30 arcsec) version of ECLIPS 1.1, referred to as ECLIPS 2.0 (Table A2 in Appendix S1).
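The four steps can be illustrated on a one-dimensional transect. The sketch below is one common multiplicative form of the delta method and should not be read as the exact implementation used here: the function names are hypothetical, block averaging stands in for the bilinear upscaling, linear interpolation stands in for the bilinear disaggregation, the delta is taken as the ratio Y1/X2, and the final field is computed as the fine baseline multiplied by that ratio. Strictly positive values (for example precipitation sums) are assumed so that the ratio is defined.

```python
def block_mean(fine, factor):
    """Step 1 (stand-in): aggregate the fine transect to coarse cells."""
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine), factor)]


def linear_disaggregate(coarse, factor):
    """Step 2: interpolate coarse cell-centre values onto the fine grid,
    holding the outermost values constant beyond the outer cell centres."""
    n = len(coarse)
    fine = []
    for j in range(n * factor):
        # Position of fine cell j in coarse cell-centre coordinates
        pos = (j + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), n - 1.0)
        left = min(int(pos), n - 2) if n > 1 else 0
        right = min(left + 1, n - 1)
        frac = pos - left
        fine.append(coarse[left] * (1.0 - frac) + coarse[right] * frac)
    return fine


def delta_downscale(fine_baseline, coarse_model, factor):
    """Combine a fine baseline with a coarse model field (assumed form)."""
    x1 = block_mean(fine_baseline, factor)          # upscaled baseline (X1)
    x2 = linear_disaggregate(x1, factor)            # smoothed baseline (X2)
    y1 = linear_disaggregate(coarse_model, factor)  # smoothed model (Y1)
    delta = [y / x for y, x in zip(y1, x2)]         # Step 3: relative deviation
    return [f * d for f, d in zip(fine_baseline, delta)]  # Step 4


if __name__ == "__main__":
    baseline = [10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0]
    model = [26.0, 46.0]  # coarse model values, twice the baseline means
    print(delta_downscale(baseline, model, factor=4))
```

In this form, the fine-scale spatial pattern comes entirely from the baseline, while the coarse model field controls the regional level; a model field equal to the aggregated baseline returns the baseline unchanged.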

| Data evaluation
To evaluate the ECLIPS 1.1 and ECLIPS 2.0 datasets, we obtained climate data from 4,637 weather stations distributed across Europe from the European Climate Assessment (ECA) dataset (Klok and Klein Tank, 2009). ECA contains daily time series for nine climate variables: minimum, maximum and mean temperature, precipitation amount, sea-level air pressure, snow depth, sunshine duration, relative humidity and cloud cover for the period 1899-2017. We extracted 3,168 weather stations falling within the geographic extent of our dataset and calculated 22 testing climate variables (Table 2), including the mean, minimum and maximum of annual and seasonal temperature and precipitation for the periods 1961-1990 and 1991-2010. These 22 variables represent the overall dataset well and are commonly used in climate impact studies. The testing was based on the comparison of ECA stations with the corresponding ECLIPS 1.1 and ECLIPS 2.0 pixels. Mean absolute error (MAE), mean bias error (MBE), root-mean-squared error (RMSE) and the squared Pearson correlation coefficient (R²) were used as performance metrics. The metrics were calculated for the whole of Europe and for the European biogeographic regions (Figure A1 in Appendix S1). The Arctic, Black Sea and Anatolian zones were excluded from the evaluation as there were no ECA stations available.

| ECLIPS 1.1
The ECLIPS 1.1 dataset contains 80 annual, seasonal and monthly climate variables (Table A1 in Appendix S1). The data cover two past periods, 1961-1990 and 1991-2010, and five future periods, 2011-2020, 2021-2040, 2041-2060, 2061-2080 and 2081-2100. For the future periods, the calculations were done for five RCMs (Table 1) under the RCP 4.5 and RCP 8.5 scenarios. The calculations were done on a rotated grid, which was transformed into a regular grid, preserving the original horizontal resolution. The resulting grids cover all of Europe (−10.61667 to 38.55834°E and 34.56226 to 71.18726°N) with a horizontal resolution of 0.11° × 0.11° (i.e., 396 arcsec, ca 12 × 12 km). The dataset is available at http://doi.org/10.5281/zenodo.1204351.

| ECLIPS 2.0
The ECLIPS 2.0 dataset was derived from ECLIPS 1.1, aiming to enhance local climate features while maintaining the continental-scale pattern of the original medium-resolution data (Figures 2 and 3). The dataset contains the same set of climate projections as ECLIPS 1.1 with a horizontal resolution of 30 arcsec (Table A2 in Appendix S1). The dataset is available in the GeoTIFF raster format in the WGS 84 geographic coordinate reference system (Table A2 in Appendix S1).

| Evaluation
The ECA and ECLIPS 1.1 data showed correlations ranging from 0.63 to 0.78; this coefficient range emerged from the evaluation of 22 representative climate variables listed in Table 2. The correlations increased to 0.78-0.93 for the high-resolution ECLIPS 2.0, suggesting a substantial improvement of the correspondence with the ECA data due to the downscaling. The RMSE and MAE showed a minor decrease between ECLIPS 1.1 and ECLIPS 2.0, from 0.26-1.05 to 0.20-0.95 (RMSE) and from 0.20-0.80 to 0.22-0.74 (MAE), respectively. These effects, however, differed between the biogeographical regions of Europe (Figure 4; Figures A3, A4, A5 in Appendix S1). While a good level of correspondence was found in most of the biogeographic regions, this was not the case for autumn and winter precipitation in the Steppic region (Figure 4). Moreover, winter and autumn precipitation showed higher RMSE and MAE values compared to the remaining variables, especially in the Mediterranean and Steppic regions (Table 2, Figures A3 and A4 in Appendix S1).
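For reference, the four performance metrics reported above can be computed from paired station and grid-cell values as in the following sketch. The function name is hypothetical, and the sign convention for MBE (gridded value minus station value, so that negative values indicate underestimation) is an assumption.

```python
from math import sqrt
from statistics import mean


def evaluation_metrics(observed, predicted):
    """Return MAE, MBE, RMSE and squared Pearson correlation (R2) for paired
    station observations and gridded values."""
    # Assumed sign convention: error = gridded value minus station value
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = mean([abs(e) for e in errors])
    mbe = mean(errors)
    rmse = sqrt(mean([e * e for e in errors]))
    mean_o, mean_p = mean(observed), mean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (ss_o * ss_p)   # undefined if either series is constant
    return {"MAE": mae, "MBE": mbe, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    stations = [1.0, 2.0, 3.0, 4.0]
    grid = [1.5, 2.5, 3.5, 4.5]      # constant offset of +0.5
    print(evaluation_metrics(stations, grid))
```

The toy example shows why the metrics are complementary: a constant offset leaves R² at 1 while MAE, MBE and RMSE all equal the offset.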
Most of the evaluated variables, except for MAT, MWMT and Pr_sp, were slightly underestimated in both ECLIPS datasets compared with the station data (Figure A5 in Appendix S1).

TABLE 2 Evaluation of the original ECLIPS 1.1 and the downscaled ECLIPS 2.0 data against observed weather station data for the past (1961-1990) and current (1991-2010) periods

| SUMMARY
We developed here two datasets with climatological information for Europe with a high and medium spatial resolution to support climate change impact studies in different disciplines. Both datasets contain gridded data for 80 annual, seasonal and monthly climate variables for Europe covering the period of 1961-2100. The data were made available for two non-overlapping past periods (1961-1990 and 1991-2010) and five future periods (2011-2020, 2021-2040, 2041-2060, 2061-2080 and 2081-2100). The future climate projections were driven by two radiative forcing scenarios, RCP 4.5 and RCP 8.5, addressing a sensible range of future uncertainty. The versatility of these datasets is underscored by the provision of two spatial resolutions, that is, 0.11° × 0.11° for the underlying ECLIPS 1.1 and 30 arcsec for ECLIPS 2.0. ECLIPS 2.0 may offer an important alternative to the Climate-EU (Wang et al., 2016) and WorldClim (Fick and Hijmans, 2017) datasets, being superior in several aspects. We resolved the main limitations of the existing datasets by using bias-corrected RCM projections rather than superimposing the GCM signal on the reference climate maps. ECLIPS 2.0 thus provides a more realistic approximation of the projected climate change due to the higher resolution of the RCMs used. Climate projections in ECLIPS also consider the projected changes in climate variability due to the use of the DS method for bias correction of the underlying climate projections, leading to a more reliable representation of extreme events and related climate indices. Moreover, both datasets showed good correspondence with the observed station data, with a substantial improvement due to the applied downscaling. The data format used (GeoTIFF rasters) makes the data easily accessible for non-specialist users such as biologists, conservation managers and land-use planners.
Still, uncertainties related, for example, to the bias correction algorithm and spatial downscaling remain (Jacob et al., 2020) and should be carefully considered when using the data.