Historical global gridded degree‐days: A high‐spatial resolution database of CDD and HDD

Cooling and heating degree‐days (CDD/HDD) are important metrics used in energy studies as a proxy for determining demand and consumption patterns of residential/commercial buildings and work spaces. Driven by the requirements of energy impact modellers, policymakers and building design experts, a new historical high‐spatial resolution, global gridded dataset of degree‐days constructed using various base (threshold) temperatures (Tb) is presented in this study. Derived using sub‐daily temperature from a quality‐controlled reanalysis data product (Global Land Data Assimilation System, GLDAS), the dataset called ‘DegDays_0p25_1970_2018’ includes monthly and annual (i) CDD; (ii) HDD; and (iii) CDD computed using wet‐bulb temperature (CDDwb) at 0.25° × 0.25° gridded resolution, covering 49 years over the period 1970–2018. The Tb used for assembling DegDays_0p25_1970_2018 include 18, 18.3, 22, 23, 24 and 25°C for CDD and CDDwb; and 10, 15, 15.5, 16, 17 and 18°C for HDD. The data of individual indices are made publicly available in the commonly used scientific Network Common Data Form 4 (NetCDF4) and Georeferenced Tagged Image File (GeoTIFF) formats. DegDays_0p25_1970_2018 fills gaps in existing energy indicators’ datasets by being the only high‐resolution historical global gridded time series based on multiple threshold temperatures, thus offering applications in wide‐ranging climate zones and thermal comfort environments. The richness of DegDays_0p25_1970_2018 lies in its flexibility: users can aggregate the degree‐days not only at varying spatial scales (such as administrative levels, national boundaries or economic organizations, e.g. OECD; with or without population weights), but also at varying temporal scales (such as seasons), thereby offering climatologists the potential to examine global teleconnection patterns more discretely.


| INTRODUCTION
Cooling (CDD) and heating (HDD) degree-days are important climatic indicators, commonly used to estimate the climate-dependent cooling and heating demands in buildings, respectively (CIBSE, 2006). Degree-days are defined as the monthly or annual sum of the differences between a base temperature (Tb) and the daily mean outdoor air temperature (Td), whenever Td is greater (CDD) or lower (HDD) than Tb (ASHRAE, 2009).* Tb is also referred to as the 'threshold' or 'set-point' temperature, and it signifies the Td at which indoor cooling or heating systems do not need to run in order to maintain human comfort levels (CIBSE, 2006; ASHRAE, 2009).
Degree-days have been routinely used by building designers and engineers to estimate indoor cooling/heating-related energy consumption, and by policymakers and researchers for forecasting energy demand, consumption patterns and associated carbon emissions (Lee et al., 2005; Mourshed, 2012). This is partly rooted in their simplicity, combined with a powerful capability to represent the relationship with cooling or heating energy consumption (Atalla et al., 2018). In addition, degree-days are widely used as climatic indicators for assessing the impact of climate change and variability, such as CDD and HDD in the energy sector (Moustris et al., 2015), and growing degree-days (GDD) in the agriculture sector (Schlenker and Roberts, 2009; Schauberger et al., 2017). Readers are referred to Spinoni et al. (2018) for a more detailed account of the application of degree-days in various sectoral impact studies.
This study presents a unique (first-ever) high-spatial resolution, global gridded database of three types of degree-days, namely CDD, HDD and a variant of CDD accounting for humidity (CDDwb). Computed using multiple wide-ranging Tb and meteorological variables from a quality-controlled reanalysis data product, the degree-days dataset referred to as 'DegDays_0p25_1970_2018' includes monthly and annual degree-days, spanning the most recent 49 years (1970-2018). The dataset is aimed at multiple end users, such as the research community assessing impacts of climate change on the energy sector (as well as the usage of energy for adapting to climate change), and policymakers examining the historical climate-energy nexus as a proxy for understanding future trends and patterns in energy demands for human comfort.
The rest of the paper is organized as follows. Indices, materials and methods, and the underlying reanalysis data product used in assembling the dataset are discussed in detail in Sections 2 and 3. Details on data file formats and ways to access the dataset are outlined in Section 4. Finally, Section 5 discusses potential applications and limitations of the dataset, with recommendations for additional work.

| METHODS
CDD and HDD are calculated using the commonly used American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) method (ASHRAE, 2009), and are defined as follows:

$$\mathrm{CDD} = \sum_{i=1}^{n} \left(T_{d,i} - T_b\right)^{+} \tag{1}$$

$$\mathrm{HDD} = \sum_{i=1}^{n} \left(T_b - T_{d,i}\right)^{+} \tag{2}$$

where '+' signifies that only positive values accumulate over the n days (i = 1, ..., n) in the chosen time period (e.g. months, seasons, year). Td and Tb in Equations 1-2 represent the daily mean outdoor air and base (threshold) temperatures, respectively. Degree-days are commonly expressed as °C days or °F days, depending on the units of Td and Tb used in the formulation. Conversion from °C days to °F days (and vice versa) uses the same 9/5 scaling factor as the temperature scales; the 32° offset cancels because degree-days are sums of temperature differences. For example, CDD computed in °C days can be converted to °F days using the following relationship:

$$\mathrm{CDD}_{^{\circ}\mathrm{F}} = \frac{9}{5} \times \mathrm{CDD}_{^{\circ}\mathrm{C}} \tag{3}$$

CDD computed using Td only considers the effect of dry-bulb temperature.† In regions with high relative humidity (rh), such as the coastal regions in New South Wales (Australia), coastal regions in India (e.g. Kerala) and the south-eastern regions of China and Brazil, CDD can have limited applications in determining energy requirements for space cooling (Guan, 2009). For such regions, CDDwb is recommended as a more suitable indicator than the conventional dry-bulb-derived CDD (Guan, 2009; Krese, 2012).

* Definitions of degree-days applying Tb differently in calculations also exist (see CIBSE, 2006). This study uses the definition adopted by ASHRAE (2009).

† For clarity, the daily mean outdoor air temperature (Td) referred to in Equations 1-2 is measured using a dry-bulb thermometer. Hence, Td is also referred to as dry-bulb temperature in the remainder of the text.
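The accumulation rule behind Equations 1-3 can be illustrated with a minimal Python sketch. The function names and the one-week temperature series are hypothetical and are not taken from the dataset or its processing chain; the sketch only shows how the positive-part sum and the unit conversion behave.

```python
from typing import Iterable


def cooling_degree_days(daily_mean_temps: Iterable[float], base_temp: float) -> float:
    """Sum of the positive parts of (Td - Tb) over the period (Equation 1)."""
    return sum(max(td - base_temp, 0.0) for td in daily_mean_temps)


def heating_degree_days(daily_mean_temps: Iterable[float], base_temp: float) -> float:
    """Sum of the positive parts of (Tb - Td) over the period (Equation 2)."""
    return sum(max(base_temp - td, 0.0) for td in daily_mean_temps)


def celsius_days_to_fahrenheit_days(dd_celsius: float) -> float:
    """Degree-days are temperature differences, so only the 9/5 factor applies."""
    return dd_celsius * 9.0 / 5.0


# Hypothetical week of daily mean temperatures in degC (illustrative values only).
week = [16.0, 19.5, 22.0, 18.0, 25.0, 17.0, 21.0]

cdd_week = cooling_degree_days(week, base_temp=18.0)
hdd_week = heating_degree_days(week, base_temp=18.0)
cdd_week_f = celsius_days_to_fahrenheit_days(cdd_week)
```

Days on which Td equals Tb contribute nothing to either index, which is why the same series can yield non-zero CDD and HDD within a single period.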
The methodology to compute CDDwb on monthly and annual timescales differs only in the use of wet-bulb temperature (Twb) instead of dry-bulb temperature (Td in Equation 1). The base temperatures and the units of CDDwb remain unchanged, making CDDwb easily comparable to CDD. Twb is the minimum temperature to which air can be cooled by evaporative cooling and, as such, contains information about air temperature as well as moisture content. For further details, readers are referred to Stull (2000, 2011). Following Stull (2011), average daily Twb is computed from Td and average daily rh as follows:

$$T_{wb} = T_d \arctan\left[0.151977\,(rh + 8.313659)^{1/2}\right] + \arctan(T_d + rh) - \arctan(rh - 1.676331) + 0.00391838\,rh^{3/2}\arctan(0.023101\,rh) - 4.686035 \tag{4}$$

where rh is expressed in % and the arctangent (atan) function returns values in radians. Twb is expressed in the same units (°C) as Td.
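As an illustration of the wet-bulb calculation, the sketch below implements the empirical fit published by Stull (2011), which takes dry-bulb temperature in °C and relative humidity in %. The function name is hypothetical, and the code is a stand-alone example rather than the processing code used to build the dataset. Stull (2011) fitted the expression at standard sea-level pressure, so values well outside ordinary near-surface conditions should be treated with caution.

```python
import math


def wet_bulb_stull(td_celsius: float, rh_percent: float) -> float:
    """Wet-bulb temperature (degC) from the Stull (2011) empirical fit.

    td_celsius: dry-bulb air temperature in degC.
    rh_percent: relative humidity in percent (e.g. 50.0, not 0.5).
    math.atan returns radians, as the formula requires.
    """
    rh = rh_percent
    return (
        td_celsius * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(td_celsius + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )


# Illustrative input: 20 degC at 50 % relative humidity.
twb_example = wet_bulb_stull(20.0, 50.0)
```

Because Twb is below Td whenever rh is below 100%, replacing Td with Twb in the CDD sum at the same Tb yields fewer degree-days.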

| Dataset description
The degree-days included in this study are derived using meteorological variables from Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004). GLDAS is a new generation global high-resolution reanalysis data product developed jointly by the National Aeronautics and Space Administration (NASA), Goddard Space Flight Center (GSFC) and National Centers for Environmental Prediction (NCEP) (Ji et al., 2015).
GLDAS incorporates satellite and ground-based observations, producing optimal fields of land surface states and fluxes in near real time, thus facilitating regular updates of the DegDays_0p25_1970_2018 dataset presented in this study (Section 5.1). Furthermore, GLDAS makes available meteorological and land surface variables that are not commonly available in other reanalysis data products either as consistent long time series, or at a high-spatial resolution. Other reanalysis data products available have either (i) a coarser spatial resolution (e.g. ECMWF-ERA40 and JRA-55, both available from the mid-1950s but at 1.125°) or (ii) a shorter time series (e.g. newly released ECMWF-ERA5 at 0.281° from 1979-present day and NCEP-CFSv2 at 0.205° from 2011-present day).
GLDAS provides a consistent, quality-controlled, long global gridded time series of a number of key meteorological variables at fine spatio-temporal (0.25° gridded,‡ 3-hourly) resolution. It has been comprehensively evaluated using different regional/global reference datasets in earlier studies, such as Ji et al. (2015) who compare the GLDAS daily surface air temperature at 0.25° gridded resolution with two reference datasets: (a) Daymet data (2002 and 2010) for the conterminous United States at 1-km gridded resolution and (b) global meteorological observations (2000) Zhong et al. (2011) for the analysis of regional environmental conditions and changes. A recent dataset (Mistry, 2019b, 2019c) has also incorporated temperature and precipitation data from GLDAS to assemble a comprehensive set of 71 climate extreme indices. Further details on studies implementing GLDAS are available at https://ldas.gsfc.nasa.gov/gldas/GLDASpublications.php. Some known caveats of GLDAS are discussed in Section 5.2.

| MATERIALS AND METHODS
The GLDAS variables used in the present study for computing CDD and HDD include daily (a) near-surface maximum (TX) and minimum (TN) temperatures in °C, and in addition (b) surface relative humidity (rh) in % for computing CDDwb. rh is not directly available from GLDAS, but is derived from surface pressure (P) in hectopascals (hPa) or millibars (mb), and specific humidity (Q) in kg kg⁻¹, both made available by GLDAS (Equations 6-8).
The variables (TX, TN, P and Q) covering the years 1970-2018 were obtained at their native 3-hourly time steps in the Network Common Data Form 4 (NetCDF4) format§ from GLDAS version 2‖ (Rodell et al., 2004; Kumar et al., 2006; Peters-Lidard et al., 2007). The daily fields of these variables were assembled using a suite of command line operators from NetCDF Command Operators (NCO ver 4.3.4)¶ and Climate Data Operators (CDO ver 1.9.0).* A summary of the data variables used, along with the methodology, is provided in Table 1.

In Equation 8, VP is the vapour pressure (in hPa) and SVP is the saturation vapour pressure (in hPa).
Equation 7 is referred to as the Magnus equation or the Magnus-Tetens equation, or the August-Roche-Magnus equation (Tetens, 1930;Webb, 1994), and is defined for temperatures above 0°C. Equations 6-8 are discussed in detail in Stull (2000).
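A minimal sketch of the rh derivation is given below. It takes Td as the arithmetic mean of TX and TN and rh as 100 × VP/SVP. The vapour pressure expression is the standard conversion from specific humidity and pressure. The Magnus-type coefficients (6.112 hPa, 17.67 and 243.5 °C) are one commonly quoted set and are an assumption here; the coefficients in the paper's Equation 7 may differ. All function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air


def daily_mean_temperature(tx_celsius: float, tn_celsius: float) -> float:
    """Arithmetic mean of daily maximum and minimum temperature (degC)."""
    return (tx_celsius + tn_celsius) / 2.0


def vapour_pressure(q_kg_per_kg: float, p_hpa: float) -> float:
    """Vapour pressure (hPa) from specific humidity (kg/kg) and pressure (hPa)."""
    return q_kg_per_kg * p_hpa / (EPSILON + (1.0 - EPSILON) * q_kg_per_kg)


def saturation_vapour_pressure(t_celsius: float) -> float:
    """Magnus-type saturation vapour pressure (hPa); coefficients are assumed."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))


def relative_humidity(q_kg_per_kg: float, p_hpa: float, t_celsius: float) -> float:
    """Relative humidity in percent, rh = 100 * VP / SVP."""
    vp = vapour_pressure(q_kg_per_kg, p_hpa)
    svp = saturation_vapour_pressure(t_celsius)
    return 100.0 * vp / svp


# Illustrative inputs (not GLDAS values): TX = 26 degC, TN = 14 degC,
# Q = 0.008 kg/kg, P = 1000 hPa.
td_example = daily_mean_temperature(26.0, 14.0)
rh_example = relative_humidity(0.008, 1000.0, td_example)
```

In practice the computed rh may slightly exceed 100% because of rounding or inconsistencies between input fields, so capping the result at 100% is a common precaution.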

| Spatial and Temporal coverage of DegDays_0p25_1970_2018
The spatial extent of GLDAS covers all land north of 60°S latitude. Consequently, the degree-days in DegDays_0p25_1970_2018 are also computed over the corresponding 1,440 (longitude) × 600 (latitude) grid cells spanning 90°N-60°S, at the same 0.25° gridded resolution. Because GLDAS does not record data at or near water bodies, the grid cells in the proximity of water bodies do not report degree-days. Figure 1(a-c) shows the mean 1970-2018 annual degree-days using Tb = 18°C at the native 0.25° gridded resolution.
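The grid dimensions quoted above follow directly from the extent and resolution, as the short sketch below shows. The cell-centre convention and the south-to-north row ordering are assumptions made for illustration; the coordinate variables in the actual files should be checked before indexing.

```python
from typing import Tuple

RESOLUTION_DEG = 0.25
LON_WEST, LON_EAST = -180.0, 180.0
LAT_SOUTH, LAT_NORTH = -60.0, 90.0

n_lon = round((LON_EAST - LON_WEST) / RESOLUTION_DEG)
n_lat = round((LAT_NORTH - LAT_SOUTH) / RESOLUTION_DEG)

# Assumed convention: coordinates refer to cell centres.
lon_centres = [LON_WEST + (i + 0.5) * RESOLUTION_DEG for i in range(n_lon)]
lat_centres = [LAT_SOUTH + (j + 0.5) * RESOLUTION_DEG for j in range(n_lat)]


def cell_index(lon: float, lat: float) -> Tuple[int, int]:
    """Return (row, col) of the cell containing a point, rows counted from 60S."""
    if not (LON_WEST <= lon < LON_EAST and LAT_SOUTH <= lat < LAT_NORTH):
        raise ValueError("point lies outside the 60S-90N grid extent")
    row = int((lat - LAT_SOUTH) // RESOLUTION_DEG)
    col = int((lon - LON_WEST) // RESOLUTION_DEG)
    return row, col


# Illustrative lookup for a point near Mexico City (19.43N, 99.13W).
row_mx, col_mx = cell_index(-99.13, 19.43)
```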

| DATASET LOCATION AND FORMAT
The degree-days in DegDays_0p25_1970_2018 on monthly and annual timescales spanning the years 1970-2018, computed using different base (threshold) temperatures (Table 1), are freely available in two widely used data formats: NetCDF-4 (.nc4) and Georeferenced Tagged Image File (GeoTIFF) (.tif). While the former is a scientific data format commonly used by the climate research and modelling community, the latter is popular among users applying geospatial analysis. Both data formats are compatible with a number of software packages and desktop GIS tools, such as R, Python, MATLAB and QGIS. Additionally, command line tools such as CDO and NCO are recommended for reading, manipulating and analysing the NetCDF-4 data format.
Data can be accessed as a compressed .tar.bz2 archive containing the individual .nc4 and .tif files from https://doi.pangaea.de/10.1594/PANGAEA.903123. The files follow the naming convention 'gldas_0p25_deg_DD_base_T_degC_1970_2018_timescale.nc4', wherein 'DD' is the abbreviation of the index (CDD, CDDwb or HDD), 'T_degC' is the threshold temperature (Tb) used in the computation, and 'timescale' is either 'ann' or 'mon', relating to the annual or monthly timescales over which the corresponding degree-days are computed.
Grid cells with missing values are identified by '1.e+20f'. Further details of the variables/dimensions in the individual netCDF4 files can be examined using command line utilities such as 'ncdump -h netcdf_file_name' (part of the netCDF utilities) or 'cdo sinfo netcdf_file_name' (CDO). For creating quick plots and exploratory data analysis of individual netCDF files, open-access data tools such as Panoply (https://www.giss.nasa.gov/tools/panoply/) or NCview (http://meteora.ucsd.edu/~pierce/ncview_home_page.html) are recommended.
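When values are read into memory without automatic masking, the missing-value flag has to be excluded before computing any statistic. The sketch below shows one way to do this in plain Python. The helper names are hypothetical, and the relative tolerance allows for the flag being stored in single precision, where 1.e+20 is not represented exactly.

```python
import math
from typing import List, Optional, Sequence

FILL_VALUE = 1.0e20  # missing-value flag described in the text


def is_missing(value: float, fill_value: float = FILL_VALUE) -> bool:
    """True if the value matches the fill flag within single-precision rounding."""
    return math.isnan(value) or math.isclose(value, fill_value, rel_tol=1e-6)


def mask_missing(values: Sequence[float]) -> List[Optional[float]]:
    """Replace fill values with None so they cannot enter later arithmetic."""
    return [None if is_missing(v) else v for v in values]


def mean_valid(values: Sequence[float]) -> Optional[float]:
    """Mean over non-missing values, or None if every value is missing."""
    valid = [v for v in values if not is_missing(v)]
    return sum(valid) / len(valid) if valid else None


# Illustrative row of annual CDD values with one missing (water) cell.
row = [100.0, 1.00000002004e20, 300.0]
row_masked = mask_missing(row)
row_mean = mean_valid(row)
```

Libraries that read NetCDF files often apply this masking automatically from the file's fill-value attribute, in which case the explicit check is unnecessary.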

| Scope of application
Potential applications of DegDays_0p25_1970_2018 include empirical assessment of energy demands at regional and global scales, assessment of implications for the efficiency of building heating/cooling systems (such as heating, ventilation and air conditioning (HVAC) systems), and cluster analysis of grid cells to identify regions with similar historical spatio-temporal patterns of degree-days.
DegDays_0p25_1970_2018 enables users to apply degree-days using various (a) spatial scales, by aggregating grid cells to regional, national or user-defined boundaries; (b) temporal scales, by aggregating monthly degree-days to seasonal (e.g. winter months) or user-defined periods; and (c) weighting options,* for example population or other socio-

$$rh = \frac{VP}{SVP} \times 100 \tag{8}$$

* Readers are referred to (Hanigan et al., 2006)

For instance, linear trends in annual CDD (Tb = 24°C) for Mexico (Figure 2) are examined using the Mann-Kendall† test, as implemented in the R (R Core Team, 2018) package spatialEco (Evans, 2018).
Trend analysis, as well as other statistical and machine learning approaches (e.g. cluster analysis), can facilitate identification of potential cooling/heating demand patterns in recent decades.† As evident from Figure 2a, the north-western states of Sonora and Sinaloa along the Gulf of California show a significant positive trend (8-12 °C days year⁻¹, at p < 0.05) in CDD. Together with information on population distribution and air conditioning in households, the fine-scale degree-days available in DegDays_0p25_1970_2018 can assist policy planners in identifying potential hot-spots in regional-scale energy demands.
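For readers unfamiliar with the Mann-Kendall test, the sketch below computes the S statistic, its normal approximation and a two-sided p-value for a single grid cell's annual series. It is a plain-Python stand-in for illustration, not the spatialEco implementation used in the study, and it assumes no tied values (no tie correction is applied to the variance). The function name and input series are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, z, two-sided p) for the Mann-Kendall trend test.

    Assumes no ties in the series; with ties the variance needs a correction
    that is omitted here for brevity.
    """
    n = len(series)
    if n < 3:
        raise ValueError("at least three observations are required")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_two_sided


# Hypothetical, strictly increasing annual CDD series for one grid cell.
s_stat, z_stat, p_value = mann_kendall([float(v) for v in range(10)])
```

Applying the function cell by cell and blanking cells with p ≥ 0.05 would reproduce the kind of significance masking described for the trend map, although the slope itself needs a separate estimator such as a linear fit or Sen's slope.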
Because different Tb are employed in compiling DegDays_0p25_1970_2018, users also have flexibility in applying degree-days across broader climatic regions (Indraganti and Boussaa, 2017). Recent studies such as Krese et al. (2012) and Lee et al. (2014) have highlighted the sensitivity to the choice of Tb, both in the assessment of energy demands and in shaping policy measures for the consumption of residential/commercial cooling and heating devices.
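The spatial, temporal and weighting options discussed in this section can be sketched with two small helpers: a weighted mean over the grid cells of a region and a sum of monthly degree-days over a user-defined season. The function names, cell values and population weights are hypothetical. In a real application the weights would come from a gridded population product on the same 0.25° grid, possibly combined with grid-cell area.

```python
from typing import List, Optional, Sequence


def weighted_mean(values: Sequence[Optional[float]],
                  weights: Sequence[float]) -> Optional[float]:
    """Weighted mean over cells, skipping missing values and zero weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None and w > 0]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight


def seasonal_sum(monthly_values: Sequence[float], months: Sequence[int]) -> float:
    """Sum monthly degree-days (index 0 = January) over the given month numbers."""
    return sum(monthly_values[m - 1] for m in months)


# Hypothetical region of three cells: annual CDD and population per cell.
cell_cdd: List[Optional[float]] = [100.0, 300.0, None]  # None marks a missing cell
cell_population = [1.0, 3.0, 5.0]
regional_cdd = weighted_mean(cell_cdd, cell_population)

# Hypothetical monthly HDD for one cell and one calendar year (Jan..Dec).
monthly_hdd = [310.0, 250.0, 180.0, 90.0, 30.0, 0.0,
               0.0, 0.0, 20.0, 100.0, 200.0, 290.0]
winter_hdd = seasonal_sum(monthly_hdd, months=(12, 1, 2))
```

The seasonal helper sums months within one calendar year for simplicity; a contiguous winter season would take December from the preceding year.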

| Limitations
While the ASHRAE (2009) methodology employed for computing degree-days in this study is one of the commonly adopted approaches in the literature, the Td used in the formulation may make the degree-days less suitable for certain applications. For instance, fluctuations of Td around Tb, as well as the asymmetry between Td and diurnal temperature variations, are important (Spinoni et al., 2018); neither is fully accounted for by the degree-days in DegDays_0p25_1970_2018.
The different methodologies to compute Td using daily and sub-daily TX and TN, and the subsequent potential bias in the derived metric (such as the degree-days in this study), have been well investigated in the literature (e.g. Weiss and Hays, 2005; Ma and Guttorp, 2013; Villarini et al., 2017). The use of Td computed as the arithmetic mean of TX and TN (Equation 5) was driven by the choice of methodology (ASHRAE, 2009) for computing degree-days (Equations 1-2) in this study. Any potential bias in the monthly and annual degree-days arising from the use of the arithmetic mean for Td is likely to be negligible, as highlighted by Villarini et al. (2017). Moreover, as emphasized by Weiss and Hays (2005), the choice of methodology in computing Td becomes more relevant when the outcome metric is based on a nonlinear algorithm, which is not the case in this study.
While the underlying reasons for utilizing GLDAS in this study have been discussed in detail in Section 2.1, applications of indices (especially in impacts assessment) should, whenever possible, incorporate input variables from different underlying data products to account for parameter and model uncertainty. For instance, certain known limitations of GLDAS data, such as larger uncertainty in the surface air temperature estimates over high mountainous areas, are well documented in the literature (Ji et al., 2015). Users of GLDAS-derived data products, such as Mistry (2019b) and DegDays_0p25_1970_2018 in this study, are recommended to pay attention to these data caveats.

† The Mann-Kendall test developed by Mann (1945) and Kendall (1975), and expanded by Dietz and Killeen (1981), is a commonly used nonparametric test for time trend analysis.

† Additional animations of global gridded annual CDD, CDDwb and HDD (using Tb = 18°C) are provided in the online Supporting Information.

FIGURE 1 Global maps of mean 1970-2018 annual (a) CDD, (b) CDDwb and (c) HDD, as °C days, computed using Tb = 18°C, at 0.25° grid-cell level. Country boundaries are overlaid to show the spatial distribution of degree-days. At a given Td and rh < 100%, Twb will be lower than Td. The CDDwb computed at the same Tb (as in CDD) therefore show a lower range of °C days compared to CDD.
Moreover, as highlighted in Section 3.1, the grid cells in the proximity of water bodies do not report degree-days because of missing data in GLDAS. This can introduce some limitations for users focusing on point locations or regions smaller than the ~27 × 27 km² grid cells adjacent to water bodies (including lakes and rivers), especially in densely populated areas near coastal regions. Such instances in DegDays_0p25_1970_2018 are likely to be minimal because the criteria for assigning a grid cell as land or water in GLDAS ver-2 data are based on a very high-resolution land-water mask.‡ Nevertheless, one workaround to fill these gaps in the degree-days data would be to apply an appropriate interpolation technique (e.g. bilinear, nearest neighbour, inverse-distance weighting) using software routines commonly available in R, CDO, etc.
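As a rough illustration of the gap-filling workaround, the sketch below replaces each missing cell with the value of the nearest valid cell. It is a brute-force, plain-Python example with a hypothetical function name. Distances are measured in grid-index space, which ignores the narrowing of cells towards the poles, so dedicated remapping routines in tools such as CDO or R remain preferable for real analyses.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_nearest(grid: Grid) -> Grid:
    """Return a copy of the grid with None cells set to the nearest valid value.

    Nearest is defined by squared distance in (row, col) index space; ties are
    resolved in favour of the first valid cell in row-major order.
    """
    valid = [(r, c, v)
             for r, row in enumerate(grid)
             for c, v in enumerate(row)
             if v is not None]
    filled = [list(row) for row in grid]
    if not valid:
        return filled  # nothing to interpolate from
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v is None:
                _, _, nearest_value = min(
                    valid,
                    key=lambda cell: (cell[0] - r) ** 2 + (cell[1] - c) ** 2,
                )
                filled[r][c] = nearest_value
    return filled


# Hypothetical strip of coastal cells with two gaps.
strip: Grid = [[1.0, None, None, 4.0]]
strip_filled = fill_nearest(strip)
```

Filled values are estimates rather than data and are best flagged as such in any downstream aggregation.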
Lastly, it is important to emphasize that while CDD and HDD have been widely adopted in the literature as indicators of cooling and heating demands, respectively, they should not be construed either as 'perfect' indicators of energy demands for cooling and heating, or as being representative of outdoor thermal comfort (Petri and Caldeira, 2015). Nevertheless, degree-days can be applied as proxy indicators to understand both independent and combined cooling and heating energy requirements (see Petri and Caldeira, 2015 for an example of an aggregated CDD + HDD indicator of the total amount of cooling and heating needs).

| Ongoing work and recommendations for work in future
A key motivation of this study is to provide an open-source dataset of degree-days at high spatio-temporal resolution, computed using multiple Tb and updated for the most recent years. Consequently, subject to the availability of the required GLDAS input meteorological variables in the coming years, DegDays_0p25_1970_2018 will be kept updated and made available to the research and end-user communities.
Additionally, another dataset of indices largely relevant for the health sector, but also for the energy sector (called 'HEI_0p25_1970_2018'), is currently under preparation (Mistry, 2019a). Features of HEI_0p25_1970_2018 will include indices accounting for wind as a feel factor, in addition to the Td, Twb and rh used in this study. For instance, two of the indices in HEI_0p25_1970_2018, 'Wind Chill' and 'Apparent Temperature', are aimed at addressing human discomfort factors in cold and warm thermal environments. Together, DegDays_0p25_1970_2018 and HEI_0p25_1970_2018, as well as the recently published dataset on climate extreme indices 'CEI_0p25_1970_2016' (Mistry, 2019b, 2019c), are aimed at addressing the growing needs of the climate impact community by overcoming the current scarcity of high-resolution global gridded CEIs in climate science.

DegDays_0p25_1970_2018 is currently the only comprehensive set of degree-days computed at a global high-spatial resolution using multiple Tb (see Table S1 for a summary of other existing publicly available degree-days' datasets covering selective regions). Nevertheless, it is based on a single global reanalysis dataset (GLDAS), employs only one of the known methods for formulating degree-days (ASHRAE, 2009), and may be restrictive in applications due to the selective (although broad) choice of Tb. Datasets of similar energy indicators based on additional observed/reanalysis datasets should be considered for a robust assessment of

‡ Further details on the land-water mask used in GLDAS ver-2 data are provided in the online Supporting Information.

FIGURE 2 CDD using Tb = 24°C at 0.25° grid-cell level for Mexico illustrating (a) trends (°C days/year) and (b) mean 1970-2018 (°C days). White regions in the trend map indicate that the Mann-Kendall test is not significant at p < 0.05. Regional boundaries are overlaid to show spatial patterns of climatological mean and trends.