The MaRIUS‐G2G datasets: Grid‐to‐Grid model estimates of flow and soil moisture for Great Britain using observed and climate model driving data

The MaRIUS‐G2G datasets were produced for the MaRIUS (Managing the Risks, Impacts and Uncertainties of drought and water Scarcity) project, using the Grid‐to‐Grid (G2G) national‐scale hydrological model for Great Britain. There are six separate datasets, with each of three combinations of meteorological driving data (two observation‐based and one from climate model ensembles) used to produce two types of outputs (daily time‐series of natural river flow for 260 sites, and monthly 1 × 1 km grids of natural flow and soil moisture in the unsaturated zone). The driving data required by G2G are rainfall and potential evaporation (PE). Two of the datasets from observation‐based driving data use rainfall from CEH‐GEAR (CEH‐Gridded Estimates of Areal Rainfall) with PE from MORECS (Met Office Rainfall and Evaporation Calculation System), and cover the period 1960–2015, while the other two use CEH‐GEAR rainfall with PE derived from 5 km temperature data using the Oudin method, and cover 1891–2015. The two datasets based on driving data (rainfall and PE) from the climate model ensembles cover three periods: Historical Baseline (1900–2006), Near‐Future (2020–2049), and Far‐Future (2070–2099). Data for a 30‐year Baseline period (1975–2004), against which the Near‐Future and Far‐Future periods should be compared, are also available directly. There are 100 members in each ensemble, and the future periods use the RCP8.5 emissions scenario. This paper provides details of the G2G model and the different sets of meteorological driving data, as well as the availability and formatting of the output datasets. It also describes some recent and potential applications of the datasets, which have already been used to support historical and future analyses of low flow and drought characteristics across Britain, and provides some guidance on how the climate model‐driven datasets should (and should not) be used.

KEYWORDS
climate change, drought, hydrological modelling, river flow, soil moisture

| INTRODUCTION
MaRIUS (Managing the Risks, Impacts and Uncertainties of drought and water Scarcity) was a UK NERC-funded research project (2014-2017) which developed a risk-based approach to the management of droughts and water scarcity (www.mariusdroughtproject.org). As part of this overall aim, MaRIUS had several scientific and applied objectives, including a wish to promote the uptake of the research through the analysis of real and synthetic droughts, at catchment and national scales, under historical and future climatic conditions. This analysis was undertaken within a probabilistic framework supported by the generation of synthetic time-series of hydrometeorological conditions for historical and future time-periods using a regional climate model. These time-series were propagated through the MaRIUS suite of models (of hydrology, agriculture, ecology, and socio-economic impacts) to analyse the impacts of water scarcity and drought risk on a range of sectors.
The hydrological datasets presented here (Bell et al., 2018a, 2018b, 2018c, 2018d) consist of output from one of the MaRIUS suite of models: a national-scale grid-based hydrological model. The model output has already been used in MaRIUS to support a range of drought analyses, but could also be used to support other hydrological research.
MaRIUS was one of five projects funded under the NERC UK Droughts & Water Scarcity Programme. The fifth project, ENDOWS (aboutdrought.info), aims to maximize the impact of the Programme for a diverse range of stakeholders. It also provides easier access to the wide range of datasets produced by the Programme, including the hydrological model output.

| DATA PRODUCTION METHODS
The G2G grid-based hydrological model is used with three combinations of meteorological driving data (two observation-based and one using climate model ensembles) to produce two types of outputs for Great Britain (GB; daily time-series for 260 sites and monthly grids). The result is six datasets, collectively termed MaRIUS-G2G (Figure 1).

| The G2G model
The G2G is a national-scale hydrological model for GB that runs on a 1 × 1 km grid (aligned with the GB national grid), at a 15-min time-step, and is parameterized using digital datasets (e.g., soil types, land-cover) (Bell et al., 2016). The effect of urban and suburban land-cover on runoff and downstream flows is accounted for in the model. G2G has been shown to perform well for a wide range of catchments across Britain (Bell et al., 2016; Formetta et al., 2017), particularly those with more natural flow regimes, as it currently does not include the effect of artificial influences such as abstractions and discharges on river flows. G2G is routinely used for high flow applications, for example operational fluvial flood forecasting (Cole and Moore, 2009), pluvial flood forecasting (Speight et al., 2016) and assessments of the effect of projected climate change on peak river flows (Bell et al., 2016). It has recently been shown to perform well for low flows and for drought identification. The G2G generally uses spatial datasets in preference to parameter identification via calibration, and where model parameters are required (such as the kinematic wave speeds used in lateral routing), nationally applicable values are used. Thus, calibration has not been used to identify separate model parameters for individual catchments.
G2G requires input time-series of precipitation and potential evaporation (PE). The optional snow module is not used here, so precipitation input to G2G is assumed to be rain. The spatial data, such as topography and soil data, used to configure G2G are as in Bell et al. (2016).

| G2G driving data: observation-based
The first combination of observation-based meteorological driving data consists of 1 × 1 km grids of daily rainfall (CEH-GEAR) and monthly PE from MORECS on a 40 × 40 km grid. MORECS (Met Office Rainfall and Evaporation Calculation System) provides observation-based monthly estimates of PE from well-watered short grass using the Penman-Monteith equation (Monteith, 2005). Although the CEH-GEAR rainfall data are available for the period 1890-2015, MORECS PE is not available pre-1960, so the G2G runs with these data only cover the period 1960-2015.
The second combination of observation-based meteorological driving data covers a longer historical period, and consists of 1 × 1 km grids of daily rainfall (CEH-GEAR) and monthly PE estimates derived from temperature data on a 5 × 5 km grid.
As neither MORECS PE nor all of the climate data required by the Penman-Monteith equation for PE (e.g., long and short wave radiation) are available for earlier historical periods, the temperature-based method of Oudin et al. (2005) has been used to estimate monthly PE from gridded (5 × 5 km) monthly temperature observations available from 1891 (Perry and Hollis, 2011). A set of monthly spatial correction factors has been applied to fit the long-term mean values of Oudin PE to those of MORECS, as there are some differences in seasonal patterns, although annual means are similar (Kay and Davies, 2018). This method of estimating PE allows G2G runs covering the period 1891-2015. Note that rainfall measurements were particularly sparse for periods in the first half of the 20th century, and sometimes identified as missing in CEH-GEAR (figure 6 of Keller et al., 2017); these were spatially infilled.
FIGURE 1 Schematic summarising the 6 MaRIUS-G2G datasets
For both combinations of observation-based driving data, the PE are copied to each of the corresponding 1 × 1 km boxes of the hydrological model grid, and both PE and rainfall are divided equally down to the 15-min model time-step (Bell et al., 2016). Despite differences in the resolution of the base datasets (40 km for MORECS PE and 5 km for Oudin PE), the spatial variability in the 1 km downscaled PE data is very similar, probably as a consequence of the monthly spatial correction factors used to fit Oudin PE to MORECS PE, together with the relatively low spatial variability in PE estimates across Britain (Robinson et al., 2017, figure 6).
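The disaggregation described above can be illustrated with a minimal Python sketch. It is not the G2G code: the function names, the list-of-lists grid layout, and the assumption that the coarse grid nests exactly within the 1 km grid are all hypothetical and chosen only for illustration.

```python
STEPS_PER_DAY = 96  # number of 15-min time-steps in one day


def replicate_to_1km(coarse_grid, factor):
    """Copy each coarse value to a factor x factor block of 1 km boxes.

    `factor` is the coarse cell size in km (e.g., 40 for MORECS, 5 for Oudin),
    assuming the coarse grid nests exactly in the 1 km grid.
    """
    fine_grid = []
    for coarse_row in coarse_grid:
        fine_row = [value for value in coarse_row for _ in range(factor)]
        for _ in range(factor):
            fine_grid.append(list(fine_row))
    return fine_grid


def per_time_step(total, n_days):
    """Divide a total accumulated over n_days equally across 15-min steps."""
    return total / (n_days * STEPS_PER_DAY)
```

For example, `per_time_step(monthly_pe, 31)` would give the PE applied at each 15-min step of a 31-day month, and `per_time_step(daily_rain, 1)` the rainfall applied at each step of one day.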
Data for a 30-year Baseline (BS) period (1975-2004), against which the Near-Future (NF) and Far-Future (FF) periods should be compared, are also available (as a subset of the longer Historical Baseline, HISTBS, period). Five alternative sets of NF and FF ensembles were produced, with varying sea surface temperature (SST) warming patterns and magnitudes; only the NF and FF sets based on the median patterns and warming are used here. The future periods use the RCP8.5 emissions scenario (Riahi et al., 2017).
For the historical periods, the PE calculation uses monthly stomatal resistance (rs) values from MORECS (Hough and Jones, 2008).
For the future periods, two alternative versions of PE are available: one using the same rs values as the historical periods and one using values adjusted to allow for closure of stomata under increased carbon dioxide concentrations. Adjusting rs decreases the projected changes in PE in the future (Rudd and Kay, 2016; Guillod et al., 1997), and moderates projected future decreases in low flows. The datasets provided here only use the adjusted rs PE.
Unlike precipitation, PE is not bias-corrected (see Guillod et al., 1997 for more details).
For use by G2G, the WAH2 precipitation and PE data are re-projected from the 0.22° (~25 km) rotated lat-lon RCM grid to the 1 × 1 km G2G grid. Following re-projection, spatially distributed weights based on standard average annual rainfall patterns are used to provide a non-uniform distribution of precipitation within each RCM box (Bell et al., 2007). Note that the WAH2 RCM assumes 360-day years (twelve 30-day months).
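One way such rainfall-pattern weights could work is sketched below in Python. This is an assumption made for illustration, not necessarily the scheme of Bell et al. (2007): each 1 km cell is weighted by its standard average annual rainfall (SAAR) relative to the mean SAAR of the RCM box, so that the box-mean precipitation is preserved. The function names are hypothetical.

```python
def saar_weights(saar_cells):
    """Weight for each 1 km cell: its SAAR divided by the box-mean SAAR."""
    mean_saar = sum(saar_cells) / len(saar_cells)
    return [saar / mean_saar for saar in saar_cells]


def distribute_precip(box_precip, saar_cells):
    """Spread one RCM-box precipitation value over its 1 km cells.

    The weights average to 1, so the mean of the returned values
    equals box_precip (up to floating-point rounding).
    """
    return [box_precip * weight for weight in saar_weights(saar_cells)]
```

Under this assumption, wetter (typically upland) cells within an RCM box receive proportionally more of the box precipitation than drier cells.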
For the first 2 years of the HISTBS, NF and FF simulations the G2G was being "spun up," thus flow estimates from the first 2 years should be ignored in analyses, or used only for follow-on model spin-up.
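A short Python sketch of how the spin-up years might be excluded before analysis is given below. The start years are those of the three simulation periods stated in this paper; the record layout (tuples of year and flow) and the function name are hypothetical.

```python
# First simulated year of each climate model-driven period.
PERIOD_START = {"HISTBS": 1900, "NF": 2020, "FF": 2070}
SPIN_UP_YEARS = 2


def drop_spin_up(records, period):
    """Keep only (year, flow) records after the G2G spin-up years."""
    first_valid_year = PERIOD_START[period] + SPIN_UP_YEARS
    return [(year, flow) for year, flow in records if year >= first_valid_year]
```

The 30-year BS period is a subset of HISTBS that begins long after the HISTBS spin-up, so it needs no such filtering.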

| G2G outputs
Two types of outputs are produced: daily time-series of river flow for 260 sites across Great Britain, and monthly 1 × 1 km grids of flow and soil moisture for Great Britain.
For daily outputs, G2G flow estimates are provided as daily mean natural flows (m³/s) for locations corresponding to 260 National River Flow Archive (NRFA; nrfa.ceh.ac.uk) gauging stations (Figure 2). The majority of the 260 sites were selected to achieve a wide spatial coverage across Britain and are typically the furthest downstream gauge on a river. Other sites were chosen as they were part of a set of gauges used to assess the G2G for low flow events. The gauged sites represent catchments covering approximately 65% of mainland GB, and span a very broad range of soils, relief, climate conditions, and anthropogenic influences.
For monthly outputs, G2G flow estimates are provided as monthly averages of daily mean natural flows (m³/s), and G2G soil moisture estimates are provided as monthly averages of daily mean soil moisture in the unsaturated zone (mm water/m soil). The latter can also be interpreted as 1,000 θ, where θ has units of m water/m soil (0 ≤ θ ≤ 1). The G2G model assumes that soil properties, including soil depth, vary spatially across Britain. Soil depth can vary from a few centimetres to several metres, and G2G model soil moisture estimates should be interpreted as depth-integrated values for the whole soil column. Both flow and soil moisture estimates are provided for every non-sea 1 × 1 km grid box, and flows are provided for every 1 km land grid box whether there is a river located in the grid box or not. Figure 3 presents an example of 1 × 1 km G2G flow and soil moisture output over Britain.

| DATASET LOCATION AND FORMAT
The six MaRIUS-G2G datasets are available from the Environmental Information Data Centre (EIDC; eidc.ceh.ac.uk), under a licence covering non-commercial as well as internal business use. The formats of the daily and monthly data are described below.

| Daily data
The MaRIUS-G2G-MORECS-daily (Bell et al., 2018a) and MaRIUS-G2G-Oudin-daily (Bell et al., 2018c) flow data are stored in csv format files (Table 1), with a single header line. The first column is the date, followed by a column for each catchment. The data follow the standard (365- or 366-day) Gregorian calendar. The time is recorded as the calendar date and the flows are mean values from 09:00 to 09:00 on the following day.
The MaRIUS-G2G-WAH2-daily flow data (Bell et al., 2018e) are stored in csv files, one for each catchment (Table 1), with a single header line. The first column is the date, followed by a column for each ensemble member (1-100). The data have 30-day months due to the "360_day" calendar of the climate model data. The time is recorded as the calendar date and the flows are mean values for that day (midnight to midnight). A related file provides details of the 260 NRFA gauging stations for which a corresponding 1 × 1 km G2G grid box has been selected, including the station number, river name, location, G2G easting, G2G northing, G2G catchment area, and any data issues (e.g., station closed). All G2G catchment areas are within 8% of the NRFA catchment areas for these catchments.
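A minimal Python sketch of reading a daily file with this layout (one header line, a date column, then one column per catchment or ensemble member) is shown below. The column labels and date format are not specified here, so the reader makes no assumption about them; dates are kept as strings because the 360-day calendar contains dates, such as 30 February, that standard Gregorian date parsers reject. The function name is hypothetical.

```python
import csv


def read_daily_flows(handle):
    """Read a daily-flow csv: header line, date column, then flow columns.

    Returns (dates, series), where dates is a list of date strings and
    series maps each column label to a list of flows (m3/s).
    """
    reader = csv.reader(handle)
    header = next(reader)
    labels = header[1:]
    dates = []
    series = {label: [] for label in labels}
    for row in reader:
        if not row:
            continue  # skip blank lines
        dates.append(row[0])
        for label, value in zip(labels, row[1:]):
            series[label].append(float(value))
    return dates, series
```

A file would be read with, for example, `with open(path, newline="") as f: dates, series = read_daily_flows(f)`, where `path` is the local location of a downloaded file.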

| Monthly data
MORECS-monthly) or "days since 1891-01-01" (for MaRIUS-G2G-Oudin-monthly).
The MaRIUS-G2G-WAH2-monthly 1 × 1 km gridded data are stored in NetCDF4 files, as one file for each period and each ensemble member (Table 2). The data have 30-day months due to the "360_day" calendar of the climate model data. The time stamp in the NetCDF files is "days since 1900-01-01".
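Because every year in the "360_day" calendar has twelve 30-day months, a "days since 1900-01-01" time value can be decoded with simple integer arithmetic. The Python sketch below illustrates this; the function name is hypothetical, and it assumes the time value counts whole days from the reference date.

```python
def date_from_360_day(days_since, reference_year=1900):
    """Decode 'days since <reference_year>-01-01' in a 360-day calendar.

    Returns (year, month, day) with 1-based month and day.
    """
    years, day_of_year = divmod(int(days_since), 360)
    months, days = divmod(day_of_year, 30)
    return reference_year + years, months + 1, days + 1
```

Libraries that implement the CF calendar conventions can perform the same decoding; the arithmetic is shown here only to make the calendar explicit.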
The monthly data are provided for a 700 × 1,000 km spatial domain on the GB National Grid, from lower left corner (0, 0) to top right (700,000, 1,000,000) (in m). Values for each 1 × 1 km grid box represent the centre of the grid box (i.e., the lower left corner pixel is [500, 500]). G2G values are only provided for land grid boxes and set to missing (−9,999) in the sea. The monthly values are nominally assigned to the first day of the month. Flow, soil moisture, and time are referenced in the files as "flow," "soil," and "time".
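The grid geometry and units described above can be expressed as a small Python sketch. The index convention (first index increasing eastwards, second northwards, both from the lower left corner) is an assumption for illustration; the axis order and orientation of the arrays in the NetCDF files should be checked against the files themselves. The function names are hypothetical.

```python
CELL_M = 1000              # grid box size (m)
N_EAST, N_NORTH = 700, 1000  # number of grid boxes in each direction
MISSING = -9999            # value used for sea grid boxes


def cell_index(easting, northing):
    """Grid box (i_east, i_north) containing a GB National Grid point (m)."""
    if not (0 <= easting < N_EAST * CELL_M and 0 <= northing < N_NORTH * CELL_M):
        raise ValueError("point lies outside the 700 x 1,000 km domain")
    return int(easting // CELL_M), int(northing // CELL_M)


def cell_centre(i_east, i_north):
    """Easting and northing (m) of the centre of a grid box."""
    return i_east * CELL_M + CELL_M // 2, i_north * CELL_M + CELL_M // 2


def theta_from_soil(value):
    """Convert soil moisture (mm water/m soil) to theta; None if missing."""
    if value == MISSING:
        return None
    return value / 1000.0
```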
To aid the use of the monthly data, two further datasets are provided:
1. Digitally derived catchment area (km²) draining to every 1 × 1 km grid box (Davies and Bell, 2018): MaRIUS_G2G_CatchmentAreaGrid.nc.
2. Estimated locations of NRFA gauging stations, on the 1 × 1 km grid and as a csv file.
   i. The 1 × 1 km grid (MaRIUS_G2G_NRFAStationIDGrid.nc) provides the best locations corresponding to 1,285 gauging stations, referenced by their integer NRFA station number (including the 260 stations for which daily river flow time-series are also provided). At these locations, the G2G flow estimates can be compared to observed (gauged) river flows. The (integer) file format sets ID to 0 for land, −9,999 for sea, and the NRFA station number at gauging station locations.
   ii. A file (MaRIUS_NRFAStationIDs.csv) provides details of the 1,285 NRFA gauging stations for which a corresponding 1 × 1 km G2G grid box can be selected.
The most appropriate G2G grid cell is identified as the one that is closest in terms of geographical location and catchment area, and additional checks have been undertaken to ensure that the G2G flows are for the correct river tributary, and not for a nearby river with a similar catchment area. Despite these checks, in some cases the derived catchment area draining to the 1 × 1 km river grid cell will be different to the "observed" NRFA catchment area. This problem can particularly affect small catchments for which discretization to a 1 × 1 km grid leads to proportionally larger errors.
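The matching idea can be sketched in Python as follows. This is an illustration only, not the procedure actually used: the search radius, the ranking (relative catchment-area error first, then distance) and the function name are assumptions, and the manual checks on river tributaries mentioned above are not represented.

```python
import math


def best_grid_cell(station_e, station_n, nrfa_area, cells, radius_m=2000.0):
    """Pick the candidate cell best matching a gauging station.

    cells: iterable of (easting, northing, g2g_area_km2) for cell centres.
    Candidates further than radius_m from the station are ignored; the rest
    are ranked by relative area error, with distance as a tie-breaker.
    Returns the chosen cell tuple, or None if no candidate is in range.
    """
    best_key, best_cell = None, None
    for easting, northing, area in cells:
        distance = math.hypot(easting - station_e, northing - station_n)
        if distance > radius_m:
            continue
        area_error = abs(area - nrfa_area) / nrfa_area
        key = (area_error, distance)
        if best_key is None or key < best_key:
            best_key, best_cell = key, (easting, northing, area)
    return best_cell
```

In this sketch a station lying in a grid box that drains only a small tributary would be assigned to a neighbouring box on the main river if that box has a catchment area much closer to the NRFA value.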

| Recent and potential uses of the datasets
The observation-driven G2G flow estimates for recent historical periods (e.g., from MaRIUS-G2G-MORECS-daily) can be compared to observed (gauged) river flows (e.g., from the NRFA). Such performance assessments (e.g., Rudd et al., 2017) show that G2G simulates river flows reasonably well, performing best for catchments with a natural flow regime (little anthropogenic influence) and a flow record that is considered accurate, but less well where the regime is influenced by artificial abstractions/discharges and where the subsurface hydrology is unusually complex. Long-term soil moisture observations are less plentiful than observed flow data, and an evaluation of G2G estimates of soil moisture against observations has not yet been undertaken. However, soil moisture is estimated by G2G as a precursor to estimating runoff and flow, and river flows have been subject to evaluation at hundreds of GB river locations (see above). The increasing availability of remotely sensed soil moisture products, such as COSMOS-UK (cosmos.ceh.ac.uk) and ESA CCI (www.esa-soilmoisture-cci.org/node/137), should enable an evaluation of G2G soil moisture estimates soon.
The observation-driven G2G flow and soil moisture estimates for longer historical periods (e.g., MaRIUS-G2G-Oudin-monthly) can be used to identify droughts or floods, and investigate their characteristics. For example, Rudd et al. (2017) use the MaRIUS-G2G-Oudin-monthly dataset to show that the threshold level method can be used to identify historic droughts in Britain (1891-2015). They then show that there is substantial spatial and temporal variability in drought characteristics, with groundwater-dependent areas typically experiencing more severe droughts, but that there are no consistent changes through time for four 30-year time-slices covering the period 1891-2010. The climate model-driven G2G flow and soil moisture estimates for historical and future periods can be used to investigate potential changes in drought characteristics. For example, AC Rudd (unpublished data) use the MaRIUS-G2G-WAH2-monthly datasets, and show that the severity and intensity of river flow and soil moisture droughts is projected to increase in the future. Droughts in southern and eastern regions are projected to increase in length, and droughts with the largest spatial extent across Britain are projected to increase in area.
The climate model-driven G2G daily river flow estimates for historical and future periods can be used to investigate potential changes in low flow frequency. For example, the MaRIUS-G2G-WAH2-daily data for four catchments have been used to show future reductions in low flows, which are generally larger in the south of the country and for the later (FF) time-period.
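The threshold level method mentioned above can be illustrated with a generic Python sketch: a drought event is a run of consecutive time-steps below a threshold, characterised by its duration, its accumulated deficit (severity) and their ratio (intensity). How the thresholds are chosen (for example, fixed or varying by month) is left to the caller, and the details here are assumptions rather than the specific configuration of Rudd et al. (2017).

```python
def find_droughts(values, thresholds):
    """Identify runs of consecutive time-steps with value below threshold.

    values, thresholds: equal-length sequences (e.g., monthly flows and the
    threshold applying to each month).
    Returns a list of dicts with start index, duration (time-steps),
    severity (accumulated deficit) and intensity (severity / duration).
    """
    if len(values) != len(thresholds):
        raise ValueError("values and thresholds must have the same length")

    events = []
    start, deficit = None, 0.0

    def close_event(end):
        duration = end - start
        events.append({"start": start, "duration": duration,
                       "severity": deficit, "intensity": deficit / duration})

    for i, (value, threshold) in enumerate(zip(values, thresholds)):
        if value < threshold:
            if start is None:
                start, deficit = i, 0.0
            deficit += threshold - value
        elif start is not None:
            close_event(i)
            start = None
    if start is not None:  # event still in progress at the end of the series
        close_event(len(values))
    return events
```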
The climate model-driven flow and soil moisture could in theory also be used to investigate potential changes in high flow characteristics. Guillod et al. (1997) show that the linear monthly factors used to correct for monthly biases in WAH2 precipitation estimates lead to improved estimates of rainfall during periods of low rainfall, but they note a "small overestimation of dryness at rare frequencies" (>20 years return periods) for long rainfall accumulation times (2-4 years). Their analysis also indicated that bias-corrected WAH2 overestimates high rainfall extremes for 5- to 50-year return periods across GB, with the magnitude of the overestimate varying with location and return period. As MaRIUS-G2G-WAH2 flow and soil moisture datasets are based on bias-corrected WAH2 climate data, it follows that they are also likely to over-estimate high flow and soil moisture extremes when compared to observations; however, an analysis of relative change between current and future periods may be unaffected.
The G2G data can also be used in other modelling, for example, to investigate impacts related to agriculture, ecology, or economics.
The MaRIUS-G2G-WAH2-daily data have been used for a catchment-explicit national risk assessment of possible future economic impacts of abstraction restrictions on irrigated agriculture in England and Wales due to drought management decisions (G Salmoral unpublished manuscript). The study evaluates the frequency, severity, and duration of abstraction restrictions following a risk-based analysis of economic losses using rainfall and river flow for the BS, NF, and FF. It shows how, for a set of rainfall and river flow triggers in line with those applied by current environmental regulators, different ranges of economic losses are obtained due to the related changing climate and crop specific conditions.
To assess the ecological impacts of drought on river habitat availability, observation-driven and climate model-driven G2G flow estimates (MaRIUS-G2G-MORECS-monthly and MaRIUS-G2G-WAH2-monthly) have been used in combination with hydraulic geometry estimates (e.g., mean river depth) for several hundred sites across England and Wales to assess the effect of low flows on loss of river habitat and longitudinal connectivity (Laize et al., 1965). There is also the potential to use the flow estimates to evaluate the resilience of wetlands to droughts. For example, river-fed wetland ecosystem models could use climate model-driven G2G flow estimates (MaRIUS-G2G-WAH2-daily) to evaluate potential hydro-ecological impacts under climate change. Similar work is being undertaken in MaRIUS using WAH2 RCM data as an input to rain-fed wetland ecosystem models (www.mariusdroughtproject.org).
The observation-driven G2G soil moisture estimates for recent historical periods (MaRIUS-G2G-MORECS-monthly) have been used to support studies of the effect of recent climatic changes on the phenology of arboviruses (viruses transmitted by arthropods), which have a preference for moist or semi-aquatic habitats (C Sanders unpublished data).

| How (and how not) to use the climate model-driven G2G datasets
MaRIUS-G2G-WAH2-monthly or MaRIUS-G2G-WAH2-daily data for the BS time slices (HISTBS and BS) can be compared to estimates from G2G driven by observational input data (e.g., from MaRIUS-G2G-Oudin-monthly or MaRIUS-G2G-Oudin-daily), or to observed data (e.g., river flows from the NRFA). However, comparisons in either case should only be made statistically, not by time-series equivalence. For example, WAH2 BS river flows for 1976 will not directly resemble observed reality in 1976; only statistics over long (multi-decadal) periods should be compared (e.g., mean monthly flows, or flow duration curves). Comparison of climate model-driven G2G simulations to an observation-based G2G run will indicate how biases in the WAH2 data affect the results for the BS periods; comparison to observational data themselves will be additionally affected by the accuracy of the G2G model simulations.
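As an illustration of a statistical (rather than time-series) comparison, the Python sketch below computes points on a flow duration curve, such as the flow exceeded 95% of the time, from a series of flows. The result depends only on the distribution of the flows, not on their order in time, so it can be compared between a WAH2-driven BS run and an observation-driven run over a multi-decadal period. The function name and the linear interpolation between ranked values are assumptions made for illustration.

```python
import math


def flow_exceeded(flows, percent):
    """Flow equalled or exceeded `percent` % of the time.

    Uses linear interpolation between ranked flows; percent=95 gives a
    low-flow measure, percent=5 a high-flow measure.
    """
    ordered = sorted(flows)
    position = (100 - percent) * (len(ordered) - 1) / 100
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Applying `flow_exceeded` at a range of percentages to each series gives two flow duration curves that can be plotted or differenced.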
G2G outputs for future time slices (NF and FF) can be compared to the BS time slice estimates, NOT to observed time series or G2G simulations with inputs of observed precipitation and PE. Similarly, results from impacts models (e.g., economic, ecological, agricultural) based on G2G outputs for future time slices (NF and FF) should be compared to those from the BS time slice, NOT to observed "real world" impacts.
Each of the 100 historical (HISTBS and BS) and future (NF and FF) ensemble members is a plausible realization of the climate of these periods, and analyses of projected future changes should look at differences between historical and future statistical distributions, rather than between individual ensemble members.
Although each of the historical (HISTBS and BS) and future (NF and FF) ensemble members is numbered from 1 to 100, historical and future ensemble members with the same ensemble number bear no relation to each other and should not be directly compared. Thus, flows from BS1 (Baseline ensemble member 1) should not be directly compared to NF1 (Near-Future ensemble member 1), any more than they should to NF2 or NF35.
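The guidance above can be illustrated with a short Python sketch that compares ensemble distributions rather than pairing members by number. A statistic (e.g., a low-flow measure) is assumed to have been computed separately for each BS member and each future member; the choice of the ensemble median as the summary, and the function name, are assumptions for illustration.

```python
import statistics


def ensemble_percent_change(baseline_stats, future_stats):
    """Percentage change in the ensemble median of a per-member statistic.

    The two lists are treated as unordered samples: member numbers are
    never paired, so re-ordering either list leaves the result unchanged.
    """
    baseline_median = statistics.median(baseline_stats)
    future_median = statistics.median(future_stats)
    return 100.0 * (future_median - baseline_median) / baseline_median
```

Other summaries of the two distributions (e.g., selected percentiles across the ensemble) could be compared in the same way to indicate the spread of projected changes.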

OPEN PRACTICES
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available at DOI 10.5285/5f3c1a02-d5c4-4faa-9353-e8b68ce2ace2. Learn more about the Open Practices badges from the Center for Open Science: https://osf.io/tvyxz/wiki.