Reconstructed monthly river flows for Irish catchments 1766–2016

Abstract: A 250‐year (1766–2016) archive of reconstructed river flows is presented for 51 catchments across Ireland. By leveraging gridded precipitation and temperature reconstructions derived from meteorological data rescue efforts, we develop monthly river flow reconstructions using the GR2M hydrological model and an Artificial Neural Network. Uncertainties in reconstructed flows associated with hydrological model structure and parameters are quantified. Reconstructions are evaluated by comparison with those derived from quality-assured long‐term precipitation series for the period 1850–2000. Assessment of reconstruction performance across all 51 catchments using the metrics MAE (9.3 mm/month; 13.3%), RMSE (12.6 mm/month; 18.0%) and mean bias (−1.16 mm/month; −1.7%) indicates good skill. The years with the highest and lowest annual mean flows across all catchments were 1877 and 1855, respectively. Winter 2015/16 had the highest seasonal mean flows and summer 1826 the lowest, whereas autumn 1933 had notable low flows across most catchments. The reconstructed database will enable assessment of catchment-specific responses to varying climatic conditions and extremes on annual, seasonal and monthly timescales.


| INTRODUCTION
Continuous, long-term river flow records are needed for evaluations of hydro-climatic variability and change, historical extremes and catchment processes (Machiwal and Jha, 2006). They also underpin water management and provide a means of stress-testing existing and planned systems against a range of variability and past droughts. Unfortunately, there are few continuous and homogeneous river flow records spanning a century or more (Mediero et al., 2015). Instead, available records are often affected by confounding factors or large amounts of missing data.
Various techniques exist for extending observations by reconstructing river flows. This typically involves forcing statistical or conceptual hydrological models with long-term precipitation and temperature/evapotranspiration data provided by reanalysis (e.g. Kuentz et al., 2013; Brigode et al., 2016) or long-term historical data sets (e.g. Jones, 1984; Spraggs et al., 2015; Crooks and Kay, 2015; Rudd et al., 2017; Hanel et al., 2018; Smith et al., 2019; Noone and Murphy, 2020). Others have leveraged international data rescue initiatives to generate gridded historical weather variables (Casty et al., 2007). Whilst such data have been used to reconstruct river flows in parts of Europe (e.g. Moravec et al., 2019), they have yet to be deployed in the British-Irish Isles.
Here, we develop a data set of reconstructed monthly river flows for 51 catchments across the island of Ireland back to 1766. This was achieved using gridded historical meteorological data, bias corrected to contemporary observations in each catchment. These data provided the input to a conceptual hydrological model and an artificial neural network (ANN), both of which were trained and verified using river flow observations. In addition, we use recently rescued precipitation data to evaluate model reconstructions for selected catchments during the period 1850-2010. The following sections describe the catchments, data sets and modelling approaches, before we present the derived reconstructions.

| Catchments and data
Reconstructions were generated for 51 catchments (Table 1 and Figure 1) that are relatively free from artificial influences (following criteria applied by Murphy et al. (2013): they have at least 25 years of record and acceptable-quality rating curves). The catchments are broadly representative of hydroclimatological conditions across the island, with a recognized under-representation of upland catchments along coastal margins. Urban extent averages <2% of the combined area of all catchments, which individually range in size from 10 to 2,418 km². However, given the extent of arterial drainage works undertaken in Ireland, some catchments have unavoidably been affected by such activities. Catchments known to be affected by arterial drainage are identified in Table 1.
Daily flow series were obtained from the Office of Public Works (OPW; http://waterlevel.ie/) and the Environmental Protection Agency (http://www.epa.ie/hydronet/) and then aggregated to monthly mean flows. The average amount of missing data was <6% across the 51 catchments, with a notable outlier of 31% for the Blackwater at Duarrigle (ID: 18050). Of the total missing days (11% overall), the majority had previously been infilled using rainfall-runoff modelling techniques (Murphy et al., 2013). As the remaining missing data represented only 1% of the total, they were not infilled.
We use gridded (1 × 1 km) monthly precipitation and temperature series (Walsh, 2012), area-averaged for each catchment, alongside concurrent river flow records to calibrate the hydrological models (see below). Monthly potential evapotranspiration (PET) was estimated from air temperature and extraterrestrial radiation following the method of Oudin et al. (2005). We favoured this over more physically based methods (e.g. Penman-Monteith) because the latter have greater data requirements (e.g. wind speed, humidity) that cannot be met over the full duration of the reconstruction period. However, the sensitivity of monthly river flow simulations to the choice of PET method was tested for periods with complete sets of variables. Six PET estimation methods, namely Penman-Monteith (Penman, 1948; Monteith, 1965), Blaney-Criddle (Blaney and Criddle, 1950), Hamon (Hamon, 1961), Oudin (Oudin et al., 2005), Thornthwaite (Thornthwaite, 1948) and Kharrufa (Kharrufa, 1985), were evaluated using the hydrological model GR2M. The Oudin method performed similarly to the Penman-Monteith method, with an average RMSE of 3.6 mm between flows generated by the two methods for five catchments over the period 1974-2000 (equivalent to 4.5% of mean annual flows).
Casty et al. (2007) (henceforth Casty data) produced gridded (0.5° × 0.5°) monthly temperature and precipitation series for Europe covering the period 1766-2000 using non-linear principal component regression of a spatial network of available station data against reanalysis data, with independent predictors used for different variables. Monthly mean temperature and total precipitation were extracted and averaged for the grid cells overlying each catchment for the years 1766-2000. Quantile mapping (Maraun, 2016) was used to bias correct the Casty data to catchment averages using the aforementioned gridded (1 × 1 km) monthly precipitation and temperature series.
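As a rough illustration of the quantile-mapping step, the Python sketch below fits empirical quantiles of a model series (here standing in for the Casty data) and an observed series separately for each calendar month, then maps values through the resulting quantile-quantile relationship, adding a constant correction beyond the calibration range. It is a simplified stand-in and not the R implementation used in this study: plain linear interpolation between empirical quantiles replaces local linear least squares smoothing, and the function names, the 100-quantile grid and the zero floor for precipitation are our own assumptions.

```python
import numpy as np

def fit_empirical_qmap(model, obs, n_quantiles=100):
    """Paired empirical quantiles of the model (e.g. Casty) and observed series."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(model, probs), np.quantile(obs, probs)

def apply_qmap(x, model_q, obs_q):
    """Map values through the quantile-quantile relationship.

    Inside the calibration range the mapping is linearly interpolated; outside
    it, the correction found at the nearest end quantile is added as a constant.
    """
    x = np.asarray(x, dtype=float)
    out = np.interp(x, model_q, obs_q)
    above, below = x > model_q[-1], x < model_q[0]
    out[above] = x[above] + (obs_q[-1] - model_q[-1])
    out[below] = x[below] + (obs_q[0] - model_q[0])
    return out

def bias_correct_by_month(model_cal, obs_cal, months_cal, model_all, months_all,
                          non_negative=False):
    """Fit on an overlap period and correct a full series, one map per calendar month."""
    model_all = np.asarray(model_all, dtype=float)
    corrected = np.empty_like(model_all)
    for m in range(1, 13):
        cal = months_cal == m
        model_q, obs_q = fit_empirical_qmap(model_cal[cal], obs_cal[cal])
        target = months_all == m
        corrected[target] = apply_qmap(model_all[target], model_q, obs_q)
    if non_negative:  # e.g. precipitation totals cannot be negative
        corrected = np.maximum(corrected, 0.0)
    return corrected

# Synthetic example: a "model" series that is uniformly 20% too dry
rng = np.random.default_rng(42)
months = np.tile(np.arange(1, 13), 30)  # 30 years of monthly values
obs = rng.gamma(shape=4.0, scale=25.0, size=months.size)
model = 0.8 * obs
corrected = bias_correct_by_month(model, obs, months, model, months, non_negative=True)
```

Because the synthetic bias is a pure scaling, the corrected series recovers the observations; real Casty-versus-observed relationships are noisier, which is why a smoothed quantile-quantile fit is preferable in practice.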
We perform quantile mapping by interpolating the empirical quantiles, using local linear least squares regression to robustly estimate the quantile-quantile relationship between the Casty and observed data for each catchment. For values outside the historical range, a constant correction (equivalent to that of the highest quantile in the series) was applied (Boé et al., 2007). Bias correction was carried out on a monthly basis using the 'qmap' R package (Gudmundsson, 2016). Sample bias correction plots for nine catchments are shown in Figure 2 (temperature) and Figure 3 (precipitation). Across the 51 catchments, bias adjustment produced minimal change in mean annual temperature (−0.15°C). Precipitation corrections were more substantial, with a mean increase of 94.2 mm/year (7.7% of mean annual precipitation). Once bias corrected, observed temperature and precipitation were appended to each catchment series to extend values to 2016. The Oudin method was then used to derive PET estimates from the bias corrected Casty temperature data for each catchment.
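The Oudin et al. (2005) formula needs only air temperature and extraterrestrial radiation, and the latter can be computed from latitude and day of year. The sketch below uses the FAO-56 equations for extraterrestrial radiation and evaluates the daily formula at mid-month, multiplying by the number of days in the month; that monthly aggregation is a simplifying assumption on our part, and the function names and the 53°N example latitude are illustrative.

```python
import calendar
import datetime
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1
LATENT_HEAT = 2.45       # MJ kg-1, latent heat of vaporisation

def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1) following FAO-56."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)      # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))              # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws))

def oudin_pet_daily(temp_c, ra):
    """Oudin et al. (2005) PET in mm/day: Ra/(lambda*rho) * (T + 5)/100, zero if T + 5 <= 0."""
    if temp_c + 5.0 <= 0.0:
        return 0.0
    # Ra / lambda has units kg m-2 day-1, i.e. mm/day of water.
    return ra / LATENT_HEAT * (temp_c + 5.0) / 100.0

def oudin_pet_monthly(temp_c, year, month, lat_deg):
    """Approximate monthly PET (mm/month) from monthly mean temperature (assumption: mid-month Ra)."""
    mid_doy = datetime.date(year, month, 15).timetuple().tm_yday
    n_days = calendar.monthrange(year, month)[1]
    return n_days * oudin_pet_daily(temp_c, extraterrestrial_radiation(lat_deg, mid_doy))

# Example at roughly 53°N (central Ireland)
pet_july = oudin_pet_monthly(15.0, 1995, 7, 53.0)
pet_january = oudin_pet_monthly(5.0, 1995, 1, 53.0)
```

The temperature threshold of −5°C means PET is set to zero only in exceptionally cold months, which rarely matters for Irish monthly means.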

| Hydrological models and calibration procedures
To ascertain the contribution of model structure to uncertainty, two model types were implemented: a conceptual hydrological model (GR2M) and an empirically based Artificial Neural Network (ANN). These models are described below.

| The GR2M conceptual model
GR2M is a simple monthly water balance model (Mouelhi et al., 2006), originally developed for French catchments and now available via the airGR R hydrological modelling package (Coron et al., 2017). The model contains two reservoirs, representing a soil (production) store and a routing reservoir (Figure 4), and is governed by two parameters: the production store capacity and the groundwater exchange coefficient. GR2M has been widely deployed across diverse catchment types and applications (e.g. Louvet et al., 2016), including for flow reconstructions (Dieppois et al., 2016). For each catchment, GR2M was calibrated and validated on observed data before the bias corrected Casty data were used to reconstruct flows. A split record for calibration/validation was applied, as this allows direct comparison between GR2M and ANN model outputs on a catchment-by-catchment basis. Calibration for all catchments (including a 1-year warm-up period) was undertaken from the start of the flow record up to December 2000. This interval captures periods of large flow variability, ranging from the drought-rich 1970s to the flood-rich 1980s. Validation was undertaken using the 15 years post-calibration (2001-2016) for all catchments (see Table 1). Uncertainty in GR2M parameters was sampled using Monte Carlo methods: 20,000 values were randomly drawn from uniform distributions over [0, 2500] for the production store capacity and [0, 2] for the groundwater exchange coefficient. Each parameter set was used to simulate flows for the calibration period, yielding a 20,000-member ensemble.
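To make the two-parameter structure concrete, the sketch below re-implements the GR2M monthly update in Python following our reading of the published equations (Mouelhi et al., 2006), with the routing store capacity fixed at 60 mm; it is not the airGR code. The initial store levels, the numerical guard on P/X1 and E/X1, and all names are assumptions. Parameter sets are then drawn uniformly over the ranges given above (a smaller ensemble is used in the example to keep it fast).

```python
import numpy as np

ROUTING_CAPACITY_MM = 60.0  # fixed routing-store parameter in GR2M

def gr2m(precip, pet, x1, x2, s0=None, r0=0.0):
    """Simulate monthly flow (mm/month) with a minimal GR2M sketch.

    x1: production store capacity (mm); x2: groundwater exchange coefficient (-).
    s0 defaults to a half-full production store (an assumption).
    """
    s = 0.5 * x1 if s0 is None else s0
    r = r0
    flow = np.zeros(len(precip))
    for t, (p, e) in enumerate(zip(precip, pet)):
        # Rainfall partly fills the production (soil) store; P1 is the excess.
        phi = np.tanh(min(p / x1, 13.0))
        s1 = (s + x1 * phi) / (1.0 + phi * s / x1)
        p1 = p + s - s1
        # Evaporation is withdrawn from the production store.
        psi = np.tanh(min(e / x1, 13.0))
        s2 = s1 * (1.0 - psi) / (1.0 + psi * (1.0 - s1 / x1))
        # Percolation P2 leaves the production store.
        s = s2 / (1.0 + (s2 / x1) ** 3) ** (1.0 / 3.0)
        p2 = s2 - s
        # Routing store receives P1 + P2, is scaled by the exchange term, then drains.
        r2 = x2 * (r + p1 + p2)
        flow[t] = r2 ** 2 / (r2 + ROUTING_CAPACITY_MM)
        r = r2 - flow[t]
    return flow

def sample_gr2m_parameters(n=20_000, seed=None):
    """Monte Carlo draws: X1 ~ U(0, 2500) mm and X2 ~ U(0, 2)."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.uniform(0.0, 2500.0, n), rng.uniform(0.0, 2.0, n)])

# Synthetic 10-year example with a 200-member ensemble (20,000 in the study)
rng = np.random.default_rng(0)
precip = rng.gamma(4.0, 25.0, 120)  # synthetic monthly precipitation (mm)
pet = np.tile([5, 10, 25, 45, 70, 85, 90, 75, 50, 25, 10, 5], 10)  # synthetic PET (mm)
params = sample_gr2m_parameters(n=200, seed=1)
ensemble = np.array([gr2m(precip, pet, x1, x2) for x1, x2 in params])
```

With X2 = 1 the model conserves mass, so total simulated flow cannot exceed total precipitation plus initial storage; values of X2 above or below 1 represent net groundwater gains or losses.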
The performance of parameter sets was evaluated using two objective functions to ensure robust performance across the flow regime: the Nash-Sutcliffe Efficiency (NSE) (Nash and Sutcliffe, 1970) derived from log-transformed flows (logNSE) and the modified Kling-Gupta Efficiency (KGE) derived from raw flows (Gupta et al., 2009; Kling et al., 2012). Two steps were then undertaken to determine which parameter sets to retain. First, parameter sets were ranked by each objective function score, with the top 400 sets from each being retained. Second, the retained simulations were evaluated by their absolute per cent bias (PBIAS) relative to observed flows, with the 200 best performing parameter sets for both logNSE and KGE retained. The median (henceforth GR2M median) and 95% confidence intervals of the GR2M simulated flows retained from this process were then determined.
FIGURE 2 Annual bias corrected Casty temperature for nine catchments from the start of the respective observations up until the year 2000. R² scores between bias corrected and observed temperature values are also provided.
FIGURE 3 Annual bias corrected Casty precipitation values for nine catchments from the start of the respective observations up until the year 2000. R² scores between bias corrected and observed precipitation values are also provided.
FIGURE 4 Outline of the structure of the GR2M model together with relevant equations defining the model structure (adapted from Mouelhi et al. (2013) and Lespinas et al. (2014)).
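The two-step selection can be sketched as follows, with `sims` a (members × months) array of simulated calibration-period flows. The small offset added before taking logarithms, the reading that the top-400 sets from each objective are pooled before the 200 lowest |PBIAS| sets are kept, and all function names are our assumptions rather than details confirmed by the study.

```python
import numpy as np

def log_nse(sim, obs, eps=None):
    """NSE on log-transformed flows; eps (assumed 1% of mean flow) guards against log(0)."""
    eps = 0.01 * np.mean(obs) if eps is None else eps
    ls, lo = np.log(sim + eps), np.log(obs + eps)
    return 1.0 - np.sum((lo - ls) ** 2) / np.sum((lo - lo.mean()) ** 2)

def kge_modified(sim, obs):
    """Modified KGE (Kling et al., 2012): correlation, bias ratio and CV ratio."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

def pbias(sim, obs):
    """Per cent bias of simulated relative to observed flows."""
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

def select_members(sims, obs, n_top=400, n_keep=200):
    """Keep the top n_top sets by each objective, then the n_keep lowest |PBIAS|."""
    lnse = np.array([log_nse(s, obs) for s in sims])
    kge = np.array([kge_modified(s, obs) for s in sims])
    candidates = np.union1d(np.argsort(-lnse)[:n_top], np.argsort(-kge)[:n_top])
    abs_bias = np.abs([pbias(sims[i], obs) for i in candidates])
    return candidates[np.argsort(abs_bias)[:n_keep]]
```

Combining a log-space score (sensitive to low flows) with KGE on raw flows (dominated by high flows), and then filtering on volume bias, is what gives the retained ensemble balanced skill across the flow regime.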

| The ANN Model
ANNs have been widely used for rainfall-runoff modelling (Dawson and Wilby, 1998; Dastorani et al., 2010). A backpropagation ANN was developed here using the neuralnet R package (Fritsch et al., 2019), with different combinations of inputs and neurons tested in up to two hidden layers. The same calibration and validation periods were employed for individual catchments as for GR2M, again using observed data to train the model. When determining the ANN structure, input data were limited to observed variables that are also available for the full reconstruction period (temperature, precipitation and PET). Lagged variables (e.g. precipitation from previous months) were also included. The best performing inputs were temperature and precipitation for the current month, plus precipitation lagged by one, two and three months. An example ANN structure that generated the best efficiency scores for one catchment is shown in Figure 5. Uncertainty in ANN model structure was explored by varying the number of neurons in one or two hidden layers. Neuron counts varying from one to twenty in each hidden layer (20 single-layer plus 20 × 20 two-layer configurations, giving 420 independent model structures in total) were used to simulate flows for the given calibration period. Each model structure was then independently evaluated using logNSE and KGE and ranked in order of performance. As for GR2M, the top 400 ANN model structures according to each objective function were identified, and those that subsequently produced the 200 lowest absolute PBIAS scores were retained. The median (henceforth ANN median) and 95% confidence intervals of simulated flows were then obtained.
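The input layout and the 420 candidate structures described above can be assembled as in the sketch below. It only builds the lagged input matrix and enumerates hidden-layer configurations; training was done with the neuralnet R package in the study and is not reproduced here. Function and variable names are illustrative.

```python
from itertools import product

import numpy as np

def lagged_inputs(temp, precip, lags=(1, 2, 3)):
    """Rows of [T(t), P(t), P(t-1), P(t-2), P(t-3)] for each usable month t.

    The first max(lags) months are dropped because their lagged values are unavailable.
    """
    temp, precip = np.asarray(temp, dtype=float), np.asarray(precip, dtype=float)
    n, m = len(precip), max(lags)
    cols = [temp[m:], precip[m:]] + [precip[m - k:n - k] for k in lags]
    return np.column_stack(cols)

# One hidden layer with 1-20 neurons, or two hidden layers with 1-20 neurons each
structures = [(i,) for i in range(1, 21)] + list(product(range(1, 21), repeat=2))
```

Each tuple gives the neuron count per hidden layer, so `(12, 9)` corresponds to the example structure in Figure 5.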
FIGURE 5 Schematic of a typical ANN model structure employed, with five inputs, two hidden layers (with 12 and 9 neurons, respectively) and monthly flow output. Values of negative one, two and three denote the number of months by which precipitation is lagged.
Finally, a mixed ensemble was derived from both GR2M and ANN model structures and parameters by combining the 200 retained simulations from each. The median (henceforth Ensemble median) and 95% confidence intervals of simulated flows were obtained and used to evaluate model reconstructions. Skill scores for the GR2M, ANN and Ensemble median simulations during validation are provided for each catchment in Table 1. The poorest performances are evident for the Nire at Fourmilewater (ID: 16013), which has a logNSE score of 0.69 (ANN median), and the Finn at Anlore (ID: 36015), with a KGE score of 0.65 (GR2M median). PBIAS scores vary between catchments, with the largest bias evident for the Blackwater at Faulkland (ID: 3051) (−16.4%; ANN median) and a minimum of 0% for the Glenamoy at Glenamoy (ID: 33001) (GR2M median). PBIAS magnitudes are generally higher for the ANN median.
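Pooling the retained members and extracting the median and the 2.5% and 97.5% percentiles at each time step might look like this. The array layout (members in rows, months in columns), the synthetic placeholder members and the use of NumPy's default linear-interpolation percentiles are assumptions.

```python
import numpy as np

def ensemble_summary(gr2m_members, ann_members):
    """Pool retained members (rows) of both models and summarise each time step (column).

    Returns the Ensemble median and the 2.5% and 97.5% percentiles bounding the 95% range.
    """
    pooled = np.vstack([gr2m_members, ann_members])  # e.g. 200 + 200 rows
    lower, median, upper = np.percentile(pooled, [2.5, 50.0, 97.5], axis=0)
    return median, lower, upper

rng = np.random.default_rng(7)
gr2m_members = rng.gamma(4.0, 25.0, size=(200, 24))  # synthetic placeholders
ann_members = rng.gamma(4.0, 25.0, size=(200, 24))
median, lower, upper = ensemble_summary(gr2m_members, ann_members)
```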

| Validation results
Observed and simulated monthly flows for the validation period for nine catchments are shown in Figure 7. This subset represents a spread of the best (top row), average (middle row) and worst (bottom row) performing catchments. The proportion of observed variance (R²) captured by the Ensemble median simulation is also provided for each catchment, varying between 0.88 and 0.93 for the nine sample catchments. The average Ensemble median R² across all 51 catchments for the same validation period is 0.90. ANN and GR2M median simulations show good agreement for the majority of catchments. Whilst observed flows are largely contained within the uncertainty bounds of each catchment reconstruction, some discrepancies are apparent in peak values. Arterial drainage works have been identified as a probable cause, with previous work showing a tendency for elevated peak flows following drainage (Harrigan et al., 2014). Peak flows also tend to be underestimated for smaller catchments, where gridded rainfall may not adequately capture flood-generating precipitation.

| Assessment of reconstructed flows
Following calibration and validation with observed data, bias corrected Casty data (precipitation/temperature and Oudin PET) were input to the hydrological models to reconstruct monthly river flows back to 1766. The following sub-sections present the resulting annual, seasonal and monthly flow reconstructions across all 51 catchments.

| Annual flow reconstructions
The median of annual reconstructed flows for all 51 catchments from 1766 is shown in Figure 8. GR2M and ANN median reconstructions show close agreement (R² = 0.97). In Figure 8 and subsequent plots, observed flows are displayed from 1980 onward because, by this year, observed values are available for over 84% of catchments. Overall, the percentage of median annual observed flow values across all 51 catchments contained within the Ensemble uncertainty range (henceforth the containment value) is 97%. Observed and Ensemble median simulated series across all catchments show close agreement (R² = 0.81). Some divergence is evident between modelled and observed flows around 1989 due to differences between Casty and observed precipitation at that time.
… Close agreement is also evident between GR2M and ANN median reconstructions (R² > 0.91) in all seasons. It is notable from Figure 9 that GR2M reconstructions for spring and summer are slightly higher, and autumn values lower, than ANN reconstructions.

| Monthly flow reconstructions
Monthly reconstructions are displayed in Figure 10 for all 51 catchments. Good agreement is evident between GR2M and ANN median reconstructions (R² > 0.84 in all months). GR2M median reconstructions are slightly higher than the ANN in April, May, June and July, whilst GR2M output in September, October and November is lower than the ANN equivalent, consistent with the summer and autumn differences between GR2M and ANN identified above. As expected, the performance of monthly simulations is poorer than at seasonal and annual time steps. Monthly observed flows generally lie within the uncertainty estimates (the mean containment value across all months is 68%) and show satisfactory agreement with observations (R² between Ensemble median values and observations ranges from 0.56 in April to 0.91 in July).
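The containment value used above is the share of observed values falling inside the Ensemble 95% range. A minimal sketch, assuming missing observations are stored as NaN and skipped, with a hypothetical function name:

```python
import numpy as np

def containment_percent(obs, lower, upper):
    """Percentage of non-missing observations lying within [lower, upper]."""
    obs, lower, upper = (np.asarray(a, dtype=float) for a in (obs, lower, upper))
    valid = ~np.isnan(obs)  # months without observations are skipped
    inside = (obs[valid] >= lower[valid]) & (obs[valid] <= upper[valid])
    return 100.0 * inside.mean()
```

Applied month by month (all Januaries, all Februaries, and so on) and then averaged, this gives a monthly mean containment value of the kind reported above.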

| Comparison with reconstructions from long-term precipitation series
Monthly river flow reconstructions generated with the bias corrected Casty data were evaluated against reconstructions based on monthly precipitation data for stations in the Island of Ireland Precipitation (IIP) network, 1850-2010 (Noone et al., 2016). For each catchment, we identified the nearest IIP station (see Figure 1) and then bias corrected its data to catchment average precipitation, as for the Casty data. Bias corrected precipitation, together with bias corrected monthly temperature/PET derived from the Casty data, was used to reconstruct flows back to 1850 using the same methods as described above. Although some of the IIP data are likely contained within the Casty gridded precipitation (so there is a degree of circularity), it was deemed important to compare both data sources, given the different methods used in their construction. Figure 11 shows the Ensemble median annual mean flow reconstructions from 1850 to 2016 for four exemplar catchments, using Casty precipitation or IIP as input. Strong agreement between the reconstructions is evident despite the different input data, with IIP reconstructions largely contained within the uncertainty ranges of the Casty reconstructions.
FIGURE 7 Observed and simulated annual mean flows for nine sample catchments representing best (top row), average (middle row) and worst (bottom row) performing models. Plotted are the GR2M (red), ANN (blue) and Ensemble median (black) simulations, together with observed flows (dashed dark-grey). The 95% uncertainty range (grey) is derived from the Ensemble median simulations.
FIGURE 8 Median annual flow values across all 51 catchments for the period 1766-2016 for GR2M (red), ANN (blue) and Ensemble median (black) reconstructions. The median of observed flows across the catchment sample for years 1980-2016 is shown in dark-grey, whilst 95% uncertainty ranges (grey) are derived from the ensemble simulations.
Across the four case study catchments, the R² between IIP and Casty reconstructed annual mean flows varies between 0.70 and 0.77. Differences between flows generated from the two data sources are not unexpected, given that IIP data are station based and often located outside catchment boundaries, whereas Casty data are gridded.
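One way to identify the nearest station is by great-circle (haversine) distance from a catchment reference point. The reference point and distance measure used for the IIP matching are not stated, so the catchment centroid, the haversine formula, and the station names and coordinates below (approximately Dublin, Cork and Belfast, not the IIP network) are purely illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(point, stations):
    """Return (name, distance_km) of the station closest to point = (lat, lon)."""
    lat, lon = point
    distances = ((name, haversine_km(lat, lon, s_lat, s_lon))
                 for name, (s_lat, s_lon) in stations.items())
    return min(distances, key=lambda item: item[1])

# Hypothetical stations and catchment centroid
stations = {"Station A": (53.35, -6.26), "Station B": (51.90, -8.47), "Station C": (54.60, -5.93)}
catchment_centroid = (52.0, -8.0)
best = nearest_station(catchment_centroid, stations)
```

Because the nearest station can still lie outside the catchment, the subsequent bias correction to catchment average precipitation remains necessary, as noted above.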

| High-and low-flow assessment
The most notable extreme flow years in the seasonal and annual Casty reconstructions were identified (Table 2), with the five highest and lowest flow years across all catchments displayed for calendar years (1767-2016) as well as for winter and summer seasons (1767-2016). The percentage anomaly relative to the mean of the full record is also provided. The most exceptional high-flow years across the … 1826, 1975 and 1887 dominate the most notable low-flow years for summer. Annual flow anomalies across all 51 catchments range from 150% to 58% of the long-term mean, whilst winter and summer extreme anomalies range from 173% to 37% of the respective long-term seasonal means. Our extreme years and seasons show considerable agreement with a similar evaluation of reconstructed river flows in the United Kingdom (Jones et al., 2006), with the previously identified exceptional high- and low-flow seasons and years all found at least once in the top five equivalent events for multiple catchments in that series.
FIGURE 9 As in Figure 8 …
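Ranking seasonal extremes requires assigning each December to the following year's winter (DJF); for a record starting in December 1766, this yields complete winters from 1767, consistent with the periods listed in Table 2. The sketch below computes complete-season means and lists the top-k highest and lowest years as a percentage of the long-term mean. The function names and the dictionary-based layout are our own; annual means follow the same pattern without the December shift.

```python
import numpy as np

SEASON_MONTHS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def seasonal_means(years, months, flow, season):
    """Mean flow per complete season; December counts towards the next year's winter."""
    years, months, flow = map(np.asarray, (years, months, flow))
    season_year = np.where(months == 12, years + 1, years) if season == "DJF" else years
    in_season = np.isin(months, SEASON_MONTHS[season])
    out = {}
    for y in np.unique(season_year[in_season]):
        sel = in_season & (season_year == y)
        if sel.sum() == 3:  # keep complete seasons only
            out[int(y)] = float(flow[sel].mean())
    return out

def top_extremes(values_by_year, k=5):
    """Top-k highest and lowest years, each with its value as % of the long-term mean."""
    yrs = np.array(list(values_by_year))
    vals = np.array(list(values_by_year.values()))
    pct = 100.0 * vals / vals.mean()
    order = np.argsort(vals)
    high = [(int(yrs[i]), round(float(pct[i]), 1)) for i in order[::-1][:k]]
    low = [(int(yrs[i]), round(float(pct[i]), 1)) for i in order[:k]]
    return high, low
```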

| DATA SET ACCESS, USES AND LIMITATIONS
The derived monthly flow reconstructions (December 1766 to November 2016 inclusive) for the 51 catchments are freely available for download from the PANGAEA data centre (https://doi.org/10.1594/PANGAEA.914306). Data are presented as five tab-delimited text files (ASCII), containing reconstructions for each catchment from the GR2M, ANN and Ensemble median simulations, along with the 2.5% and 97.5% quantiles derived from the Ensemble simulation. Also included is a table giving the geographical co-ordinates of all 51 flow stations.

| Potential uses
The reconstructed flow series provide a resource for assessing the impacts of extreme meteorological events, such as drought, on river flows across Ireland, extending the work of Noone et al. (2017) and Noone and Murphy (2020). Our reconstructions could also inform spatio-temporal assessments of variability and support detection of multi-centennial changes in river flows (e.g. Wilby, 2006). Furthermore, the multi-centennial time scale of our reconstructions offers the potential to examine how modes of ocean and climate variability influence river flows over extended periods. For example, Atlantic multidecadal variability is known to exert an important control on Ireland's climate (McCarthy et al., 2015), but its impact on river flows is less clear. Our long-term data set offers the means to explore any such control, including its stationarity. In turn, this could help improve seasonal forecasting (e.g. Wedgbrow et al., 2002).
This work represents the first reconstruction of monthly flows for a large number of Irish catchments using long-term reanalysis data and observations. Given the uncertainties involved, this data set should be treated as a benchmark, to be evaluated and improved upon by future products. The approach to flow reconstruction adopted here is readily transferable to other catchments in Europe (i.e. the domain of the Casty data). By taking advantage of observed runoff data available from the Global Runoff Data Centre (https://www.bafg.de/GRDC/EN/Home/homepage_node.html), it would be possible to generate similar archives of monthly flow reconstructions for the entire continent.

| Limitations
There are several recognized limitations to reconstructed river flows. First, arterial drainage has had a pervasive impact on Irish rivers. Catchments in this data set that have been drained tend to have higher peak flows during winter months than are captured by the reconstructions. This is consistent with the findings of Harrigan et al. (2014) for the Boyne catchment. Hence, our reconstructions may be useful for quantifying the impact of arterial drainage on flow response. Moreover, we note that there is limited knowledge about how arterial drainage affects low-flow and drought responses; again, our reconstructions may provide a useful point of reference.
TABLE 2 … The percentage anomaly relative to the long-term mean (1767-2016) is provided in each case. Values highlighted in progressively darker blue represent the top three occurring high-flow events, whilst those in red represent the top three occurring low-flow events.
Changes in land use can have considerable impacts on flows over time (Yan et al., 2013), but a lack of metadata on historical land-use change hinders the quantification of such impacts. Moreover, Slater et al. (2019) highlight that models treat rivers as conduits of fixed conveyance, even though channel geometry and structure are known to change in response to periods of hydro-climatic variability. Here, we assume that land use and channel geomorphology remain static over the reconstruction period, a common assumption in long-term flow reconstructions. Jones (1984) argues that such assumptions can be justified: water resource infrastructure designs are based on flows relating to current land use rather than historical conditions, suggesting that reconstructions of catchment response tuned to present conditions are a useful resource.
Second, potential biases or inaccuracies in precipitation data could propagate into the reconstructed flow series. The gridded Casty data set employed in this study was generated using both reanalysis and observed precipitation values, with principal component regression used to interpolate across space. Interpolation of station data is more uncertain before 1900, as the number of stations decreases rapidly prior to this time. Casty et al. (2005) highlight that European-wide precipitation patterns in the early part of their series should be treated with caution, especially before 1800 when station numbers are low. For Ireland, we believe that data prior to 1850 should be treated with caution owing to the sparseness of observed precipitation records on the island. A further source of uncertainty relates to the quality of early precipitation observations. Murphy et al. (2019) show that pre-1870 winter precipitation observations in the United Kingdom were likely affected by under-catch of snowfall due to gauge design and observer practice. Early Irish winter precipitation totals are likely affected by the same biases.
Third, the sensitivity of hydrological model parameters to prevailing climatic conditions during the calibration period can introduce uncertainties when models are used to simulate conditions different from those used for training. Broderick et al. (2016) showed that changes in climatic conditions can affect model performance depending on catchment, model type and assessment criteria, with a shift from relatively wet to dry conditions resulting in poorer performance. Future work should assess the sensitivity of monthly reconstructions to the wetness or dryness of the periods used for training.

| SUMMARY
This paper presents a data set of monthly river flow reconstructions back to 1766 for 51 Irish catchments. Gridded reconstructions of monthly precipitation and temperature, bias corrected to observed catchment data sets, are used with derived PET to force a conceptual hydrological model and an Artificial Neural Network to generate monthly flows spanning more than 250 years. Reconstructed flows are subject to uncertainties associated with hydrological response to arterial drainage and land-use change, together with potential biases in early precipitation observations and non-stationary hydrological model parameters. With these caveats in mind, the data set is suitable for examining hydrological responses to arterial drainage, tracking hydrological variability and change, or testing the robustness of water plans and/or contextualizing modern hydrological droughts.