Global database of diffuse riverine nitrogen and phosphorus loads and yields

Human activities have increased the input of nitrogen and phosphorus into riverine systems. These inputs can increase algal growth that degrades aquatic ecosystems. We constructed a global database of diffuse loads (kg) and yields (kg ha−1 yr−1) of dissolved and total nitrogen and phosphorus forms for 7 years (centred around 2008) in 1,421 catchments. Yields were calculated from 640,950 measurements that were checked, filtered and harmonized from readily available sources. We used the yield data to create a georeferenced model to calculate yields of nitrogen and phosphorus forms across 6,020 catchments, globally. The database can be used to assess and inform policy to reduce nitrogen and phosphorus losses from land to freshwater, improve nutrient use efficiency on farms, and help calibrate global models being used to explore scenarios such as nutrient management efficiency in a changing climate. The source data and R code are provided at https://doi.org/10.25400/lincolnuninz.11894697.

non-point sources, otherwise known as diffuse sources, reflect how a range of climate and catchment characteristics interact with land use decisions (Carpenter et al., 1998;Álvarez et al., 2017). To manage diffuse losses, we require knowledge of how these interactions vary in space, beginning at the coarsest (global) level and becoming increasingly targeted and tailored to finer spatial scales as more information is available.
To calculate loads, discharge (e.g. L s −1 ) is multiplied by the concentration (e.g. mg L −1 ) of nutrients. Discharge is usually measured continuously, while owing to cost and logistics, nutrient concentrations are often measured on a fortnightly, monthly or quarterly basis. There are several methods available to combine continuous discharge with less frequent concentration measurements to interpolate daily loads over a year or years. In examining the Ratio method and five-and seven-parameter rating methods, Snelder et al. (2017) found that the seven-parameter rating method produced estimates with the highest precision and representativeness from continuous discharge data and monthly or quarterly concentration data. The seven-parameter method can account for seasonal differences in concentration data and has been widely used in countries like the US and New Zealand (Alexander et al., 2002;Cohn, 2005) for the last 20 years; other methods have been developed but are not as widely used (Lee et al., 2016).
Few studies have calculated N and P loads for a range of streams and rivers across either single or multiple countries (Meybeck, 1982;Alvarez-Cobelas et al., 2009;Worrall et al., 2016). These studies were done several decades ago, focused on large catchments, usually in developed countries, and used a variety of methods to calculate loads. Although they have formed the backbone of hundreds of subsequent studies, more recent data over a wider geographic area are required to address contemporary questions such as the consequences of significant land use changes especially in developing countries on nutrient losses (Lambin and Meyfroidt, 2011).
Models can address data gaps, which is especially important to provide information for many developing countries. There are many good models that estimate the load and yield of N and P at a catchment scale (Jeppesen et al., 2011;Elliott et al., 2016); however, few exist at a regional or global scale Bouwman et al., 2013). Models, including those that estimate load and yield, can be categorized as either 'lumped' or 'distributed'. Lumped models often describe loads with statistical relationships to catchment characteristics such as soil type, and land use (Stevenson et al., 2012;Julian et al., 2017) but are restricted by the spatial and temporal scale of measurement. Distributed models rely on operational functions to describe process dynamics at a fine scale (Kroeze et al., 2012). However, unless a lot of data are available at a fine spatial and temporal scale, distributed models are seldom well resolved. This issue has been noted for the comprehensive and well-used Global NEWS models of nutrient flows where calibration data were taken from one source covering 100-200 rivers (Meybeck and Ragu, 1997;Kroeze et al., 2012).
Here, we provide and describe a database of measured nitrite-nitrate-N (NO x -N), total N, dissolved reactive P (DRP) and total P loads and yields for 1,421 streams and rivers distributed worldwide. We combined these data with data on catchment characteristics and land use in a model to estimate the load and yield of these nutrient forms across the globe. We focus our analysis on catchments that are likely to have major diffuse sources of nutrient losses although we cannot discount some input from point sources, which recent evidence has shown to be highly variable in magnitude and contributing factors (van Puijenbroek et al., 2019). Our global data set is expected to be useful for multiple purposes, for example: (a) in the catchment-specific comparison of loads or yields to address the impact of land use on surface water quality and informing policy to target measures to prevent eutrophication or toxicity (McDowell et al., 2016); (b) on questions of resource efficiency such as what farming systems make the best use of nutrient inputs especially where resources like P are finite? ; or (c) in the calibration or development of earth system models that can be used to estimate and explore nutrient losses under future scenarios like a changed climate or across different spatio-temporal scales (Greaver et al., 2016;Fetzel et al., 2017). Our georeferenced global data set is an order of magnitude larger than those of Meybeck and Ragu (1997) or Alvarez-Cobelas et al. (2009) and provides much more data for Africa, Asia and South America. A range of rivers from Australasia is used to test the representativeness and performance of the modelled outputs.

DEVELOPMENT
The methods are described in six steps depicted in Figure 1 and outlined in detail below.

| Database acquisition and checking (step 1)
Nutrient concentration and discharge data were obtained from publicly available databases or by email requests to researchers from relevant peer-reviewed literature (GEMStat, GLORICH, New Zealand Water Quality database [NZWQ] and the Murray-Darling database; Table 1). Load data were already available for the NZWQ database (Snelder et al., 2018). Since these loads used the same checks, georeferencing and filters as outlined below, the corresponding yields were input directly into the global yield models (step 5).
The databases were chosen for their geographic representativeness and use of systems to ensure that the quality of the data was robust. Each database had its own reporting conventions, measurement units and laboratory detection limits. To produce a globally consistent nutrient data set, we checked the data using a multi-stage process that involved: (a) checking data at each site for outliers; (b) Imputing replacement values for data below the detection limit; and (c) Checking that analytical methods were acceptable. Details of these checks are found in an aligned publication that explored concentration data only (McDowell et al., 2020).

| Georeferencing (step 2)
We checked the location and name of each site within the 6,020 catchments of the fourth level of the HydroSHEDS drainage network (Lehner and Grill, 2013) (Figure 2). For the GLObal RIver Chemistry Database, (GLORICH) (Hartmann et al., 2014) and Global Environmental Monitoring System (GEMS) (United Nations Environment Programme, 2018) databases we located sites within each catchment where discharge had been recorded that were within 50 km downstream of sampling sites but had no major urban centre (population > 100,000) between the sampling site and the discharge site. We obtained discharge from these sites from the Global Runoff Data Centre (GRDC) (Federal Institute of Hydrology, 2018) and assigned a GRDC identifier to each sampling site.
We did not apply a GRDC identifier to these data as discharge was not needed to calculate loads or was available from other sources. However, we did reference them to a corresponding catchment at the fourth level of the HydroSHEDS drainage network shapefiles (Lehner and Grill, 2013).

| Further filtering and data harmonization (step 3)
For each variable at each site, we examined the availability of N and P data from 1990 to 2016. Sites were further filtered from our database if they: Did not have ≥36 sampling dates spread over at least a recent 3-year period, and where the latest sampling date was before 1995. The 36-sample size minimum and date filter is a trade-off between the good spatial coverage and representativeness of sites and having too few recent data to produce precise load and yield calculations (Snelder et al., 2017). F I G U R E 1 Flowchart of the steps involved in checking, filtering, georeferencing, harmonizing and analysing the data to produce a database of loads and yields for up to 1,421 different catchments and models to estimate yields globally. Note that blue boxes are input data GEMS N and P data GLORICH N and P data NZWQ N and P data

Murray-Darling/ NZWQ N and P data
Step 1. Data acquisiƟon and checking GRDC discharge data Catchment boundaries Step 2. Georeferencing Step 4. Load and yield calculaƟons Step 5. Global Load and yield models Step 6. ValidaƟon of load and yield models

Predictor variables
Output

Validated catchment maps
Step 3. Further filtering and data harmonisaƟon Output Catchment maps | 135

MCDOWELL Et aL.
Were associated with blanks, replicates or control samples (where identified in the source databases).
Were recorded in the GLORICH database and were already present in the GEMS, NZWQ or Murray-Darling databases. Likewise, we removed sites from the GEMS database that were recorded in the NZWQ or Murray-Darling databases. We did this because the NZWQ and Murray-Darling databases contained more data for the same site listed in either the GLORICH or GEMS databases.
Once checked, georeferenced and filtered, the resulting data were harmonized for the most recent 7 years of data. This resulted in a data set whose mean age was 2008. The 7-year period was chosen to obtain annual means that were less likely to be skewed by 1 year, while avoiding substantial variation associated with continental or global climatic trends over longer periods. For example, in the Southern Hemisphere, the Southern Oscillation Index results in distinct trends in loads associated with wet and dry years (Scarsbrook et al., 2003). The 7-year period also minimizes, but does not exclude, any likely influence from anthropogenic trends (Larned et al., 2016). A similar approach is used on a 9-year rolling average by the US Geological Survey (Lee et al., 2017).
After harmonization, a total of 1,004,233 nutrient concentration measurements were available at 1,492, 1,915, 1,412 and 2,052 sites for DRP, TP, NO x and TN, respectively (Table 1). Discharge data were available for, on average, 98% of the 7-year period for which concentration data were available. Loads were truncated where discharge data were not available.

| Load and yield calculations (step 4)
For each sampling site, we calculated mean annual loads for DRP, TP, NO x -N and TN using the commonly used sevenparameter rating method rating method (Alexander et al., 2002). We chose the seven-parameter method as it provides estimates with the highest precision and representativeness from continuous discharge data and monthly or quarterly concentration data (Snelder et al., 2017). Rating methods derive a relationship between the sampled nutrient concentrations (C i , mg L −1 ) and discharge (q i , m 3 s −1 ), which are then used to estimate concentration for each day of the entire sampling period (Cohn et al., 1992;Hirsch, 2014). The model is based on fitting seven parameters: where, β 1,2,…,7 are regression coefficients, t i is the time in decimal years, T is the mean value of time in decimal years, (ln (q)) is the mean of the natural log of catchment discharge on the days sampled, and Ĉ i is the estimated concentration at time i.
(1) ln(Ĉ i ) = 1 + 2 ln q i − (ln (q)) + 3 ln q i − (ln (q)) 2 + 4 t i − T + 5 t i − T 2 + 6 sin 2 t i + 7 cos 2 t i T A B L E 1 Data sources and richness at several steps used to calculate the load and yields of phosphorus and nitrogen forms Database N or P fraction Step 1: Number of sites (data records) after checking Step 3 Coefficients were estimated by multiple linear regression of the sample data and used to calculate the concentration on each day of the 7-year period of record. Estimates of Ĉ i are back-transformed and corrected for retransformation bias by adding a smearing estimate (Duan, 1983): where n is the number of data points and ̂ i are the residuals of the regression models. The correction factor (S) is applied over the whole range of estimations as it is assumed that the residuals are homoscedastic.
The total load (L), expressed as a yield (kg ha −1 yr −1 ), was calculated by multiplying and summing discharge and estimated concentrations over the 7-year time series as: where A c is catchment the area in ha obtained from (United States Department Of The Interior -United States Geological Survey, 2008), K is a unit conversion factor equal to 31.6 kg s mg −1 yr −1 , C j is the concentration for each day in period of record estimated from the above model in mg m −3 , q j is the daily mean discharge for each day in period of record in m 3 s −1 , and N is the number of days in the period of record.
Equation 1 was fitted using both a linear model using log-transformed data and a generalized linear model with a Gaussian distribution and log-link. In general, the two models gave similar results, but in a few cases, one or other generated results that were not plausible. In these cases, and to avoid poorly estimated yields from biasing the global yield models, we inspected the results to determine if they were within expected ranges. The generalized linear model was chosen for further analysis except where yields fell outside the range for large catchment yields of 0-30 kg ha −1 y −1 for P and 0-100 kg ha −1 y −1 for N (Hale et al., 2015). Where this occurred the yield from the simple linear model was chosen (n = 341). Where neither method produced yields within these ranges (n = 32) we substituted them with average yields for large agricultural catchments across Asia, Australasia, Europe, and North and South America (McDowell and Wilcock, 2008; Billen et al., 2011; Strokal et al., 2015;Hale et al., 2015;Bustamante et al., 2015) of 0.25, 0.5, 2, and 2 kg ha −1 y −1 for DRP, TP, NO x -N and TN, respectively.

| Global yield models (step 5)
We modelled yields, instead of loads, globally. While modelling loads results in greater coefficients of determination than modelling yields, this is because loads are principally a function of catchment size and specific runoff (mm yr −1 ) not because they better account for processes. We chose to model yields so that relative nutrient losses could be easily compared between catchments (Smith et al., 2005). Yields were log 10 -transformed to remove skewness and derive an approximately normally distributed dataset. Predictor variables were obtained from a wide variety of sources, listed in Table 2. These variables reflected known categorical and continuous factors likely to affect N and P losses to water such as land use and land use intensity (Julian et al., 2017;Álvarez et al., 2017), catchment (e.g. soil wetness) and climate characteristics (van Meter and Basu, 2017). Continuous variables (e.g. percentage cropland) were expressed as a mean proportion across each catchment. Where more than one categorical variable (e.g. hierarchical USEPA ecoregion classifications Dinerstein et al. (2017)) was present in a catchment, the catchment was assigned to the dominant category for that catchment. After yields were combined with predictor variables, we had data for 1,421 catchments. This is fewer than those output after harmonization because not all predictor variables were available at all sites or there was more than one site in a catchment (Figure 2). The number of sites is much greater than have been used in the analysis of nutrient loadings to the ocean (165 rivers) , and the 100-200 rivers used to calibrate the Global NEWS model or the Integrated Model to Assess the Global Environment -Global Nutrient Model Beusen et al., 2015).
We used a best subsets routine in the R-programming language (www.r-proje ct.org) to generate an over-fitted model to estimate catchment yields which was then reduced by sequentially removing the least significant variable until we were left with a parsimonious model where all variables were significant. An optimal regression output (i.e. avoiding over-parameterization) was generated with the aid of the Mallows' Cp statistic. Significant (p < .05) variables and their coefficients for the equations predicting concentration and load of N and P forms are available in Supplementary Tables S1-S4.
Models estimating yields of DRP, NO x -N, TP and TN were extended globally in ArcGIS using the datasets in Table 2. Raster grids were created with a spatial resolution of 0.025 degrees, which corresponded to the coarsest grid cell associated with input data listed in Table 2. To provide consistent global coverage for presentation, grid values were averaged within the boundaries of 288 catchments in the HydroBASINS Level 3 (Lehner, 2014) shapefiles. However, note that the model is produced and can be used to estimate yields at the fourth level of the HydroSHEDS network. Catchments were assigned to continents based on majority area covered within a continental boundary. Global nutrient yield data are presented in Figure 3, estimated yields were back-transformed and corrected for retransformation bias by adding the smearing estimate (Duan, 1983).

| Model performance (step 6)
The models accounted for between 35% and 59% of the variation in the data (Table 3; Supplementary Tables S1-4), with apparent higher reliability for TN and NOx compared to TP and DRP. In hindsight, an alternative approach such as machine learning may have produced greater coefficients of determination. However, the selected approach was like those used previously thereby enabling the quality of the data to be compared and not the method. The coefficient of determination is similar to or better than the best performing empirical models (r 2 for TP = 0.38) of global nutrient yields by others but contains many more catchments (Meybeck, 1982;Meybeck and Ragu, 1997;Smith et al., 2005). The exception is for the study by Alvarez-Cobelas et al. (2009) who produced coefficients of 0.60 DRP and 0.73 for TP for 104 and 132 rivers, respectively. However, their models included runoff (viz, discharge) which is not an independent variable given it is a critical component of calculating loads and yields. Coefficients of determination were also like those of data-hungry process-based models like Global NEWS 2 ) (r 2 = 0.54 for dissolved inorganic N, which is mostly comprised of NO x ) or IMAGE-GNM (Beusen et al., 2015) (r 2 = 0.58 for TN) who used far fewer sites to calibrate their estimates.
As expected, our models contained variables thought by a range of studies to control nutrient losses such as the percentage of cropland in a catchment, hydrological parameters such as soil wetness or potential evapotranspiration, and ecoregions which capture climate by soil by vegetation interactions (Julian et al., 2017;Álvarez et al., 2017;van Meter and Basu, 2017). Their importance differed between nutrient forms (see 'Supplementary information yield model outputs.xlsx') but a detailed investigation is beyond the scope of this paper.
Unless continuous discharge can be combined with continuous concentration measurements, yields will always be influenced by the intervals at which concentrations are sampled. Hence, the validation of estimated yields against 'true' yields is not strictly possible (Jordan et al., 2005). We chose the seven-parameter method to try and account for the paucity or frequency of data at some sites. The performance of this method is discussed elsewhere (Cohn, 2005;Hirsch, 2014;Lee et al., 2016;Snelder et al., 2017). We did, however, examine the ability of our models to estimate yields calculated from an independent dataset of nine sites in the Murray-Darling River Basin (Biswas and Mosley, 2018) and a 20 rivers from the NZWQ database (Snelder et al., 2018) that were not included in database used for the global yield models. We chose these catchments for their size (stream order 6 or 7) to match the mean stream order in our global database of 6.7, and for their wide coverage of biomes: Deserts and Xeric Shrublands and range of Temperate and Tropical biomes, but no Tundra.
The output (Figure 4) showed similar or better coefficients of determination amongst N and P forms to that produced in the global yield models, although estimates generated using the global model overestimated DRP and underestimated total P, NO x -N and total N.

| DATASET ACCESS
The database is provided within the zip folder 'Global_ Nutrient_Yields.zip', hosted at https://doi.org/10.25400/ linco lnuni nz.11894697. A list and description of the data and files contained within the zip file are given in the file-'Description.txt'. Briefly, the file contains Excel files of the checked, filtered and harmonized DRP, TP, NO x -N and TN concentrations at each site and data set used in the calculation of loads and yields: 'Global River Data with flow sites.xls' for the GEMS data, 'GLORICH_hydrochemistry_GRDC_ Order_3.xls' for the GLORICH data and the Murray-Darling data were housed in the '/MD_Dat' folder. The GLORICH T A B L E 3 Variables, coefficient of determination, and the significance of models used to estimate log-transformed global dissolved reactive P, total P, NO x -N and total N yields and GEMS sites are linked to GRDC site identifiers, while the NZWQ sites have a 6-8 digit code and the Murray-Darling sites are identified by name. The R code is available to calculate nutrient loads and yields as 'Site yield estimates_GEMS.R', 'Site yield es-timates_GLORICH.R' and Site yield estimates_Murray_ Darling.R'. The code to model yields globally is available in the file 'Global yield models.R' Data and scripts are also given to calculate Total Kjeldhal N (TKN) and Total Suspended Solids yields, but these are not used in the present global model. Additional Excel files give the load and yield of N and P forms for all filtered and harmonized catchments separately (Global load dataAN.xls) and merged with predictor variables (Global yield and predictor merge.xls; step 6). Each Excel file contains metadata associated with their listed abbreviations, a full description of the parameter and their units of measurement.
The processed nutrient load and yield database generated in this paper can be used for any purpose provided the source is acknowledged. Use of the concentration and discharge data should acknowledge the original source. Furthermore, the original data sources should be contacted before using the concentration and discharge data for commercial purposes. Updated discharge data are available, free of charge, from the GRDC River Discharge Data service at https://www.bafg. de/GRDC/EN/02_srvcs/ 21_tmsrs/ river disch arge_node.html, and for the NZWQ database from https://hydro webpo rtal. niwa.co.nz/. Updated nutrient concentrations for the NZWQ database are available from: https://www.lawa.org.nz/downl oad-data/#river -water -quality, and for the GEMS database from: https://gemst at.org/data/data-porta l/.
The units of concentration are µg m −3 in the NZWQ database, µmol L −1 in the GLORICH database, and mg L −1 in the Murray-Darling and GEMS databases. We converted F I G U R E 4 Relationship between estimated yields using the global model and yields generated via an independent data set of order 6 and 7 rivers from Australia and New Zealand | 141 MCDOWELL Et aL. µmol L −1 into mg L −1 by multiplying the GLORICH data for N and P by 0.014007 and 0.030974, respectively. If obtaining updated nutrient data from their data sources note that we treated dissolved inorganic P as analogous to DRP, and nitrate or NO 2 -NO 3 -N as NO x -N.
It is intended that this database is updated every 5 years.

AND REUSE
Possible applications of our nutrient loads and yields database include, but are not limited to, the isolation of nutrient loss hotspots, inputs to marine systems and the calculation of nutrient use efficiencies (Mueller et al., 2012;Roy et al., 2016;Lu and Tian, 2017;Powers et al., 2019). However, our estimates are for large rivers. The mean Strahler stream order across the database is 6.7. These data are therefore most appropriately used at large scales such as biomes, countries or continents. If making comparisons at these scales note that many of the catchments could cross more than one biome or jurisdiction.
We also caution against the use of the data beyond the years we could calculate robust yields and loads. On average, this was 2005-2012. Although we constrained the number of years to avoid trends in climate influenced by continental and oceanic weather systems (Scarsbrook et al., 2003), we cannot guarantee that localized or regional trends or events (e.g. storms or droughts) may have impacted on calculated loads and yields (Ockenden et al., 2016). These data can, however, be used as a baseline from which past or future yields can be compared.
We did not specifically account for atmospheric deposition or point source discharges in our calculations of yield and load. Deposition and point source discharges can affect yields and influence biotic growth (Elser et al., 2009;Tipping et al., 2014;Bowes et al., 2014). However, deposition data were only available for a few catchments or lakes. We therefore considered their effect to be included within riverine exports. It is possible to calculate the contribution of point source discharges (Bowes et al., 2008), although we expect its influence to be low in this instance given we filtered out most sites up to 50-km downstream of an urban centre and likely wastewater discharge.