The CLARIS LPB database: constructing a long‐term daily hydro‐meteorological dataset for La Plata Basin, Southern South America

The CLARIS LPB database was built within the framework of the CLARIS LPB project "A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin" of the European Community's Seventh Framework Programme (FP7). The main variables in the database are rainfall, temperature, radiation, heliophany and streamflow. Together they form a high-quality daily hydro-meteorological dataset for scientific purposes, available at http://wp32.at.fcen.uba.ar/. This article describes the construction, quality control and spatial and temporal characteristics of the CLARIS LPB database. Through interactions with more than 60 institutions, the network of stations expanded from 107 stations in the FP6 CLARIS project to more than 9000 stations in the FP7 CLARIS LPB project. The database webpage offers more than 800 maximum and minimum temperature series, more than 8000 rainfall series, 68 radiation series, 29 heliophany series and 58 streamflow series. The number of stations also varied greatly over time. Decadal variations were evident in the number of both rainfall and temperature stations with less than 20% of data missing. According to the characteristics analysed, this dataset provides spatially consistent climatic time series that enable a variety of empirical climate studies. It has already been used for several purposes:

- as input for hydrological models;
- for the validation and analysis of present-day regional and global climate model outputs;
- to improve the analysis of recent past climate variability in La Plata Basin;
- for analysing palaeohydrological reconstructions of past climate variability.

Finally, the spatially dense daily database of rainfall and maximum and minimum temperatures allowed the generation of gridded products.


Introduction
Many kinds of atmospheric studies need climate datasets with adequate temporal and spatial coverage. For example, research on climate scenarios presupposes the continued development of high-quality climate datasets for model calibration and validation (Wilby & Wigley, 1997). Rainfall and maximum and minimum temperatures are the key variables that characterize the climate of a region. Global warming may lead to changes in these variables at the regional scale through the modification of large-scale circulation patterns. In turn, this may affect other variables, such as streamflow and radiation, to a greater or lesser degree. These variables play an essential role in La Plata Basin (LPB) in South America. The basin is a reservoir of enormous biological wealth, and agriculture is its main source of income. The LPB covers the entire territory of Paraguay and parts of Argentina, Brazil, Bolivia and Uruguay. It is the third largest basin in the world, with an area of approximately 3 200 000 km² (García & Vargas, 1998). The basin also generates around 70% of the Gross National Product (GNP) of these five countries and has a population of over 200 million inhabitants. The LPB is also one of the world's major producers of hydroelectric power. According to Peviani and Rafaelli (2010), hydropower produces about 67% of the electric energy consumed within La Plata Basin. This production represents 32% of the basin's total power generation in Argentina and Bolivia, 76% in Brazil, 99% in Paraguay and 62% in Uruguay.
Several studies have reported increases in annual and seasonal rainfall across different LPB regions during recent decades (Krepper et al., 1989; Penalba & Vargas, 1996; Minetti & Vargas, 1997; Castañeda & Barros, 2001; Minetti et al., 2003; Liebmann et al., 2004; Boulanger et al., 2005; Rivera et al., 2013; Robledo et al., 2013). Moreover, the frequency and intensity of extreme daily rainfall events have increased in the last decades (Barros et al., 2008; Penalba & Robledo, 2010). These extreme events cause enormous losses in the agricultural and cattle-raising sectors. Hydrologically, a small change in the fraction of rain that evaporates or percolates into the soil can produce an important change in run-off. Objective models of the basin water system should take this sensitivity into account. Regarding temperature, several studies detected warming in both mean and extreme values across south-eastern South America. Rusticucci and Barrucand (2004) found negative trends in the number of cold nights and warm days in summer in Argentina during the second half of the 20th century. In the same period, Tencer and Rusticucci (2012) detected a decrease in the intensity of extreme warm events together with an increase in their frequency of occurrence. Similar studies were carried out for southern Brazil (Marengo & Camargo, 2008) and Uruguay (Rusticucci & Renom, 2008).
An international research project was developed to address the main hydro- and agroclimate issues in the basin. CLARIS LPB is the acronym of the project "A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin" of the European Community's Seventh Framework Programme (FP7). The project aims to predict regional climate change impacts on LPB and to design adaptation strategies for land use, agriculture, rural development, hydropower production, river transportation, water resources and wetland ecological systems (http://www.claris-eu.org/: CLARIS, 2013a). The project has two major lines of research: one dedicated to the past and future hydroclimate of LPB, and the other to impacts in the context of climate change. To accomplish its main objectives, the project needed a high-quality daily database, because the climate observing system in the region has serious deficiencies in both spatial and temporal resolution.
The baseline for this database was created during a previous CLARIS project, part of the European Union Sixth Framework Programme (FP6) (Boulanger et al., 2010b). Figure 1 shows the spatial distribution of rainfall and temperature stations for the FP6 CLARIS project; the grey shaded area corresponds to LPB. Most of the stations cover the post-1950 period, allowing studies of extreme events and long-term climate trends. However, the density of daily observations across LPB and its surrounding areas is notably poor (Figure 1). It is too poor, for example, to develop gridded products, which are needed mainly for comparison with regional climate model outputs. This shortcoming precludes several kinds of analysis:

- assessing the impact of climate change on natural climate variability in the 21st century;
- determining the uncertainties in the projected hydroclimate changes;
- assessing the relationships between land-use systems, climate and the water cycle, in order to define integrated strategies for adapting to climate change impacts.

To overcome these limitations, it was necessary to go beyond the official institutions and involve non-governmental agencies that run their own climate networks and can provide good-quality data. Such data sources had already been identified within CLIVAR/VAMOS/South American Low Level Jet Experiment (SALLJEX).
This article gives an overview of the contents of the CLARIS LPB database. Section 2 describes the data collection and the quality control procedures applied to the daily series. Section 3 presents the database features and the spatial and temporal distribution of the stations. Section 4 illustrates the potential of the CLARIS LPB database.

Sources
The former FP6 CLARIS project gathered temperature and rainfall information from 107 meteorological stations. All of them came from governmental institutions of Argentina, Brazil, Chile and Uruguay (Figure 1). The new FP7 CLARIS LPB project focuses on La Plata Basin, which corresponds to the shaded area in Figure 1. The low station density over the basin is evident, and only 50 locations had long-term high-quality data. Starting from this initial source, the project had to turn to other governmental and non-governmental institutions. The aim was to improve the spatial and temporal resolution of the rainfall and temperature time series and to collect other variables, such as streamflow and radiation. This initiative strengthened collaboration with public and private stakeholders at national and local levels, who provided their daily station data and were interested in the CLARIS LPB results. More than 60 institutions contributed to the database; the final list is available on the CLARIS LPB website (http://eolo.cima.fcen.uba.ar/sweb/collaborators.php: CLARIS, 2013b). The national institutions contributed daily rainfall and temperature data, whereas most of the local institutions provided only daily rainfall data. The National Weather Services also carried out data rescue initiatives, digitizing historical climate data in several South American countries. This effort increased the number of official stations and extended their record lengths. Daily and monthly streamflow data for the main LPB rivers in Argentina and Brazil were obtained from the water offices of both countries. Daily radiation and heliophany data were obtained only for Argentina. They were supplied by the Universidad Nacional del Litoral and by the solar radiation study group of the Universidad de Luján (GERSolar-INEDES, 2013, http://www.gersol.unlu.edu.ar/).
With the data from all these institutions, we built an integrated high-quality database of more than 9000 stations. Table 1 shows the number of stations for each climate variable and country; most of the institutions measured rainfall.

Quality control
The metadata files from the various sources involved in the project are available through the CLARIS LPB Data Archive Centre (CLDAC) (Goodess et al., 2011). These files were processed and converted into a single common format for easier handling. We then developed some basic algorithms for quality control of the data. The database webpage includes an appendix file listing the corrections performed. Users can consult it to decide whether each proposed correction suits their purposes or whether another value should be assigned to the corrected data.
Several methods were used to evaluate the consistency of the time series. First, as a simple plausibility check, negative values were not allowed in rainfall, radiation and streamflow data. In all cases, the true value proved to be the absolute value of the reported daily value; the negative sign was a typing error. This was the only check applied to the radiation and streamflow data, because the impact groups of the CLARIS LPB Project, who worked with these variables, had already subjected them to adequate quality controls.

For temperature, we followed the procedures described in Rusticucci and Barrucand (2001). We also verified that the maximum temperature exceeded the minimum temperature by at least 0.5°C. This threshold was chosen based on the inherent instrumental error and the typical errors in the estimation and measurement of temperature. Where this condition was not met, we checked whether the difference was physically possible by comparing the temperature spatial patterns with those of the nearest stations. In most cases, the difference was due to a typing error. Where the true value could not be identified, the data were treated as missing.

For rainfall, we also analysed the length of runs of consecutive dry days. The aim was to verify whether these dry spells were natural and regionally consistent, or whether missing data had been filled with zeros. Regional consistency was judged against the climatological study by Llano and Penalba (2011). Suspicious spells were noted in the correction file, but the original data were not changed. For both temperature and rainfall, values departing from the daily mean by more than four standard deviations were analysed.
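The sign, diurnal-range and dry-spell checks described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used to build the database. Each series is assumed to be a plain list of daily values with `None` for missing data, and all function names are hypothetical. The dry-spell length threshold is a free parameter, because the article judged spells against the climatology of Llano and Penalba (2011) rather than against a fixed cutoff.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed representation: one value per day, None = missing.
Series = Sequence[Optional[float]]

MIN_DIURNAL_RANGE_C = 0.5  # Tmax must exceed Tmin by at least 0.5 °C (from the text)


def fix_negative_sign(values: Series) -> Tuple[List[Optional[float]], List[int]]:
    """Replace negative rainfall/radiation/streamflow values by their absolute value.

    Returns the corrected series and the indices that were changed, so the
    corrections can be listed in a separate correction file.
    """
    corrected: List[Optional[float]] = []
    changed: List[int] = []
    for i, v in enumerate(values):
        if v is not None and v < 0:
            corrected.append(abs(v))
            changed.append(i)
        else:
            corrected.append(v)
    return corrected, changed


def flag_diurnal_range(tmax: Series, tmin: Series,
                       min_range: float = MIN_DIURNAL_RANGE_C) -> List[int]:
    """Indices of days where Tmax does not exceed Tmin by at least min_range."""
    return [i for i, (hi, lo) in enumerate(zip(tmax, tmin))
            if hi is not None and lo is not None and hi - lo < min_range]


def long_dry_spells(rain: Series, min_length: int) -> List[Tuple[int, int]]:
    """(start index, length) of runs of zero-rainfall days lasting >= min_length.

    A missing value ends a run. Spells are only reported for review;
    the series itself is left unchanged.
    """
    spells: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(list(rain) + [None]):  # sentinel closes a trailing run
        if v == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                spells.append((start, i - start))
            start = None
    return spells


if __name__ == "__main__":
    print(fix_negative_sign([0.0, -12.5, 3.0, None]))  # ([0.0, 12.5, 3.0, None], [1])
    print(flag_diurnal_range([25.0, 18.2, None], [12.0, 18.0, 10.0]))  # [1]
    print(long_dry_spells([0, 0, 0, 5.0, 0, 0], min_length=3))  # [(0, 3)]
```

Flagged days are returned as indices rather than being deleted. This mirrors the procedure in the text, where suspicious values were checked against neighbouring stations before any decision was made.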
For rainfall, only wet days (rainfall greater than zero) were used to calculate the mean and standard deviation. A spatial analysis of the flagged dates then showed whether each value was an erroneous outlier or a genuine extreme event. This analysis identified no fewer than 1500 errors in maximum and minimum temperature and 83 errors in rainfall data, found in less than 4% of all stations in the database. No procedures were applied to fill data gaps. After these checks and corrections, the data were entered into the database. For rainfall and temperature, specific research groups applied additional quality control procedures (e.g. Jones et al., 2013).
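The four-standard-deviation screening can be sketched as follows. This is a minimal sketch under two assumptions. First, the mean and standard deviation are computed over the whole series. The article speaks of the "daily average value" and does not state whether a calendar-day climatology was used. Second, missing values are `None`. For rainfall, the `wet_days_only` flag restricts both the statistics and the candidate values to days with rainfall greater than zero, as described above. The function name is hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def flag_outliers(values: Sequence[Optional[float]], n_sigma: float = 4.0,
                  wet_days_only: bool = False) -> List[int]:
    """Indices of values lying more than n_sigma standard deviations from the mean.

    With wet_days_only=True (rainfall), the mean and standard deviation are
    computed from wet days (value > 0) only, and only wet days can be flagged.
    Missing values (None) are ignored. Flagged values are candidates for
    spatial comparison with neighbouring stations, not automatic deletions.
    """
    def usable(v: Optional[float]) -> bool:
        return v is not None and (v > 0 or not wet_days_only)

    sample = [v for v in values if usable(v)]
    if len(sample) < 2:
        return []  # standard deviation undefined
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    if sd == 0:
        return []
    return [i for i, v in enumerate(values)
            if usable(v) and abs(v - mean) > n_sigma * sd]
```

The wet-day restriction matters because dry days (zeros) would otherwise pull the mean down and inflate the spread of a skewed rainfall distribution.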
No homogeneity tests were applied to the data during the construction of the database. An algorithm for detecting errors and inhomogeneities, called APACH (Boulanger et al., 2010a; Farall et al., 2010), is expected to be applied to the CLARIS LPB database in the near future.
All the data collected were returned to the providers, together with the corrections made during quality control and recommendations for their appropriate use.

Database features
The CLARIS LPB database is available at http://wp32.at.fcen.uba.ar/ (CLARIS, 2013c) and may be used for scientific purposes under the CLARIS LPB data protocol. Under this protocol, the data are offered free of charge; users may not use them for commercial activities or exploitation, nor transfer them to third parties. The climate community working in LPB can therefore benefit from this unique high-quality database. As of July 2013, the database had 185 registered users, who had performed more than 1300 successful data downloads.
The features currently available on the website allow users to obtain the desired data easily. The four main variables are rainfall, temperature (maximum and minimum), radiation (and heliophany) and streamflow. After selecting the variable of interest, users can restrict the search by country, area, station ID and date, and these options can be combined. The search returns a list of matching stations, whose data can be exported as a compressed file. Because a query can return a very large amount of information, each export is limited to 500 stations to preserve the performance of the database server. Users can, however, run several exports of up to 500 stations each. A web link to the file containing the data is sent to the user's registered e-mail address.
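When a selection exceeds the 500-station export limit, it has to be split into several searches. The sketch below shows one way to plan those batches. It is purely illustrative. The webpage is used interactively, and the article describes no programmatic API. The station ID format and the function name are hypothetical.

```python
from typing import Iterator, List, Sequence

MAX_STATIONS_PER_EXPORT = 500  # export limit stated for the database webpage


def export_batches(station_ids: Sequence[str],
                   batch_size: int = MAX_STATIONS_PER_EXPORT) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size station IDs, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(station_ids), batch_size):
        yield list(station_ids[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical station IDs; real IDs come from the webpage's search results.
    selection = [f"ST{i:05d}" for i in range(1203)]
    print([len(batch) for batch in export_batches(selection)])  # [500, 500, 203]
```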
Some of the institutions also provided monthly rainfall and streamflow data. For rainfall, we received data for 93 stations in northern Argentina, near the border with Paraguay. These stations roughly cover the 1979-2008 period, although some began measuring in 1967. For streamflow, the water offices of Argentina and Brazil provided monthly data for 39 stream gauges in the upper LPB, some of them covering the whole 20th century. No quality control procedures were applied to these data. Users can obtain these monthly data from the Search/Export Data menu of the database webpage.
Where the suppliers provided them, the station metadata were collected and are available on the database website and on the CLDAC website.

Spatial and temporal distribution of daily station data
As shown in Table 1, the number of available stations depends on the variable and the study area. This section therefore analyses some aspects of the spatial and temporal distribution of the stations.
The radiation and heliophany series comprise 68 and 29 locations, respectively, all in Argentina (Figure 2). The highest density of stations is in the region of interest, with measurements in the period 1960-2009. Daily streamflow gauges are distributed along the Paraná, Paraguay and Uruguay rivers and their main tributaries (Figure 3), which correspond to the lower portion of the LPB. The longest records are along the Paraná River, whose gauges started measuring during the first decade of the 20th century.
Regarding maximum and minimum temperatures, the interactions with local and national institutions resulted in a total of more than 800 stations (Table 1, Figure 4). In the case of rainfall series, we reached more than 8000 stations, most of them distributed along the Brazilian territory (Table 1, Figure 5).
There are markedly more daily rainfall series than daily temperature series. However, for both variables, most of these series have very short measurement periods and/or many days with missing data. We therefore analysed the spatial coverage in different periods, looking for the least amount of missing data.

First, Figure 6 shows the temporal evolution of the number of temperature and rainfall stations. Because Figure 6 covers the complete database, we also analysed the spatial distribution of the rainfall and temperature stations with less than 20% of data missing over 10-year periods. Decadal climate variations in LPB could affect agricultural and hydrological production. It is therefore important to know the spatial distribution of stations, whether for model validation or for decadal variability analysis. Lowering the threshold from 20% to 10% of missing data reduces the number of stations by 20 to 25% (not shown).

Figure 7 shows the location of the temperature stations with less than 20% of data missing over eight 10-year periods, from 1931 to 2010. The number of stations changes most between 1951-1960 and 1961-1970 (from 4 to 28 stations) and between 1991-2000 and 2001-2010 (from 50 to 16 stations). Most decades lack stations over the Brazilian and Paraguayan portions of the basin. Decadal climate variability could be analysed from the 1961-1970 decade onward, given the reasonable number of stations available over LPB. For 30-year climatological analyses, the best period is 1971-2000, with 39 temperature stations with less than 20% of data missing. Figure 8 shows the location of the rainfall stations with less than 20% of data missing for the same periods as for temperature.
The number of stations varies strongly between decades, from 48 during 1931-1940 to 1100 during 1971-1980. The spatial pattern is similar throughout 1971-1990. The last decade has relatively few stations (134 in total), but their locations are well suited to powerful spatio-temporal analyses of the impact of present climate change on LPB rivers. Among 30-year periods, 1961-1990 has the largest number of rainfall stations (540). In the last period chosen, the number drops to 119, but these rainfall stations are strategically located for studying the impact of water stress in the basin (not shown).
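The decadal screening described above can be sketched as follows. This is a minimal sketch, not the authors' procedure. Each station series is assumed to be a mapping from `datetime.date` to a value. Both an absent date and a `None` value count as missing, because the article does not say how missing days were counted. Function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

# Assumed representation: {date: value}; a None value or an absent date = missing.
DailySeries = Mapping[date, Optional[float]]
Period = Tuple[int, int]

# The eight 10-year periods analysed in the text: 1931-1940, ..., 2001-2010.
DECADES: List[Period] = [(y, y + 9) for y in range(1931, 2011, 10)]


def missing_fraction(series: DailySeries, first_year: int, last_year: int) -> float:
    """Fraction of calendar days in [first_year, last_year] without a valid value."""
    start, end = date(first_year, 1, 1), date(last_year, 12, 31)
    n_days = (end - start).days + 1
    n_valid = sum(1 for d, v in series.items() if start <= d <= end and v is not None)
    return 1.0 - n_valid / n_days


def stations_below_threshold(stations: Mapping[str, DailySeries], first_year: int,
                             last_year: int, max_missing: float = 0.20) -> List[str]:
    """IDs of stations with less than max_missing of their data missing in the period."""
    return sorted(sid for sid, series in stations.items()
                  if missing_fraction(series, first_year, last_year) < max_missing)


def count_per_period(stations: Mapping[str, DailySeries],
                     periods: Sequence[Period] = DECADES,
                     max_missing: float = 0.20) -> Dict[Period, int]:
    """Number of qualifying stations in each period (decades or 30-year periods)."""
    return {p: len(stations_below_threshold(stations, p[0], p[1], max_missing))
            for p in periods}
```

Passing `max_missing=0.10` reproduces the sensitivity test mentioned in the text. Passing `periods=[(1971, 2000)]` gives the 30-year count.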

Daily gridded data
As the figures show, the rainfall and maximum and minimum temperature data form a spatially dense daily database. This made it possible to produce gridded products, available from the database webpage. These products improve on existing observed gridded datasets in record length and in spatial and temporal resolution. The CLARIS LPB database webpage gives a brief description of the gridded datasets. For details, see Tencer et al. (2011) for the temperature gridded data and Jones et al. (2013) for the rainfall gridded data. Both daily gridded datasets cover the 1961-2000 period at a spatial resolution of 0.5° × 0.5°. They can be downloaded in NetCDF format from http://wp32.at.fcen.uba.ar/gridded/temp (CLARIS, 2013e) for temperature and from http://wp32.at.fcen.uba.ar/gridded/prec (CLARIS, 2013d) for rainfall.
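A common first step when comparing a station with the gridded products is to find the grid point nearest to the station. The sketch below assumes the latitude and longitude coordinate arrays have already been read from the NetCDF file (for example with the netCDF4 or xarray packages). It also assumes they are sorted in ascending order; descending arrays must be reversed first. The coordinate values in the example are hypothetical 0.5° cell centres, not the actual CLARIS LPB grid definition. With xarray, `Dataset.sel(..., method="nearest")` performs the same lookup directly.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def nearest_index(coords: Sequence[float], x: float) -> int:
    """Index of the value in an ascending 1-D coordinate array closest to x.

    Ties are resolved towards the lower index.
    """
    if not coords:
        raise ValueError("empty coordinate array")
    i = bisect_left(coords, x)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    return i if coords[i] - x < x - coords[i - 1] else i - 1


def station_to_cell(lat: float, lon: float, lats: Sequence[float],
                    lons: Sequence[float]) -> Tuple[int, int]:
    """(row, column) indices of the grid point nearest to a station."""
    return nearest_index(lats, lat), nearest_index(lons, lon)


if __name__ == "__main__":
    # Hypothetical 0.5° cell centres around Buenos Aires (not the real grid definition).
    lats = [-35.25, -34.75, -34.25]
    lons = [-58.75, -58.25, -57.75]
    print(station_to_cell(-34.6, -58.4, lats, lons))  # (1, 1)
```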
These products are extremely important for climate model validation. When Regional Climate Models (RCMs) are used for impact assessment, the differences between the observed gridded datasets and the model outputs can be applied to the future RCM integrations.
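The article does not specify how these differences would be applied. The simplest reading is a mean bias correction, sketched below under that assumption with hypothetical function names and inputs. The bias is estimated over a common reference period where observations and model output overlap (for example, the 1961-2000 span of the gridded data) and then applied to the future values. The additive form suits temperature. For rainfall, a multiplicative ratio is often preferred so that corrected values stay non-negative.

```python
from statistics import mean
from typing import List, Sequence


def additive_bias_correction(obs_ref: Sequence[float], model_ref: Sequence[float],
                             model_future: Sequence[float]) -> List[float]:
    """Add the reference-period mean difference (observed minus model) to future values."""
    bias = mean(obs_ref) - mean(model_ref)
    return [v + bias for v in model_future]


def multiplicative_bias_correction(obs_ref: Sequence[float], model_ref: Sequence[float],
                                   model_future: Sequence[float]) -> List[float]:
    """Scale future values by the observed/model mean ratio (keeps rainfall non-negative)."""
    model_mean = mean(model_ref)
    if model_mean == 0:
        raise ValueError("model reference mean is zero; ratio undefined")
    ratio = mean(obs_ref) / model_mean
    return [v * ratio for v in model_future]
```

Both functions correct only the mean. More elaborate schemes, such as quantile mapping, also adjust the spread and the extremes.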

Concluding Remarks
During the 4 years of the CLARIS LPB Project, a high-quality hydroclimate observational database was developed for analysing extreme events and decadal-to-interdecadal variability. The network of stations expanded from 107 stations in the FP6 CLARIS project to more than 9000 stations in the FP7 CLARIS LPB project. This was achieved through interactions with the official National Meteorological Services and with local non-governmental institutions that provided their climate information. This exchange also strengthened ties among scientists, institutions and decision makers. One outcome was the course "Métodos de homogeneización - Control de calidad de los datos meteorológicos - APACH" (Homogenization methods: quality control of meteorological data, APACH). It was held at the National Weather Service of Argentina in November 2011, where participants learned different tools for the quality control of meteorological data.
The temporal and spatial distribution of the available stations was studied in terms of the amount of missing data. This analysis, carried out separately for each variable, identifies the benefits and limitations of the CLARIS LPB dataset. According to the characteristics analysed, this dataset provides spatially consistent climatic time series that enable a variety of empirical climate studies. One example is evaluating changes in extremes in relation to changes in mean temperature and total rainfall. However, the station density, though high in north-eastern Argentina and southern Brazil, is not sufficient for mesoscale studies. An example is the pioneering work of Velasco and Fritsch (1987) on mesoscale convective systems. Nevertheless, the analysis presented in this article helps to identify specific locations where new stations are needed.
The database has improved the analysis of recent past climate variability in La Plata Basin (e.g. Cavalcanti et al., 2011; Carril et al., 2012). It has also provided a baseline for the validation and analysis of present-day regional and global climate model outputs (e.g. Menéndez et al., 2010; Solman & Pessacg, 2012; Blázquez & Nuñez, 2013). Moreover, it allowed the working groups on climate change impacts to develop hydrological models of the Iberá Wetlands (Grimson et al., 2013) and to use observational data as inputs for these models (e.g. Popescu et al., 2012). It also supported the analysis of palaeohydrological reconstructions of past climate variability (Piovano et al., 2009; Troin et al., 2010). In addition, Tencer et al. (2011) and Jones et al. (2013) developed gridded temperature and rainfall datasets, respectively, from the daily data. These datasets extend the usefulness of the database for extreme-event and decadal-to-interdecadal variability analysis.
The database will be kept operational for further research. It will remain available in the coming years on a computer server in the Department of Atmospheric and Oceanic Sciences at the University of Buenos Aires. However, because the CLARIS LPB project has finished, the database will no longer be updated. The metadata will also remain available in the CLDAC, which is located in the Research Center for the Sea and the Atmosphere at the National