Identifying, attributing, and overcoming common data quality issues of manned station observations

In situ climatological observations are essential for studies related to climate trends and extreme events. However, in many regions of the globe, observational records are affected by a large number of data quality issues. Assessing and controlling the quality of such datasets is an important, often overlooked aspect of climate research. Besides analysing the measurement data, metadata are important for a comprehensive data quality assessment. However, metadata are often missing, but may partly be reconstructed by suitable actions such as station inspections. This study identifies and attributes the most important common data quality issues in Bolivian and Peruvian temperature and precipitation datasets. The same or similar errors are found in many other predominantly manned station networks worldwide. A large fraction of these issues can be traced back to measurement errors by the observers. Therefore, the most effective way to prevent errors is to strengthen the training of observers and to establish a near real‐time quality control (QC) procedure. Many common data quality issues are hardly detected by usual QC approaches. Data visualization, however, is an effective tool to identify and attribute those issues, and therefore enables data users to potentially correct errors and to decide which purposes are not affected by specific problems. The resulting increase in usable station records is particularly important in areas where station networks are sparse. In such networks, adequate selection and treatment of time series based on a comprehensive QC procedure may contribute to improving data homogeneity more than statistical data homogenization methods.


Introduction
Assessing climatological trends and extremes requires data that meet high quality standards. If data are affected by quality issues, a consequence may be misleading and erroneous results (Nese, 1994;Petrovic, 2001;Viney and Bates, 2004;Mahmood et al., 2006;Westerberg et al., 2009;Zhang et al., 2009;Rhines et al., 2015). In order to eliminate errors, quality control (QC) should be applied before analysing observational records (Westerberg et al., 2009;WMO, 2011). The World Meteorological Organization (WMO) provides guidelines on QC in various documents (WMO, 1993;Plummer et al., 2003;WMO, 2008;Klein Tank et al., 2009;WMO, 2010;WMO, 2011). A wide range of different QC approaches are used by National Weather Services and research projects. In some cases, only a few basic methods are applied that may not ensure the desired quality of the data. According to Štěpánek et al. (2013), there is a lack of a generally accepted methodology for QC. Furthermore, if data are retrieved from a standard dataset, the suitability of the data for the intended purpose is often not questioned (Brönnimann, 2015).
Although satellite and reanalysis data find widespread use in climate science, they are not a substitute for ground-based observations due to limitations such as a lower precision, shorter record lengths, calibration problems, and the need for in situ data for validation (Plummer et al., 2003;Bližňák et al., 2015;Mantas et al., 2015;Schmocker et al., 2016). Therefore, quality issues in observational records must be addressed.
Data quality issues may occur at any point of the data generation chain, for example, due to unsuitable station configuration or siting, poor station maintenance, erroneous instrument reading, or inaccurate data digitization and post-processing (Brönnimann, 2015). The characteristics and frequencies of errors may vary between different networks depending on factors such as the degree of automatization, instruments used, observer training, station maintenance, and climate type. Hence, a QC approach may be suitable for one network but not for another.
Not all data quality issues are detectable by controlling measurement data. Metadata are another important source in the quality assessment. According to the Global Climate Observing System monitoring principles (WMO, 2002), metadata (i.e. the details and history of local conditions, instruments, operation procedures, data processing algorithms, etc.) should be documented and treated with the same care as the measurements. Metadata are also needed to ensure that end-users are able to extract accurate conclusions from their analyses, and they play an important role in the detection and correction of inhomogeneities and are essential to managing networks and support systems (Aguilar et al., 2003;Plummer et al., 2003;WMO, 2008). Nevertheless, in many networks worldwide, metadata are fractional, of poor quality, or simply not available (e.g. Westerberg et al., 2009;Trewin, 2010;Fall et al., 2011), and therefore measures should be taken to address this shortcoming.
Assuring the quality of a dataset is more difficult for sparse than for dense station networks. Important QC tests depend on the availability of suitable neighbouring stations (Plummer et al., 2003;Kunkel et al., 2005;Durre et al., 2010). Furthermore, records in dense networks may be excluded from the dataset without losing much information (e.g. Vertačnik et al., 2015), which allows removing time series that are incomplete or whose quality is uncertain. This, however, may not apply to sparse network datasets. In order to obtain the greatest benefit from weather observations in regions of low station density, identification of specific errors and, if possible, error corrections are particularly important.
In this study, we investigate data quality issues in daily maximum/minimum temperature and precipitation observations in Bolivia and Peru as exemplary cases for predominantly manned station networks. According to Vuille et al. (2008), data quality problems and the lack of long-term weather observations hinder reliable trend analyses in the Central Andean area. The same applies to other regions, such as Central America (Westerberg et al., 2009) or eastern Africa (Kizza et al., 2012). Analysing weather observations from the Central Andean area is challenging, as the region is characterized by complex terrain and strong climate gradients. Consequently, the impacts of climate change vary (Vuille et al., 2000;Diaz et al., 2014;Wang et al., 2014). This would require investigations on relatively small spatial scales, but station networks are sparse in the region. Furthermore, metadata records are fragmentary or completely missing. Hence, Bolivian and Peruvian station records are good case studies to address various challenges in climate monitoring.
This study describes the current situation of station networks in Bolivia and south-eastern Peru (Section 2). Measurement conditions, practices, and instrumentation are outlined, and efforts on QC and metadata gathering are described. Because metadata are largely missing, we suggest strategies to reconstruct and generate metadata (Section 3). The close collaboration of various organizations, extensive investigations on site, and detailed data analyses enabled us to characterize common errors found in the data and to identify error sources (Section 4). In order to describe the impact on results, we mention for each major data quality issue particularly affected and unaffected climate change indices. Finally, this study shows methods for error detection, and discusses how station records may potentially be used for certain purposes despite existing quality issues (Section 5).

Data
In order to investigate common data quality issues of manned station observations, data from the Central Andean area are analysed. The study area includes Bolivia and the Peruvian districts Puno, Cuzco, and Junín (Figure 1(a)). In the following, 'Peruvian data' refers to data from these districts. Note, however, that the data of the Peruvian districts used in this study are not complete and do not cover the full area of the districts. The climate variables investigated are maximum temperature (TX), minimum temperature (TN), and precipitation (PRCP), which are key variables in climatology (Plummer et al., 2003) and are the most frequently measured parameters in the Central Andean area.
Data were accessed between 2014 and 2016 from the National Meteorological and Hydrological Service (SENAMHI) Bolivia, the project Pilot Program for Climate Resilience (PPCR) for Bolivia, and SENAMHI Peru. The dataset of PPCR includes transcriptions that originate from SENAMHI Bolivia and the civil airport administration (AASANA). There are several different digitized datasets that are based on weather observations of SENAMHI Bolivia. They are the result of multiple transcription programs implemented at different times. Even though they have the same primary data source, these datasets differ regarding their station coverage, time series completeness, and transcription quality. This study was conducted within the framework of the projects 'Data on climate and Extreme weather for the Central AnDEs' (DECADE) and 'Servicios CLIMáticos con énfasis en los ANdes en apoyo a las DEcisioneS' (CLIMANDES). As both projects aim to increase the quality of observational data in the Central Andes, the available datasets were evaluated, merged, corrected if possible, and updated. An output of the DECADE project will be a new and improved dataset for Bolivia and south-eastern Peru.

Station networks in Bolivia and Peru
The most important organizations conducting weather observations are the National Meteorological and Hydrological Services, i.e. SENAMHI Bolivia and Peru. Both institutions work with limited resources, which may increase the risk of the occurrence of quality issues. According to WMO (2008), successful observing practices require skill, training, equipment, and support, which is not always available to a sufficient degree. Furthermore, organizational weaknesses may contribute to observational data quality issues in many networks (Westerberg et al., 2009). A large fraction of the Central Andean station records cover only a short measurement period or contain large data gaps, which is also an issue in many other networks worldwide (e.g. Kizza et al., 2012;Trewin, 2013;Brunet et al., 2014a) and drastically lowers the amount of time series suitable for climatological studies. Currently, the station network of SENAMHI Bolivia contains around 255 manned and 33 automatic stations (F. Villalpando, 2016, pers. comm.). Note that the latter are not included in this work. For the digitized time series that were used in this study, the median length of the sum of observations at each station of SENAMHI Bolivia is slightly below 20 years for TX (Figure 1(a)), TN, and PRCP. The spatial and temporal distributions of TX and TN observations are nearly identical. The spatial distributions of temperature and PRCP measurements are similar, but the total number of PRCP time series is about twice as high. Since the year 2000, there has been a decline in the number of daily temperature measurements (Figure 1(b)). For PRCP, the decline starts already in the early 1990s, resulting in about half as many observations in the last decade compared with the maxima in the 1980s and 1990s. This pattern is observed in many weather observation networks in the developing world (Sene and Farquharson, 1998). 
The reasons for this development are manifold, and include inadequate institutional frameworks and the lack of appreciation of the worth of long-term data (Sene and Farquharson, 1998). The oldest digitized measurements of SENAMHI Bolivia date back to 1917. However, in the context of this study, a non-digitized record dating back to 1892 was found. Historic weather data are often not completely transcribed (e.g. Brunet and Jones, 2011; Trewin, 2013; Brönnimann, 2015). Currently, a program of the International Environmental Data Rescue Organization (IEDRO) is addressing this issue in Bolivia (http://iedro.org/activities/bolivia-data-rescue-program).
Besides SENAMHI Bolivia data, nearly all station records of AASANA were included in this study. Observations go back to 1942. In contrast to the SENAMHI Bolivia network, the number of daily measurements has remained nearly constant since the 1990s (Figure 1(b)). The median observation length is 40 years for temperature and 52 years for precipitation. Collecting and analysing data from different networks of the same region has been shown to be beneficial to the accuracy of meteorological analyses (e.g. Westerberg et al., 2009).
The first measurements in the Peruvian time series available for this study were taken in 1931. In contrast to SENAMHI Bolivia, the number of daily measurements in SENAMHI Peru is more constant (Figure 1(b)), and no clear decline in daily observations occurs in recent years, either for temperature or for PRCP. The median observation duration is 20 years for temperature and 28 years for PRCP.

Measurement practices and instrumentation
The vast majority of the stations in the Central Andes are manned. According to Fiebrich and Crawford (2009), the automatization of a network results in a clear increase in data quality. Automatic weather stations (AWS) have proven to be more accurate than manned observations for many variables (e.g. temperature), but for others (e.g. PRCP), there are measurement issues that have to be addressed (Ciach, 2003; Plummer et al., 2003; Tokay et al., 2010). However, AWS may be exposed to disturbances, such as technical defects, theft, or vandalism. The resulting data losses may make time series unusable for climatological analyses (e.g. Westerberg et al., 2009). Furthermore, the capabilities, personnel, and equipment must be adequate to run the network (WMO, 2008). For instance, AWS require station visits at least twice a year (Plummer et al., 2003). This would be a great challenge in the Central Andean region, where access to remote areas is seasonally impeded and station inspections are currently done very infrequently. Without strict adherence to prescribed inspection and maintenance programs and schedules, the quality of AWS observations can deteriorate dramatically (Plummer et al., 2003). Hence, a sudden automatization of the networks may not only cause considerable inhomogeneities in the measurement records (e.g. Hubbard and Lin, 2006; WMO, 2008; Trewin, 2010), but is also likely to decrease the quality of Central Andean weather observations. Observers in Bolivia and Peru are largely laypersons, who conduct the measurements alongside their main occupation. At some stations, the official observers can also be temporarily replaced by aides (usually family members), which results in changes in the way data are collected.
Measurements of SENAMHI Bolivia stations are taken at 0800 LST (1200 UTC). At some stations, a second daily measurement is taken at 1700 or 1800 LST. Bolivian airports measure hourly during operation times, but only daily summaries are transcribed. At stations of SENAMHI Peru, measurements are normally taken three times a day at 0700 (1200 UTC), 1300, and 1900 LST. Data are largely transmitted to SENAMHI in the form of the original documents (in Bolivia every second month), as is, or was, the practice in other networks (e.g. Kunkel et al., 2005). Currently, the observers at some Peruvian stations transmit the data immediately to SENAMHI by mobile phone.
In Bolivia, TX and TN are mostly measured with Six's thermometers. These thermometers are considered robust and simple to use, but were superseded for professional field work in many networks a long time ago (Austin and McConnell, 1980). In general, only high quality instruments should be used for meteorological purposes (WMO, 2008). However, as instruments should also be suitable regarding reliability, durability, simplicity of design, the resources for observer training, and the costs (WMO, 2008), Six's thermometers may still be the best choice for certain stations. At some Bolivian and most Peruvian stations, liquid-in-glass TX and TN thermometers are used. PRCP is mostly measured with Hellmann rain gauges without wind shields. The use of different types of instruments to measure a single parameter may cause data continuity problems (WMO, 2008).

Metadata records
Metadata analysis may reveal quality issues that are not detectable in the measurement data. For instance, station siting has a great impact on temperature and PRCP measurements (Aguilar et al., 2003; Mahmood et al., 2006; WMO, 2008; Yilmaz et al., 2008). According to WMO guidelines (Plummer et al., 2003; WMO, 2008), an observation site should be representative of the climatic regime for which it is intended. If this requirement is not met, the measured data may not be meaningful for comparisons (Plummer et al., 2003; Brönnimann, 2015). However, good station exposure as defined by WMO (2008) is difficult to achieve and compromises are necessary. Questionable station siting occurs in the Central Andes, as well as in many networks worldwide (Trewin, 2010). For instance, the vast majority of the United States National Weather Service's cooperative observer network (COOP) was reported to be poorly sited (Fall et al., 2011). The specific effects of poor station exposure on the measurements are complex (Mahmood et al., 2006; Kumamoto et al., 2013), but definitely increase the uncertainty in the data (Vose et al., 2005; Mahmood et al., 2006; Pielke et al., 2007). Hence, high quality metadata allow users to assess the circumstances under which the data were recorded, and time series that seem inadequate for a certain purpose can be removed. In Bolivia and Peru, metadata are not systematically recorded. In Bolivia, efforts have been made to collect metadata for SENAMHI and AASANA stations. The resulting information is available through a webpage (http://www.senamhi.gob.bo/sige/). Even though this is a helpful tool, the metadata are very basic and in some cases incorrect or outdated. SENAMHI Peru operates a website that provides similar information (http://www.senamhi.gob.pe/include_mapas/_map_data_tesis.php).

QC practices
No systematic and consistent QC is applied to Bolivian and Peruvian station data, as is the case for other networks (e.g. Fiebrich and Crawford, 2009; Westerberg et al., 2009). However, there are some procedures that are intended to safeguard the quality of the data. In Bolivia, a supervisor of SENAMHI verifies the monthly data sheets provided by the observers. Suspicious values are highlighted by handwritten notes. However, these notes are not always taken into account in the subsequent digitization of the original documents. At SENAMHI Peru, data sheets are partly digitized twice in order to avoid typing errors. Furthermore, some QC has been done in the frame of different projects. According to WMO (2008), the provision of meteorological data of good quality is not a simple matter and is impossible without a quality management system. The lack of such a system may increase the danger of inadequate actions that may not only result in the loss of original data, but also may introduce errors (Reek et al., 1992; Aguilar et al., 2003). Because no quality management system is established in the Central Andean networks, original time series should be treated as non-quality controlled data. The same applies to other networks where QC processes are not clearly documented, or where the adequacy of the applied QC is questioned.

Metadata generation
There are different sources for reconstructing station information if metadata are partly or completely missing. To get the best possible set of metadata, the information of all sources should be combined. This is a very time and resource consuming process. However, the methods described in the points 1-3 in the following paragraphs can be applied rather easily, because there is no need to be on site. Accessing the other data sources (points 4 and 5) requires on site investigation, except if photos of the required documents are available (point 4).
Using the five steps below, we successfully generated and collected metadata from Bolivia and Peru:
1. Collection of all transcribed information: Digitized metadata may not be stored in only one document. Hence, the different information sources should be compiled. Discrepancies (e.g. different coordinates for the same station) may give evidence of changes in the station's history (such as station relocations). In Bolivia and Peru, these sources are available at the aforementioned websites (see Section 2.3), in the headers of the data files, or in separate station lists.
2. Investigation of the station surroundings: Station surroundings can be investigated and reported based on 'Google Earth' images. This or similar methods were used in previous studies (Mahmood et al., 2006; Fall et al., 2011). The approach furthermore allows the identification of erroneous or suspicious station coordinates such as station locations in the middle of a lake.
3. Extracting metadata from the data: A detailed QC reveals many indicators for events in the history of the stations. For example, a transition in the measurement precision (see Section 4.3.1) is a clear indication of an undocumented event (Rhines et al., 2015) such as an observer or instrument change. Reporting such observations creates additional metadata that can be used, inter alia, for homogeneity estimations and breakpoint corroboration (Rhines et al., 2015). This source is particularly valuable if documentation about the station's history is missing.
4. Screening of original documents: Search for non-digitized notes on archived data sheets, station inspection reports, working memorandums, and similar documents. Furthermore, some information is hidden by the digitization process. For instance, a change in handwriting is a strong indication of an observer change. In addition, comparing original records with digitized data makes it possible to assess the quality of the data transcription.
5. Station visits: Station condition and environmental surroundings can be determined from in-person station visits that should include measuring the station's coordinates with GPS and photographing the station from all directions. The station's history may be, at least partially, reconstructed by interviewing observers or inhabitants living in the vicinity. Furthermore, station visits allow investigating previously detected discrepancies and errors in metadata and measurement records. It is reasonable to create a station visit form in order to gain objective and comparable metadata. Station inspections aiming to generate metadata were performed in previous projects (e.g. Davey and Pielke, 2005; Mahmood et al., 2006; Westerberg et al., 2009; Fall et al., 2011).
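Point 3 above (extracting metadata from the data) can be illustrated with a minimal sketch. It assumes that a change in measurement precision shows up as a jump in the yearly share of values that carry a non-zero decimal. The function names, the `jump` threshold of 0.5, and the sample record are hypothetical and not part of the procedure used in this study.

```python
from collections import defaultdict


def decimal_fraction_by_year(records):
    """Return {year: share of values with a non-zero decimal part}.

    `records` is an iterable of (year, value) pairs; values are assumed
    to be reported with at most one decimal place.
    """
    totals = defaultdict(int)
    with_decimals = defaultdict(int)
    for year, value in records:
        totals[year] += 1
        # Round to tenths so floating-point noise is not counted as a decimal.
        if round(value * 10) % 10 != 0:
            with_decimals[year] += 1
    return {year: with_decimals[year] / totals[year] for year in sorted(totals)}


def precision_transitions(shares, jump=0.5):
    """List years whose share differs from the previous year's by at least `jump`."""
    years = sorted(shares)
    return [
        later
        for earlier, later in zip(years, years[1:])
        if abs(shares[later] - shares[earlier]) >= jump
    ]


# Hypothetical record: whole degrees until 1990, tenths from 1991 onwards.
records = [(1989, 3.0), (1989, -1.0), (1990, 4.0), (1990, 2.0),
           (1991, 3.4), (1991, -0.6), (1992, 1.2), (1992, 5.3)]
shares = decimal_fraction_by_year(records)
print(precision_transitions(shares))  # [1991]
```

A flagged year is only a candidate date for an observer or instrument change and should be checked against original documents or station visits where possible.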

Description of common quality issues
This section describes common data quality issues that do not just affect single measurements but may appear systematically during several years. The list is not complete, but it covers the most common errors found in the Central Andean time series. The cases demonstrated in the following sections are exemplary. Note that these data quality issues often manifest in a less pronounced and/or somewhat different form. In such cases, the impact on results of data analyses might be less drastic, but error detection and attribution are made more difficult. Furthermore, errors frequently occur intermittently, have individual specificities, and overlap. Various approaches may be suitable to detect data quality issues, but the models suggested here have been proven to be effective. In some cases, errors could be clearly attributed to a specific cause, while in other cases, error sources could only be assumed. The effect of data quality issues on specific analyses will be mentioned by means of selected climate change indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI, http://etccdi.pacificclimate.org/). Thereby, we demonstrate that time series affected by data quality issues may still be usable for some analyses, but not for others. The ETCCDI indices are widely used in studies on changing climate extremes (e.g. Alexander et al., 2006;New et al., 2006;You et al., 2011;Zhang et al., 2011;Donat et al., 2013;Sillmann et al., 2013) and are included in the software package RClimDex (Zhang and Yang, 2004).

Missing temperature intervals
In some time series, measurements in a certain temperature range are missing or the frequency of records is reduced. Such missing temperature intervals may extend up to several degrees. Values around 0 °C are predominantly affected. Because nighttime temperatures at high altitude stations in the Central Andes frequently fall below 0 °C, TN records are more often affected than TX measurements. The occurrence of the error is usually not associated with an increase of missing values in the time series. In Bolivia and Peru, approximately 10-20% of the station records are affected by missing temperature intervals. A very clear example of the error is found in the TN observations of the station Progreso in the Peruvian district of Puno (Figure 2(a)). The instrument used at this station is a liquid-in-glass thermometer with a dumbbell-shaped rod (Figure 2(b)), from which TN is read at the right side of the rod. When discussing the issue with the observer, it was found that he read the temperature on the left side of the rod whenever the rod's centre was below 0 °C. The length of the rod corresponds to about 4.2 °C, and hence determines the size of the missing temperature interval.
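A missing temperature interval of this kind can be searched for by binning the recorded values and looking for a run of empty bins inside the data range. The following is a minimal sketch under stated assumptions: the bin width of 0.5 °C, the minimum gap width of 2 °C, and the sample values are hypothetical choices, and real records need enough observations that sparse tails are not mistaken for gaps.

```python
def find_empty_interval(values, bin_width=0.5, min_width=2.0):
    """Return (low, high) of the widest run of empty bins inside the data range.

    Returns None if no run of empty bins is at least `min_width` wide.
    """
    occupied = sorted({int(v // bin_width) for v in values})
    best = None
    for lower, upper in zip(occupied, occupied[1:]):
        gap_low = (lower + 1) * bin_width   # first empty bin starts here
        gap_high = upper * bin_width        # next occupied bin starts here
        width = gap_high - gap_low
        if width >= min_width and (best is None or width > best[1] - best[0]):
            best = (gap_low, gap_high)
    return best


# Hypothetical TN record (degrees Celsius) without values around 0.
recorded_tn = [-6.0, -4.5, -3.0, -2.5, 2.5, 3.0, 4.0, 5.5]
print(find_empty_interval(recorded_tn))  # (-2.0, 2.5)
```

In practice, a histogram or a plot of the daily values over time shows the same pattern and also reveals when the gap starts and ends.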
Even though different causes, such as the misinterpretation of zeros as invalid numbers, may induce missing temperature intervals, erroneous instrument reading is often the most likely source of this type of quality issue. The error may bias means as well as extremes. In the case of the TN time series of Progreso, the error affects all ETCCDI indices [except the 'number of tropical nights' (TR), which is irrelevant for this station]. However, as the cause of the problem was clearly identified and the observer made the same error consistently for many years, the biased negative temperature values may be corrected by adding 4.2 °C, corresponding to the rod's length. The carefully corrected time series may then be used for any analysis.
[Figure 2 caption, partial: The temperature must be read on the right side of the rod (slightly above +2 °C for the case shown). However, if the centre of the rod was below 0 °C, the observer erroneously read the temperature on the rod's left side. Hence, the length of the rod (corresponding to about 4.2 °C) determines the missing temperature range.]
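The correction described for Progreso can be written down as a short sketch. The simulated reading error below is an illustrative assumption about the rod geometry (centre half a rod length below the right end, left end a full rod length below it). The correction step assumes that every negative recorded value in the affected period is biased. All function names are hypothetical.

```python
ROD_LENGTH_C = 4.2  # rod length in degrees Celsius, as reported for Progreso


def simulate_observer_reading(true_tn, rod_length=ROD_LENGTH_C):
    """Mimic the reported error: the left end is read when the rod's centre is below 0."""
    centre = true_tn - rod_length / 2  # TN is correctly read at the right end
    if centre < 0:
        return round(true_tn - rod_length, 1)  # left end lies one rod length lower
    return true_tn


def correct_reading(recorded_tn, rod_length=ROD_LENGTH_C):
    """Add the rod length back to negative recorded values (assumed to be biased)."""
    if recorded_tn < 0:
        return round(recorded_tn + rod_length, 1)
    return recorded_tn


true_series = [-3.0, -0.5, 1.0, 2.0, 2.5, 6.0]
recorded = [simulate_observer_reading(t) for t in true_series]
corrected = [correct_reading(r) for r in recorded]
print(recorded)   # [-7.2, -4.7, -3.2, -2.2, 2.5, 6.0]
print(corrected)  # [-3.0, -0.5, 1.0, 2.0, 2.5, 6.0]
```

Such a correction is only defensible for the period in which the error source is confirmed, which is why it should be restricted to dates supported by metadata or by the observer's own account.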

Heavy precipitation truncation
PRCP records may have a deficiency of heavy PRCP events above a certain threshold value. This may manifest as a complete truncation or a frequency reduction. Often, there is an accumulation of data points around the threshold value. The occurrence of the error is normally not accompanied by an increase of missing values. About 35% of the Bolivian time series are affected by a temporary lack of PRCP values above 20 mm (sometimes 40 mm). In Peru, about 15% of the stations are affected by the same issue, but the truncation usually occurs at 10 mm. A typical example of heavy PRCP truncation is found in the records of the station Aguirre in Bolivia (Figure 3(a)). The source of the error was detected when inspecting the Hellmann rain gauge that is generally used in Bolivia (Figure 3(b)). Observers measure the PRCP in the instrument's inner container, which captures 20 mm of PRCP before overflowing. Rainwater in the outer container is measured by refilling the inner container. Finally, the measured PRCP amounts are totalled. Apparently, various observers failed to measure the rainwater collected in the outer container.
Even though other error sources, such as leaking outer containers, are theoretically possible, the error can generally be attributed to incorrect measurement practices. Very similar errors are known from other networks, e.g. from Brazil (P. V. da Costa Pereira, 2016, pers. comm.). Much higher threshold values could be found at stations located in regions where PRCP amounts of extreme events are very large. For instance, at some stations in the Bolivian Amazon basin, the instrument's storage capacity of 200 mm was too small for several 1-day PRCP events. The same may happen if observers do not take measurements with the required frequency, which may need to be increased during intense PRCP events. At a station of MeteoSwiss, for example, this issue led to a partial heavy precipitation truncation in the 1970s and 1980s (experienced personally by co-author Mario Rohrer).
Missing heavy PRCP events are a severe data quality problem, and affected time series are not suitable for most analyses. However, some of the ETCCDI indices, such as the 'annual count of days when PRCP ≥ 10 mm' (R10mm) or the 'maximum length of dry and wet spells' (CDD and CWD, respectively), are not affected by a 20 mm PRCP truncation. In the case of station records of Aguirre, however, the accumulation of values around 10 mm and missing values between 1 and 2 mm (see Section 4.2.2) impede the application of such indices.
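A simple screening for complete truncation is to flag years in which no daily total exceeds the capacity of the inner container although many wet days were recorded. The sketch below assumes a 20 mm capacity as described for the Hellmann gauge. The minimum of 30 wet days per year is a hypothetical choice to avoid flagging genuinely dry years, and the function name and sample data are illustrative only.

```python
from collections import defaultdict


def truncated_years(daily, capacity_mm=20.0, min_wet_days=30, wet_day_mm=1.0):
    """Return years whose daily totals never exceed `capacity_mm` despite many wet days.

    `daily` is an iterable of (year, prcp_mm) pairs; missing values are
    assumed to have been removed beforehand.
    """
    wet_days = defaultdict(int)
    yearly_max = defaultdict(float)
    for year, prcp in daily:
        if prcp >= wet_day_mm:
            wet_days[year] += 1
        yearly_max[year] = max(yearly_max[year], prcp)
    return sorted(
        year for year in yearly_max
        if wet_days[year] >= min_wet_days and yearly_max[year] <= capacity_mm
    )


# Hypothetical data: 2000 contains events above 20 mm, 2001 is capped at 20 mm.
daily = [(2000, 5.0)] * 38 + [(2000, 35.0), (2000, 22.5)]
daily += [(2001, 5.0)] * 38 + [(2001, 20.0), (2001, 20.0)]
print(truncated_years(daily))  # [2001]
```

This check does not catch a mere frequency reduction above the threshold; for that case, a plot of the daily values over time remains the more reliable tool.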

Small precipitation gap
PRCP records may have missing measurements of small PRCP events. This may affect a range of very low values (e.g. 0-1 mm), but may also reach up to two-digit numbers. Small PRCP gaps are usually not accompanied by an increase of missing values, but by an increase of precipitation-free days. In some cases, an often uniform accumulation of very low values (≤1 mm) is found instead of an increase of zeros.
A large fraction of about 60-65% of the Bolivian and Peruvian time series is affected by small PRCP gaps. A typical example of the error is found in the records of Chorocona, Bolivia (Figure 4(a)). As in many cases, the size of the gap varies in time. Presumably, the most frequent cause of this error is attributable to observer behaviour because the detected pattern occurs if measurements are taken only after substantial rain events. As a result, small PRCP events remain unrecorded and get accumulated. Most observers will fill in a zero for the missed measurements, but some observers seemingly recorded a uniform low number (≤1 mm) as a rough estimation instead. Another cause that may lead to small PRCP gaps is the degradation of the scales used for the measurements. Some of the scales are weathered and the marks are faded, especially towards the scale's ends (Figure 4(b)). Unrecognizable scale and water marks may get interpreted as zero PRCP.
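Because the size of the gap varies in time, one way to screen for it is to track the smallest non-zero amount recorded in each year. This is a minimal sketch; the 1 mm threshold, the function names, and the sample data are hypothetical. It does not detect the variant in which a uniform low value is entered instead of zero.

```python
def smallest_recorded_amount(daily):
    """Return {year: smallest non-zero daily total}; years without rain are omitted."""
    smallest = {}
    for year, prcp in daily:
        if prcp > 0 and (year not in smallest or prcp < smallest[year]):
            smallest[year] = prcp
    return smallest


def years_with_gap(daily, gap_mm=1.0):
    """List years in which no non-zero value below `gap_mm` was recorded."""
    return sorted(
        year for year, value in smallest_recorded_amount(daily).items()
        if value >= gap_mm
    )


# Hypothetical data: small events are recorded in 1995 but not in 1996.
daily = [(1995, 0.2), (1995, 3.0), (1996, 0.0), (1996, 4.5), (1996, 12.0)]
print(years_with_gap(daily))  # [1996]
```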
Small PRCP gaps are also found in other station networks. For example, the error occurs quite frequently in Brazilian station records (P. V. da Costa Pereira, 2016, pers. comm.) and the United States COOP network (Daly et al., 2007). If we have good reasons to assume that PRCP amounts of small rain events are accumulated and credited to the next measurement, the impact of the error on yearly or monthly PRCP sums may be insignificant. Hence, the time series may be used to calculate the ETCCDI index 'Annual total precipitation in wet days' (PRCPTOT). However, note that wet days (WDs) are defined as daily PRCP ≥ 1 mm (Klein Tank et al., 2009). If rain events get accumulated, PRCP events <1 mm are also integrated in the index calculation. On the other hand, evaporation losses are increased if collected rainwater remains longer than 1 day in the rain gauge. In arid climates, these two effects may not be negligible. Other indices that depend on the count of WDs, such as the CDD and CWD or the 'simple precipitation intensity index' (SDII), are strongly affected by small PRCP gaps. Indices on extreme PRCP events, such as the 'monthly maximum 1-day precipitation' (Rx1day) or the R10mm, are altered by the accumulation of rainfall. To what extent these indices are biased depends on the specific characteristics of the error.
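The effect of accumulation on the indices can be made concrete with a small worked example. It assumes a hypothetical observer who only takes a reading once the gauge holds at least 5 mm, ignores evaporation losses, and uses a made-up ten-day series. The index calculations are simplified to a single period.

```python
def accumulate_small_events(true_daily, min_reading_mm=5.0):
    """Hypothetical observer: records 0 until the gauge holds at least `min_reading_mm`."""
    recorded, stored = [], 0.0
    for prcp in true_daily:
        stored += prcp
        if stored >= min_reading_mm:
            recorded.append(stored)
            stored = 0.0
        else:
            recorded.append(0.0)
    return recorded


def indices(daily, wet_day_mm=1.0):
    """Simplified single-period versions of some ETCCDI precipitation indices."""
    wet = [p for p in daily if p >= wet_day_mm]
    prcptot = sum(wet)
    return {
        "PRCPTOT": round(prcptot, 1),                             # total on wet days
        "wet_days": len(wet),                                     # days with PRCP >= 1 mm
        "SDII": round(prcptot / len(wet), 1) if wet else 0.0,     # mean wet-day amount
        "Rx1day": round(max(daily), 1),                           # largest 1-day total
    }


true_daily = [0.5, 2.0, 0.0, 3.0, 6.0, 0.0, 0.8, 1.5, 4.0, 0.0]
true_idx = indices(true_daily)
recorded_idx = indices(accumulate_small_events(true_daily))
print(true_idx)      # {'PRCPTOT': 16.5, 'wet_days': 5, 'SDII': 3.3, 'Rx1day': 6.0}
print(recorded_idx)  # {'PRCPTOT': 17.8, 'wet_days': 3, 'SDII': 5.9, 'Rx1day': 6.3}
```

In this example, PRCPTOT changes only because the events below 1 mm are now counted, whereas the number of wet days and SDII change substantially, in line with the reasoning above.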

Weekly precipitation cycles
PRCP frequency in time series may occur in weekly cycles. This may happen because no observations are taken on a certain day of the week. Mostly, weekends are affected, as daily routines on those days usually differ from workdays. The typical resulting pattern is a PRCP lack followed by a PRCP excess because of the multi-day rainfall accumulation. Missed measurements should be marked by an unambiguous code (e.g. a horizontal line '-'). Furthermore, the observation practice should be mentioned in the metadata. Hence, data users are informed about the issue and can take appropriate measures. Some relevant methods are discussed by Viney and Bates (2004). However, weekly PRCP cycles are also found in station records without indications of non-daily measurements. Any manned station may be affected if the observer uses an inappropriate missing value code such as a zero (Durre et al., 2010). Furthermore, missed measurements could be associated with the negligence of the observer's task, which may tempt observers to insert a zero rather than a missing value code. Weekly PRCP cycles were detected in various manned station networks worldwide. Viney and Bates (2004) found that the majority of Australian high quality PRCP time series are affected, and Schmocker et al. (2016) detected the error in about 20% of station records in the Mount Kenya area.
However, not only observer routines, but also air pollution caused by human activity may cause weekly PRCP cycles (Simmonds and Keay, 1997;Cerveny and Balling, 1998;Wilby and Tomlinson, 2000). These findings, however, are often indistinct and not confirmed by other studies (DeLisi et al., 2001;Schultz et al., 2007;Stjern, 2011). Roughly half of the relevant publications have found pollution-induced weekly PRCP cycles, while the other half did not (Tuttle and Carbone, 2011). Generally, the signal of observer behaviour on weekly PRCP cycles is much stronger than the potential influence of air pollution. Furthermore, the two causes typically lead to fundamentally different patterns of weekly cycles.
Identification of weekly PRCP cycles requires specific checks. Results may well be indistinct, because the missed measurements may occur irregularly and stretch over more than 1 day of the week. Furthermore, checks should be made not only for the whole time series but also for shorter time periods, as the problem usually occurs temporarily. The weekly distribution of WDs is a much better indicator than the highly variable weekly distribution of PRCP amounts. Viney and Bates (2004) suggest comparing the number of wet Sundays with the presumably unaffected mid-week days (Tuesday to Friday). This method seems accurate for the Australian dataset, where stations were predominantly run by public service employees with regular working hours on weekdays only. However, it may not be transferable to other networks and countries, where the observers' occupations are manifold and Sundays are not generally days off. Furthermore, date shifts (see Section 4.3.3) may misalign the weekly cycle pattern. We therefore suggest a more general approach that does not assume affected Sundays and unaffected weekdays. Instead, each day of the week is expected to have the same probability for PRCP, which is calculated by dividing the total number of WDs by the total count of measurements. The number of WDs of each day of the week is then tested by a two-sided binomial test at the 95% confidence level. The same test should be applied for shorter time periods such as single years. A run of consecutive years with days outside the confidence interval indicates a time period affected by weekly PRCP cycles.
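The test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, missing values are assumed to have been removed beforehand, and the two-sided p-value is computed with one common convention (summing the probabilities of all outcomes that are no more likely than the observed count). Only the standard library is used.

```python
from datetime import date, timedelta
from math import exp, lgamma, log

WET_DAY_MM = 1.0  # wet-day threshold (Klein Tank et al., 2009)

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value (assumed convention: sum of the
    probabilities of all outcomes that are no more likely than k)."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    observed = log_pmf(k)
    total = sum(exp(log_pmf(i)) for i in range(n + 1)
                if log_pmf(i) <= observed + 1e-9)
    return min(1.0, total)

def weekly_cycle_test(records, alpha=0.05):
    """records: iterable of (datetime.date, prcp_mm) without missing values.

    Returns {weekday: (wet_fraction, p_value, significant)}, Monday = 0.
    """
    records = list(records)
    wet_total = sum(1 for _, mm in records if mm >= WET_DAY_MM)
    p_wet = wet_total / len(records)  # common WD probability for all weekdays
    if not 0.0 < p_wet < 1.0:
        raise ValueError("series must contain both wet and dry days")
    result = {}
    for weekday in range(7):
        amounts = [mm for d, mm in records if d.weekday() == weekday]
        if not amounts:
            continue  # no measurements on this weekday
        k = sum(1 for mm in amounts if mm >= WET_DAY_MM)
        p_value = binom_two_sided_p(k, len(amounts), p_wet)
        result[weekday] = (k / len(amounts), p_value, p_value < alpha)
    return result
```

To test shorter periods such as single years, the same function can be called on the subset of records belonging to that period.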
Weekly PRCP cycles are found in roughly 15 and 5% of the station records in Bolivia and Peru, respectively. A typical case of the error was detected in the time series of San Calixto, Bolivia. A significant WD lack on Saturdays is followed by a significant WD excess on Sundays (Figure 5). The signal on Saturdays is more pronounced than on Sundays. This pattern is explained by rain events that extend over several days, contributing to the WD lack on Saturdays but not to the WD excess on Sundays. The test on the annual time scale revealed that the problem started after a measurement interruption in 1992. Since 1993, the fraction of WDs differs strongly between Saturdays (0.1) and Sundays (0.3), while the period before 1992 is unaffected. The rain gauge of San Calixto is located at a Jesuit geophysical observatory (www.osc.org.bo). Until 1991, the observatory's director Rev. Ramón Cabré Roige (Udias, 2003) was responsible for the measurements. Since the resumption of the observations in 1993, measurements have been taken by the observatory's caretaker, who has his days off on Sundays. Because measurements are taken in the morning, they are ascribed to the previous day, which shifts the WD lack from Sundays to Saturdays.
In some station records, significant weekly PRCP cycles are found that do not follow explicable patterns. For instance, there are several cases where the WD fraction is evenly distributed among the days of the week except for one day with a much higher WD fraction. Such a pattern may occur if measurements are taken very infrequently (often just once a week) or if external factors disturb the observations. For instance, water from irrigation of the station's compound or vicinity may fill the rain gauge, as it happened at a station in Italy (Y. Brugnara, 2016, pers. comm.). Hence, if the pattern of a weekly PRCP cycle cannot be attributed to a certain cause, the records should be considered suspicious. However, if the error is explicable such as in the case of San Calixto, it is not unreasonable to expect that total rainfall on larger time scales will be more or less preserved (Viney and Bates, 2004). Hence, the disturbance of rain sums is most likely negligible on time scales ≥7 days. However, the data should not be used on shorter time scales, such as for the ETCCDI indices Rx1day or Rx5day. Furthermore, indices using thresholds, such as the R10mm or the CWD, are also affected by weekly PRCP cycles.

Figure 6. (a) The minimum temperature (TN) time series of Tiraque (Bolivia) reveals six major measurement precision transitions in 1956, 1970, 1976, 1991, 1998, and 2002. Precisions switch between 0.1 °C, 0.5 °C, 1 °F, and 1 °C. (b) Yearly frequencies of decimal numbers of the TN time series of Tiraque. The state of the measurement precision cannot be clearly defined in some segments (e.g. 1971-1975 or 1998-2002). Between 1976 and 1991, measurements were originally recorded in full degrees Fahrenheit.

Measurement precision inconsistencies
In this section, the term 'precision' is not used in a mathematical way, but rather as it is used in previous studies (Rhines et al., 2015). Hence, the measurement precision is defined here as the precision of the reported observations, which may be influenced by the instrument type, the observing practice, unit conversions, or transcription errors. Coarse measurement precision and switches of the precision state are known from a large number of datasets (Schaal and Dale, 1977;Nese, 1994;Trewin, 2010;Rhines et al., 2015). This is an important source of uncertainty that has not been adequately addressed (Rhines et al., 2015). Therefore, Rhines et al. (2015) introduced a precision decoding algorithm based on a hidden Markov model that was successfully applied to a large number of time series. Currently, the algorithm requires that the observers follow deterministic rules for rounding and conversion of the data. If the data structure diverges from this idealized behaviour, the algorithm may not perform satisfactorily. This was the case for the vast majority of the Bolivian station records.
The most frequent measurement precisions in the Bolivian and Peruvian datasets are 0.1, 0.2, 0.5, and 1.0 °C for temperature, and 0.1, 0.5, and 1.0 mm for PRCP. Nearly all records contain precision transitions and segments of low precisions (such as 1 °C or 1 mm). Often, the measurement precisions are indistinct and cannot be assigned to one precision state. Furthermore, odd frequencies of decimal numbers are detected in many time series. For instance, in several cases only the decimal numbers 0-4 are reported, while the decimal numbers 5-9 are completely missing, and sometimes only the decimal numbers 0 and 2 are reported.
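Yearly frequencies of decimal numbers, which reveal both precision states and odd digit distributions, are simple to tabulate. The following Python sketch is only an illustration: the function name and the `(year, value)` input layout are assumptions, and values are assumed to be reported to at most one decimal place.

```python
from collections import Counter, defaultdict

def decimal_digit_frequencies(records):
    """records: iterable of (year, value), values reported to 0.1.

    Returns {year: {digit: relative frequency}} for the first decimal digit.
    """
    counts = defaultdict(Counter)
    for year, value in records:
        # first decimal digit of the magnitude, e.g. -3.5 -> 5
        digit = int(round(abs(value) * 10)) % 10
        counts[year][digit] += 1
    return {
        year: {d: c[d] / sum(c.values()) for d in range(10)}
        for year, c in counts.items()
    }
```

A year recorded in full degrees would show a frequency of 1.0 for the digit 0, whereas a 0.5 precision state would show only the digits 0 and 5.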
An illustrative example of precision inconsistencies is found in the time series of the Bolivian station Tiraque (Figure 6(a)). Note the large inhomogeneities that may come along with precision switches (e.g. in 1976). Between 1976 and 1991, the precision structure can be attributed to a unit conversion from full degrees Fahrenheit to 0.1 °C. A typical fingerprint of this conversion is the absence of 'fives' in the decimals of the Celsius scale. However, in the time series of Tiraque, 'sixes' are missing instead (Figure 6(b)). Apparently, degrees Fahrenheit values that convert to Celsius decimals of 5/9 (e.g. 42 °F) were erroneously rounded down.
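The conversion fingerprint can be reproduced with exact arithmetic. The Python sketch below is an illustration under the assumption that only values with a fractional part of exactly 5/9 °C were rounded down, as suggested above; the function names and the `round_five_ninths_down` switch are hypothetical.

```python
from fractions import Fraction
from math import floor

def celsius_tenths(fahrenheit, round_five_ninths_down=False):
    """Convert full degrees Fahrenheit to tenths of a degree Celsius.

    Works on the magnitude of the Celsius value with exact fractions.
    """
    celsius = abs(Fraction(5 * (fahrenheit - 32), 9))
    fractional_part = celsius - floor(celsius)
    if round_five_ninths_down and fractional_part == Fraction(5, 9):
        return floor(celsius * 10)  # x.555... reported as x.5 (assumed mistake)
    return floor(celsius * 10 + Fraction(1, 2))  # conventional rounding

def reported_decimals(fahrenheit_values, **kwargs):
    """Sorted list of first decimal digits occurring after conversion."""
    return sorted({celsius_tenths(f, **kwargs) % 10 for f in fahrenheit_values})

correct = reported_decimals(range(32, 101))
tiraque_like = reported_decimals(range(32, 101), round_five_ninths_down=True)
```

With conventional rounding the digit 5 never occurs, whereas the assumed rounding mistake produces 'fives' and removes the 'sixes'. For 42 °F, the exact value is 50/9 = 5.556 °C, which is correctly reported as 5.6 °C.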
As precision transitions usually occur simultaneously in TX and TN (sometimes also in PRCP), the most likely cause is observer changes. However, instrument replacements (identified as the main cause in Tiraque), new observer instructions, or new measurement forms may also cause precision inconsistencies. Odd frequencies of decimal numbers could be explained by erroneous instrument reading. For instance, if thermometer tick marks of 0.2 °C are just counted or interpreted as 0.1 °C, only the decimal numbers 0-4 will be reported. Furthermore, transcription errors (see also Section 4.3.3) may influence the measurement precision. For example, a consistent misinterpretation of the observer's handwritten 'fives' as 'twos' leads to data records containing only decimal numbers of 'zeros' and 'twos' if the original precision state is 0.5 °C. As previously demonstrated, unit conversions may also lead to precision inconsistencies. Even though the Celsius scale has always been in use in the Central Andean area, several time series were apparently converted from the Fahrenheit to the Celsius scale. Similar cases were found for PRCP, where original measurements in 0.05 or 0.10 inches were converted to the metric scale. In Tiraque, original documents confirmed that measurements were reported in full degrees Fahrenheit. Most likely, the issue emerged due to the use of thermometers having both Celsius and Fahrenheit scales, which misled some observers to report the values of the wrong scale.
Odd frequencies of decimal numbers and, to a lesser extent, unit conversions introduce a bias in the time series and are therefore the most problematic form of precision inconsistencies. Low and varying precisions, on the other hand, most likely do not adversely affect means and standard deviations of large and long-term datasets (Nese, 1994). However, many statistical methods assume that data are continuously distributed (Rhines et al., 2015). For instance, the effect of precision inconsistencies is not negligible for threshold statistics (Schaal and Dale, 1977;Nese, 1994;Zhang et al., 2009;Rhines et al., 2015). Hence, the error affects a large number of the ETCCDI indices, such as the 'number of frost days' (FD), 'number of summer days' (SU), or the CDD and CWD.

Reduced variability in time series segments
A segment in a time series may have clearly reduced variability compared to the rest of the records. Different climate variables may be affected simultaneously. The error is detected rather easily by data visualization or by algorithms searching for inconsistencies of the standard deviation.
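A very simple screening for such segments compares the standard deviation of each year with a reference value for the whole record. The Python sketch below is an assumption-laden illustration rather than a method used in this study: the function name, the use of the median yearly standard deviation as reference, and the `ratio` threshold of 0.5 are all hypothetical choices, and no deseasonalization is applied.

```python
from collections import defaultdict
from statistics import median, pstdev

def low_variability_years(records, ratio=0.5):
    """records: iterable of (year, value).

    Flags years whose standard deviation falls below `ratio` times the
    median of all yearly standard deviations (hypothetical criterion).
    """
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    yearly_sd = {y: pstdev(v) for y, v in by_year.items() if len(v) > 1}
    reference = median(yearly_sd.values())
    return sorted(y for y, sd in yearly_sd.items() if sd < ratio * reference)
```

Flagged years are only candidates; they should be inspected visually before any decision on the usability of the segment is made.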
Roughly 20-30% of the Bolivian and Peruvian station records are affected by this error, with TX and TN more frequently affected than PRCP observations. An exemplary case of reduced variability in time series segments is found at the station Urubamba in Peru (Figure 7). The figure only shows TX (Figure 7(a)) and PRCP (Figure 7(b)) records, but TN as well as temperature measurements at 0700, 1300, and 1900 LST are affected too. As all measured variables are affected, the most likely cause is an observer error. In Urubamba, the affected time period coincides with the outbreak of the internal conflict between the Peruvian government and Maoist and socialistic guerrilla groups in the early 1980s.
During social unrest, the frequency of quality issues and data gaps in weather observations is increased. For example, the period of World War II is data-poor in many datasets (e.g. Brunet et al., 2014a, 2014b). Data quality may be affected by station relocations to unsuitable, but more easily accessible sites, changes to new and barely instructed observers, or pure estimation of measurements (Westerberg et al., 2009). It has also been found that broken thermometers may cause a variability reduction. Furthermore, if the issue occurs abruptly and uniformly, it may be caused by a decimal point error or an untransformed unit change. However, as in the case of Urubamba, a clear error attribution is usually not possible, and the affected time series segments cannot be used for any data analyses.

Transcription errors
Data in a digital dataset may differ from original handwritten values due to errors in the transcription process that are hard to avoid completely. If the digitization of the data is done in a double-keying process, comparison of the independent transcriptions allows for the detection of inconsistencies (Kunkel et al., 2005;WMO, 2007). This method is partly applied at SENAMHI Peru. If inconsistencies between two transcriptions are found or if there are other doubts about the transcription, comparing the digital data with the original documents is mostly the only way to clearly detect and correct transcription errors. The benefit of such comparisons was mentioned in earlier studies (e.g. Durre et al., 2008;Fiebrich and Crawford, 2009;Brunet et al., 2014a). The most common transcription errors in the datasets of SENAMHI Bolivia and AASANA are described below.
Incomplete availability of original data sheets at the time of digitization caused some considerable data gaps in the digital Bolivian datasets. Furthermore, insufficient understanding of the original document structure by data transcribers resulted in various errors. For instance, in the original data forms used at the airports, the sign of temperature measurements is reported in a separate column. In some cases, the sign was not transcribed, leading to missing negative temperature records. Similarly, decimal numbers of PRCP in the same forms are stored in a separate column and were therefore not always transcribed. In some transcriptions, negative temperature values were interpreted as either negative or positive decimal places.
According to WMO guidelines (Plummer et al., 2003), TX and PRCP measurements should be credited to the previous day if only a morning measurement is taken at a station. It is reported from many networks that such shifts were done directly when reporting the observation, either manually by the observer (Reek et al., 1992;Fiebrich and Crawford, 2009) or by means of specially keyed data forms (Menne et al., 2012). In other cases, data shifts were introduced in post-processing (Turco et al., 2013). However, shifting practices were often not applied uniformly and they changed in time, resulting in undocumented inhomogeneous shifts among many datasets (Reek et al., 1992;Kunkel et al., 2005;Menne et al., 2012;Turco et al., 2013). The comparison between original documents and digitized data in Bolivia revealed many discrepancies of dates, varying between back shifts of time series of up to 3 days and forward shifts of 1 day. Note that shifting practices are not recorded in Bolivian metadata, as is the case for other networks (e.g. Kunkel et al., 2005). Besides accessing the original documents, detection of shifts may also be possible by comparing measurements of neighbouring stations (Reek et al., 1992;Menne et al., 2012) or using other datasets as references (Turco et al., 2013). However, these methods rely on high correlations between time series, and the shifting state of the reference data must be known. This hardly applies to the Bolivian station observations.
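The idea of detecting date shifts against a reference series can be sketched as a search for the lag with the highest correlation. The Python example below is a generic illustration and not the procedure of the cited studies: function names, the sign convention of the lag, and the default lag range are assumptions, and both series are assumed to be gap-free and aligned by index.

```python
def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def best_lag(target, reference, lags=range(-3, 2)):
    """Return (lag, scores) where lag maximizes corr(target[i + lag], reference[i]).

    Assumed convention: a positive lag means the target dates are later
    than those of the reference.
    """
    scores = {}
    for lag in lags:
        pairs = [(target[i + lag], reference[i])
                 for i in range(len(reference))
                 if 0 <= i + lag < len(target)]
        x, y = zip(*pairs)
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores
```

As noted above, such a search is only meaningful if the two series are highly correlated and the shifting state of the reference is known, and it should be repeated for sub-periods because shifting practices changed in time.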
Other, less frequent transcription errors in the digitized Bolivian datasets are the misinterpretation of the observer's handwriting (see also Section 4.3.1) and swapping of time series segments (e.g. the months June and July). Similar problems are reported from many other manned networks (Feng et al., 2004;Fiebrich and Crawford, 2009;Westerberg et al., 2009).

Detection of systematic errors
Many QC methods have problems in detecting the data quality issues described in this study. The error detection of most approaches focuses rather on suspicious single values than on patterns in the data. The frequently used spatial consistency test relies on the availability of suitable neighbouring stations, and the test's efficiency is hence impeded by complex terrain and sparse station networks (Plummer et al., 2003;Kunkel et al., 2005;WMO, 2008). Furthermore, the application of internal consistency tests that draw on the physical relationships between different climatological variables (Plummer et al., 2003;Trewin, 2013) may be limited due to a low number of observed variables.
As QC tests should be designed to detect known data problems (Durre et al., 2008), existing methods should be enhanced for the detection of systematically occurring data quality issues. However, there are three main difficulties: in datasets such as those from the Central Andes, data quality issues occur in a highly variable manner in time, the errors have a large amount of individual specificities, and several errors may occur simultaneously. Hence, even methods that have been successfully applied on other datasets [e.g. the method for precision decoding by Rhines et al. (2015), see Section 4.3.1] may be ineffective in such cases. Possibly, novel error detection approaches that do not depend on previously defined assumptions such as machine learning (Bishop, 2006;Alpaydin, 2014;Lakshmanan et al., 2015) could successfully address the issue. However, the use of such approaches in error detection of station observations is a new field, and to the authors' knowledge, no such work has yet been published in a peer reviewed journal.
According to WMO (1993), machine and man together can achieve much more in quantity and quality of work than either can alone. For the Bolivian and Peruvian station networks, this statement certainly still holds true. Data visualization is an excellent tool that can be of great help, as the human mind is especially skilful at identifying spatial patterns (Plummer et al., 2003;Klein Tank et al., 2009). Hence, visual QC allows the detection of all sorts of quality issues, independent of specific error characteristics. Furthermore, visual QC allows for the assessment of systematic errors, which is crucial for the further use of the data (see Section 5.2). Various QC software tools, such as RClimDex (Zhang and Yang, 2004) or RClimTool (Llanos, 2014), offer graphic data outputs. However, these plots are not necessarily suitable to detect systematic patterns in the data. For instance, time series are often visualized by lines that often hide certain errors such as missing temperature intervals. A more illustrative general overview is achieved by point plots.
The methods used in this work to detect and identify common data quality issues may be applied to any observational time series, as they do not depend on neighbouring stations or measurements of other, physically related climate variables. Of course, the suggested methods are not a self-contained QC approach, but they should be combined with other test methods. As systematically occurring data quality issues may reduce the effectiveness of detection algorithms, we recommend identifying and treating (see Section 5.2) these errors before applying other QC tests. Furthermore, data quality issues may require adapting the parameters of statistical QC tests. For instance, the threshold for consecutive equal values must be higher if measurement precisions are low in order to avoid overdetection.

Overcoming systematic errors in time series
The best way to overcome systematically occurring errors is to prevent their occurrence. As in other manned station networks (e.g. Fiebrich and Crawford, 2009), a large number of data quality issues in the Bolivian and Peruvian records can be traced back to observer errors. Hence, the importance of well-trained and informed personnel cannot be stressed enough (Plummer et al., 2003). Strengthening observer training and providing clear and simple measurement and maintenance guidelines may have a large positive impact on data quality with relatively little effort. Therefore, in the context of this study, readily comprehensible guidelines for station observers were produced. However, in order to maintain the quality of the measurements, further actions are necessary. WMO (2008) suggests near real-time QC for manned stations, because error sources can thereby be identified and eliminated. Data transmission methods such as the data transfer by mobile phone at some Peruvian stations offer new options for such approaches. Also, controlling the quality of station records that are transmitted conventionally directly after delivery, and intervening immediately if needed, could reduce the error rate drastically.
However, data users can generally not influence measurement practices, and such efforts are an investment for the future rather than of use for today's climatological research. Hence, data users have to deal with data quality issues in many datasets. The rejection of time series affected by errors may result in substantial loss of available records (e.g. Westerberg et al., 2009;Kizza et al., 2012), and the removal of suspicious values causes data gaps. However, the completeness of time series is very important, as any dataset containing a large amount of missing data is of little use to climatologists (Plummer et al., 2003). According to WMO (1989), not more than five missing temperature measurements in a month (only three if occurring in succession), and no missing daily PRCP totals (if rain is suspected) are permitted for calculating monthly means and sums, respectively. Hence, strategies should be defined on how time series affected by data quality issues may still be usable for climatological analyses. This is particularly important for regions of sparse station networks.
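The completeness criteria attributed to WMO (1989) above translate into a short check. The Python sketch below is an illustration with hypothetical function names; missing values are represented by `None`, and the PRCP check conservatively rejects any missing daily total because the qualifier 'if rain is suspected' cannot be evaluated from the data alone.

```python
def month_usable_for_mean(daily_values, max_missing=5, max_consecutive=3):
    """Temperature: at most 5 missing days, at most 3 of them in succession."""
    missing = sum(v is None for v in daily_values)
    longest_run = run = 0
    for v in daily_values:
        run = run + 1 if v is None else 0
        longest_run = max(longest_run, run)
    return missing <= max_missing and longest_run <= max_consecutive

def month_usable_for_sum(daily_prcp):
    """PRCP: no missing daily totals (conservative reading of the rule)."""
    return all(v is not None for v in daily_prcp)
```

For example, a month with five scattered missing temperature values passes the check, whereas a month with four consecutive missing values does not.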
Besides rejecting station records affected by systematically occurring errors from further use, there are four options that may restore the data's usability:
1. Error assessment: Identifying the specific error characteristics and potentially detecting the error source enables users to decide for what analyses the data are still usable. This is demonstrated by means of ETCCDI indices in Section 4. As the specificities of the data quality issues vary significantly, each case needs to be evaluated separately. The most effective tools for the error assessment are suitable data visualizations.
2. Data correction: If the source of an error is clearly identified and quantified, the data may be correctable, as in the case of the TN time series of Progreso (see Section 4.1.1). Whenever possible, corrections should be made in order to maintain the largest possible applicability of the data. However, corrections must be done carefully in order to avoid inappropriate alteration of single measurements. Ideally, other data sources should be used to confirm the correction. In the case of Progreso, for instance, parallel thermograph measurements may allow a cross control.
3. Time series trimming: If an error is not correctable and compromises the intended purposes, the affected time series segment may be removed. The remaining unaffected records, if still of useful length, can be used for the intended study.
4. Data adjustment for certain analyses: Some data quality issues can be addressed by approximating the original data structure in order to make the data adequate for specific analyses. For instance, Zhang et al. (2009) introduced a method to artificially restore a data precision of 0.1 °C, which is the measurement precision recommended by WMO (1996, 2008). The approach makes data of low precision suitable for threshold statistics, and hence for various ETCCDI indices (see Section 4.3.1). As good results are achieved even if measurement records are over-adjusted, the method is also applicable to time series of inconsistent measurement precisions and indefinable precision states.
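One simple way to approximate a finer precision is to add uniform random noise within the rounding interval of the reported values. The Python sketch below illustrates this general idea only; it is an assumption and not necessarily the exact procedure of Zhang et al. (2009). The function name, parameters, and the fixed seed are hypothetical.

```python
import random

def restore_precision(values, reported_precision, decimals=1, seed=0):
    """Add uniform noise of +/- half the reported precision and round.

    values: measurements reported at `reported_precision` (e.g. 1.0 degC).
    decimals: number of decimal places of the target precision (1 -> 0.1).
    The seed makes the illustration reproducible.
    """
    rng = random.Random(seed)
    half = reported_precision / 2
    return [round(v + rng.uniform(-half, half), decimals) for v in values]

coarse = [12.0] * 200  # invented example: values recorded in full degrees
adjusted = restore_precision(coarse, reported_precision=1.0)
```

The adjusted values scatter around the reported ones, so they should be used only for the specific analyses they were generated for, such as threshold statistics, and not as a replacement for the original records.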

Conclusions
Data quality issues are often not adequately addressed and contribute to the uncertainty in results of climatological studies. As many factors affecting the quality of observational records (e.g. poor station siting) are hardly detectable in the measurement data, metadata are an important source for assessing data quality. However, in many station networks, metadata are largely missing and/or of poor quality. For such cases, this study suggests different methods to generate or improve metadata. The most common systematically occurring errors in the predominantly manned Bolivian and Peruvian station networks were identified and attributed. The same or similar errors are found in many other station networks worldwide, especially if observation conditions are comparable. Most of the data quality issues are related to observer errors. Therefore, the most effective way to prevent the occurrence of systematic errors is to improve observer training and to establish a near real-time QC that allows quick intervention if an error occurs.
Many QC approaches have trouble in detecting systematically occurring data quality issues. These methods should be enhanced in order to address this shortcoming. However, the detection of such quality issues is made difficult by the high temporal variability of error occurrence, the vast number of error specificities, and error interactions. As data quality issues described in this article affect the efficiency of many QC algorithms, these systematic errors should be treated first, and parameters of statistical QC tests should be adapted if required.
Data visualization has been shown to be an effective tool to identify and attribute data quality issues. Providing this information enables data users to decide for what purposes a station record is still usable, although affected by a systematic error. In ideal cases, the defective data are even correctable. Furthermore, affected time series segments may be removed from the station records, and for some specific applications, methods to adjust data can be applied. The gain of usable time series achieved by this approach is particularly important in sparse station networks.
All factors that affect data quality may also cause break points in time series (WMO, 2008) and may hence hamper the detection and correction of other inhomogeneities (Domonkos, 2013). This is particularly critical in areas with low station density such as the Central Andes, where low station correlations strongly reduce the performance of relative data homogenization methods, and where data adjustments potentially increase the inhomogeneity of the data (Gubler et al., 2017; personal communication). Because homogenization of monthly or annual means may not be sufficient for daily time scales (e.g. Costa and Soares, 2009;Trewin, 2013;Brönnimann, 2015), daily homogenization would be needed to study extremes that are often most strongly affected by data quality issues. This, however, would require even higher station correlations due to the lower signal-to-noise ratio on shorter time scales. Furthermore, statistical data adjustments of the common data quality issues presented in this article would often not be adequate because of the discrete character of the errors. In summary, the selection and treatment of time series based on a comprehensive QC reduces the number of inhomogeneities. This may thus contribute to improving data homogeneity in the Central Andean and comparable networks more than statistical data homogenization methods. As a next step, nevertheless, carefully applying statistical data homogenization may further increase the data homogeneity. The findings on the systematically occurring data quality issues of this study may be useful for the creation of realistic benchmark datasets for QC and homogenization algorithms.