Analyzing the impact of automatization using parallel daily mean temperature series including breakpoint detection and homogenization

High‐quality time series of meteorological observations are required for reliable assessments of climate trends. Parallel measurements can be used to analyze inhomogeneities in time series. Germany's national meteorological service DWD (Deutscher Wetterdienst) operates a network of climate reference stations at which manual and automatic observations have been taken in parallel. These parallel measurements therefore allow an analysis of the impact of the transition from manual to automatic observations on the homogeneity of time series of several meteorological parameters. Here, we present results for temperature. The differences between automatic and manual measurements are tested for breakpoints caused by instrumental defects or changes in the measurement conditions. Because the two time series are highly correlated, even small breaks can be identified. The detected breakpoints are verified against metadata where available. Where no metadata are available, a procedure is suggested to identify the inhomogeneous time series (manual or automatic). Afterwards, the time series are homogenized. The homogenized time series are used to analyze the impact of changing the observing system from manual to automatic measurements on daily mean temperature.


| INTRODUCTION
Parallel measurements provide information on how changes in the observing system can affect time series. Furthermore, these measurements can be used to determine uncertainties and to control the quality of the data. If the behaviour of the differences changes significantly, this can indicate a break in at least one of the time series and therefore a need for homogenization. Most homogenization methods require a reference series. In the case of parallel measurements, each time series can serve as the reference series for the other. Parallel measurements are usually highly correlated, which facilitates breakpoint detection and homogenization.
Thermometer screens influence temperature measurements. Therefore, several studies have compared thermometer screens (Brandsma and Van der Meulen, 2008; Brunet et al., 2011; Hoover and Yao, 2018) or analyzed the change from unscreened to screened measurement conditions (Böhm et al., 2010). The measurement arrangement and meteorological conditions such as wind speed, cloud cover, or radiation can also influence temperature measurements; in particular, a large radiant flux can induce significant differences. The radiation effect on temperature measurements can be minimized by using small sensors (Erell et al., 2005). Auchmann and Brönnimann (2012) used parallel measurements to evaluate a physics-based correction model for homogenizing temperature data. At German climate reference stations, manual and automatic measurement instruments are operated in parallel and can therefore be compared directly. Kaspar et al. (2016) analyzed temperature measurements and found only minor differences between manual and automatic observations at the traditional observing times (06:30 UTC, 13:30 UTC and 20:30 UTC). The analysis of daily maximum temperature revealed an annual cycle in the time series of the differences, with warmer automatically measured temperature maxima in summer at some stations. The main reason is a radiation effect on the shelter (LAM 630) used for automatic measurements. This error can be reduced by optimizing the position of the automatic instrument in the shelter (see Kaspar et al., 2016). Another reason for the annual cycle in the differences of daily maximum temperature in Germany is the difference in screen characteristics (e.g., shelter size and ventilation) between the modern and the historical screen. Parallel measurements of daily sunshine duration are analyzed by Hannak et al. (2019), who found significant differences between manual and automatic daily sunshine duration measurements.
To homogenize the daily sunshine duration data, a regression model (as introduced in their study) can be used to adjust the automatic measurements. Baciu et al. (2005) compared automatic and historical observations in Romania and found minor differences in daily mean temperature values but larger differences in daily minimum and maximum temperature values. Doesken (2005) analyzed the impact of automatization on temperature measurements at one station in the United States.
Usually, parallel measurements cover only a short time period. For this reason, nearby stations are often used to detect breaks and to homogenize the data (the so-called relative method). In most cases, the correlation between nearby stations and the candidate time series is lower than between parallel measurements, so that small breaks are difficult to detect and to homogenize. Most homogenization procedures are applied to annual or monthly data, for example for the HISTALP dataset (for the Alpine region), whose method includes relative homogeneity testing and metadata information (Auer et al., 2007). Monthly mean temperature and precipitation time series of Switzerland were homogenized with the software THOMAS (Begert et al., 2005). Israeli time series of temperature maxima and minima were homogenized by Yosef et al. (2018), and the homogenized data were used for trend analysis. Hannart et al. (2014) introduced a fully automatic breakpoint detection method that uses pairwise comparisons of the candidate series with neighbouring series and builds groups of breakpoints, and they homogenized annual time series in Argentina. In their study, the trends in long temperature series are stronger after homogenization. Peterson et al. (1998) present several breakpoint detection and homogenization methods used worldwide and discuss the limitations of homogenized data. An updated review of homogenization methods and breakpoint detection can be found in Ribeiro et al. (2016). They conclude that relative methods (with a reference series) perform better than absolute methods, and that methods able to detect multiple breakpoints perform better than methods that can detect only one breakpoint and have to be run several times to detect multiple breakpoints.
Some studies focus on the homogenization of daily data. Breakpoint detection and homogenization of daily data are complicated by the higher variability and autocorrelation compared with annual or monthly data. Breaks can affect the mean and higher-order moments, which makes break detection and homogenization more difficult. To homogenize daily data, the software SPLIne Daily HOMogenization (SPLIDHOM) can be used (coded in R [R Core Team, 2015]). SPLIDHOM uses an indirect nonlinear regression method based on cubic smoothing splines and can adjust the mean and higher-order moments of the candidate series (Mestre et al., 2011). The method applied by Della-Marta and Wanner (2006) also adjusts the mean and higher-order moments of daily temperature time series. Toreti et al. (2010) enhanced this method to handle autocorrelation and to use an objective parameter estimation. Kuglitsch et al. (2009) homogenized daily maximum temperature series. In their study, the breaks are detected with nearby stations, and a nonlinear regression method, which requires a highly correlated reference series, is used to adjust the mean and higher-order moments of the candidate series. Hewaarachchi et al. (2017) homogenized daily temperature data using metadata information and a reference series, accounting for the seasonal cycle and autocorrelation of the series. Lund et al. (2007) considered autocorrelation and periodic features in time series to detect breakpoints.
There are also fully or partly automatic homogenization software tools. The European project COST ES0601 (HOME) compared such tools, and the software packages MASH, PRODIGE and ACMANT showed good results for temperature data. HOMER is an R software package, developed after this project, that combines features of several of the tested tools. It can be used with metadata in a semi-automatic mode or fully automatically (Mestre et al., 2013).
In this study, we use parallel measurements of temperature, aggregated to daily mean values, to detect breaks and to homogenize these time series. The parallel time series are highly correlated and can serve as reference series for each other (in both the breakpoint detection and the homogenization step). Three different homogenization methods are compared to evaluate whether they are able to homogenize the detected and identified breaks. The homogenized data are compared to the results of Kaspar et al. (2016). In the first part, the data and methods are introduced, including the breakpoint detection, the identification of the inhomogeneous time series and the homogenization methods. Afterwards, the results of parallel measurements at 13 stations in Germany are summarized. The homogenized data are then compared to the raw data. Finally, the results are summarized.

| DATA AND METHODS
In Germany, historical measurements of air temperature were performed with a mercury-in-glass thermometer three times per day. This setting is therefore also used for manual measurements at climate reference stations (currently at 06:30 UTC, 13:30 UTC and 20:30 UTC). To calculate daily mean values, these three observations are averaged with double weight on the evening value. The manual instrument is housed in a wooden Stevenson screen. To compare daily mean temperature values of manual and automatic observations directly, the same equation is applied to the automatic measurements: even though the temporal resolution of automatic measurements is higher, only the values at 06:30 UTC, 13:30 UTC and 20:30 UTC are used for this comparison. The automatic instrument is a platinum resistance thermometer (PT100, manufacturer Ketterer). At most sites, the ventilated lamellar shelter 'LAM 630' (manufacturer Eigenbrodt) is used for the automatic temperature instruments. Exceptions are the stations Brocken (where a shelter called 'Gießener Hütte' is used), Fichtelberg and Frankfurt airport (until October 2014), where the Stevenson screen is used for both automatic and manual instruments. Figure 1 shows Frankfurt airport as an example of a German climate reference station. The geographical positions of the stations and the time periods of available parallel measurements are summarized in Table 1 (see Hannak et al., 2019). More information about the instruments and the characteristics of climate reference stations can be found in Kaspar et al. (2016).
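The weighting described above amounts to $\bar{T} = (T_{06:30} + T_{13:30} + 2\,T_{20:30})/4$, applied identically to the manual and the automatic readings. A minimal Python sketch of this daily mean; the function name and the sample readings are hypothetical:

```python
def daily_mean(t_0630, t_1330, t_2030):
    """Daily mean from the three observing times, evening value weighted twice."""
    return (t_0630 + t_1330 + 2.0 * t_2030) / 4.0


# Hypothetical readings in degrees Celsius for one day
manual = daily_mean(10.0, 18.0, 12.0)
automatic = daily_mean(10.1, 18.2, 12.0)
print(manual)                        # 13.0
print(round(automatic - manual, 3))  # 0.075
```

Because only the three observing times enter the formula, the higher temporal resolution of the automatic system does not influence this comparison.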
To filter outliers and to control the data quality, differences greater than four times the pseudo standard deviation (SD) are excluded from both time series. Following Lanzante (1996), the pseudo SD is calculated as the interquartile range divided by 1.349. The pseudo SD is itself less influenced by outliers, which is why it is preferred over the 'original' SD. For a Gaussian distribution, the pseudo SD and the SD are equal. This is a very strict outlier control, but excluding outliers improves the results of both break detection and homogenization. The number of outliers is summarized in Table 1.
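A minimal sketch of this outlier control using only the Python standard library. The text does not state the centre from which deviations are measured; this sketch assumes the median of the difference series, and all function names are hypothetical:

```python
import statistics


def pseudo_sd(values):
    """Pseudo SD after Lanzante (1996): interquartile range / 1.349."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 1.349


def filter_outliers(auto, manual, k=4.0):
    """Remove days whose automatic-minus-manual difference deviates from the
    median difference (assumed centre) by more than k pseudo SDs.
    The same days are dropped from both series."""
    diffs = [a - m for a, m in zip(auto, manual)]
    centre = statistics.median(diffs)
    limit = k * pseudo_sd(diffs)
    keep = [abs(d - centre) <= limit for d in diffs]
    auto_kept = [a for a, ok in zip(auto, keep) if ok]
    manual_kept = [m for m, ok in zip(manual, keep) if ok]
    return auto_kept, manual_kept, keep.count(False)


auto = [10.0, 10.1, 9.9, 10.05, 9.95, 10.0, 13.0, 10.02]
manual = [10.0] * 8
auto_kept, manual_kept, n_outliers = filter_outliers(auto, manual)
print(n_outliers)  # 1
```

Using the interquartile range keeps the single 3 K deviation from inflating the threshold, which a classical SD would not.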
The breakpoints in the time series are compared to metadata information (modification history) of the instrument or shelter type. Examples of available metadata information are the date of a replacement or the date of a calibration.

| Detection of breaks
To detect breaks in the time series, the differences of automatic minus manual daily mean values (difference series) are used. The assumption is that both time series share a similar climate signal, so that the difference series do not contain climate features such as the annual cycle or trends. The breakpoint detection is performed with the R function 'uniseg' (part of the R package 'cghseg'), which was originally developed for comparative genomic hybridization (CGH) data but also works for difference series of climate data (Picard et al., 2016). The positions of the breakpoints are identified with a dynamic programming algorithm for joint segmentation, and a maximum likelihood criterion is used to find the best number of segments and the best positions of the breakpoints. More information about the method and algorithm can be found in Picard et al. (2011).

FIGURE 1 Example of one climate reference station (station Frankfurt airport)
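The dynamic programming idea can be sketched in Python. This sketch assumes Gaussian noise with a common variance, so that maximizing the likelihood for a fixed number of segments K is equivalent to minimizing the residual sum of squares. It is not the cghseg implementation: 'uniseg' also selects K with its own model-selection criterion, which is omitted here, and the function names are hypothetical.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Residual sum of squares of x[i:j] around its own mean (from prefix sums)."""
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n


def best_segmentation(x, k):
    """Exact least-squares split of x into k constant-mean segments.
    Returns the start indices of segments 2..k (the break positions).
    Runs in O(k * n^2) time."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # cost[s][j]: minimal RSS of x[:j] split into s segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                c = cost[s - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < cost[s][j]:
                    cost[s][j], back[s][j] = c, i
    # Walk back from the end to recover the segment starts
    breaks, j = [], n
    for s in range(k, 1, -1):
        j = back[s][j]
        breaks.append(j)
    return sorted(breaks)


x = [0.0] * 5 + [0.5] * 5 + [0.25] * 5
print(best_segmentation(x, 3))  # [5, 10]
```

The quadratic cost in the series length is one practical reason to run the search on monthly means first.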
The first step in the breakpoint detection procedure is the calculation of monthly mean differences between automatic and manual measurements. Then, the R-function 'uniseg' is used to detect breaks in the monthly difference series. The next step is to use the daily difference series. Within a time range of plus/minus 2 months around the break detected using monthly data, 'uniseg' is used with daily data to get a more precise break date. If 'uniseg' is not able to detect a break in this time range using daily data, the break date based on the monthly data is used for further steps. Figure 2 shows an example for the results of the method 'uniseg' with monthly and daily data.
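The daily refinement step can be sketched as a search for the best single split inside a window of about two months (61 days assumed here) around the monthly break date. As a stand-in for 'uniseg', the sketch picks the split that most reduces the residual sum of squares. The threshold `min_gain` is a hypothetical way to represent "no break detected", in which case the monthly date is kept. All names are illustrative.

```python
from datetime import date, timedelta


def rss(values):
    """Residual sum of squares around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def refine_break(days, diffs, monthly_break, window_days=61, min_gain=0.0):
    """Return a daily break date near monthly_break, or monthly_break itself
    if no split within +/- window_days improves the RSS by more than min_gain.
    days must be sorted datetime.date values aligned with diffs."""
    idx = [i for i, d in enumerate(days)
           if abs((d - monthly_break).days) <= window_days]
    if len(idx) < 4:
        return monthly_break
    window = [diffs[i] for i in idx]
    total = rss(window)
    best_gain, best_k = min_gain, None
    for k in range(2, len(window) - 1):  # at least two values per side
        gain = total - rss(window[:k]) - rss(window[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    return monthly_break if best_k is None else days[idx[best_k]]


days = [date(2010, 1, 1) + timedelta(days=i) for i in range(200)]
diffs = [0.0 if d < date(2010, 3, 20) else 0.25 for d in days]
print(refine_break(days, diffs, date(2010, 4, 1)))  # 2010-03-20
```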

| Identification of time series with breaks
Differences facilitate breakpoint detection but do not provide information about the time series responsible for the break in the difference series. To identify the inhomogeneous time series, different comparisons are made, for example, with metadata information or nearby stations.
TABLE 1 Time range with parallel measurements; location and elevation (in meters), pairs of data, and number of outliers of each climate reference station (Hannak et al., ...)

For the first comparison, metadata information of the manual and the automatic instrument is used. In a given time range around the break date, the metadata entries for each instrument are counted ('metadata score'). The metadata entry with the smallest time lag (in days) between the break date and the metadata date receives an extra weight (+0.5 added to the total 'metadata score'). The time range used for the comparison depends on the signal-to-noise ratio (SNR) and the probability of missing a break (Lindau and Venema, 2016). The SNR is calculated as SNR = |D/2|/σ, where D is the difference between the mean values (daily data) after and before the break and σ is the SD (Lindau and Venema, 2016). When there are multiple breaks in the time series, D is calculated from the mean values of two subsequent segments, and the SD σ of the first segment is used. For the second comparison, a reference series is calculated with the help of nearby stations. The stations are weighted with the correlation coefficients between the day-to-day changes of the automatically measured daily mean temperature and the day-to-day changes at the neighbouring station. The minimum correlation is set to 0.9; only stations with a correlation coefficient greater than or equal to this value are used to estimate the reference series with the following equation (Alexandersson and Moberg, 1997):

$$\mathrm{ref}_i = \frac{\sum_j \mathrm{cor}_j^2 \left(x_{j,i} - x_{j,\mathrm{mean}} + y_{\mathrm{mean}}\right)}{\sum_j \mathrm{cor}_j^2},$$

where $x_j$ stands for the different time series of the nearby stations (with value $x_{j,i}$ on day $i$ and mean value $x_{j,\mathrm{mean}}$), $y_{\mathrm{mean}}$ is the mean value of the automatically measured time series and $\mathrm{cor}_j$ are the correlation coefficients of each station. In a given time range around the breaks, all breaks detected by 'uniseg' (using the differences of automatic/manual observations minus the reference time series) are counted. The third comparison is based on related parameters.
One automatic instrument is used to measure daily mean, daily maximum and daily minimum temperature, whereas three different thermometers are used for the manual measurements. Therefore, if a break occurs in more than one difference series of these related parameters, it is likely that the automatic instrument is causing the break.
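The reference series and the SNR can be sketched in Python as follows. The sketch assumes aligned series without gaps and squared correlations as weights, which is our reading of Alexandersson and Moberg (1997); the weighting actually used in the study may differ. Function names are hypothetical.

```python
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def day_to_day(x):
    """Day-to-day changes x[t+1] - x[t]."""
    return [b - a for a, b in zip(x, x[1:])]


def reference_series(candidate, neighbours, min_cor=0.9):
    """Weighted reference: each accepted neighbour is shifted to the candidate
    mean and weighted by its squared correlation of day-to-day changes.
    Returns None if no neighbour reaches min_cor."""
    y_mean = statistics.fmean(candidate)
    dy = day_to_day(candidate)
    weights, shifted = [], []
    for x in neighbours:
        cor = pearson(dy, day_to_day(x))
        if cor >= min_cor:
            x_mean = statistics.fmean(x)
            weights.append(cor ** 2)
            shifted.append([v - x_mean + y_mean for v in x])
    if not weights:
        return None
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, shifted)) / total
            for i in range(len(candidate))]


def snr_per_break(diffs, breaks):
    """SNR = |D/2| / sigma per break (Lindau and Venema, 2016): D is the
    difference of the means of two consecutive segments, sigma the SD of
    the first one. breaks holds the start indices of new segments."""
    bounds = [0] + sorted(breaks) + [len(diffs)]
    segments = [diffs[a:b] for a, b in zip(bounds, bounds[1:])]
    result = []
    for first, second in zip(segments, segments[1:]):
        d = statistics.fmean(second) - statistics.fmean(first)
        result.append(abs(d / 2) / statistics.stdev(first))
    return result


candidate = [10.0, 11.0, 9.0, 12.0, 10.5]
opposite = [10.0, 9.0, 11.0, 8.0, 9.5]  # opposite day-to-day changes
print(reference_series(candidate, [opposite]))  # None
```

A return value of None corresponds to the situation at mountain-top or island stations, where no neighbour is correlated strongly enough.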
Finally, the presumably inhomogeneous time series is derived from the three comparisons. The total score of the first comparison is weighted four times, that of the second comparison twice and that of the third comparison once. If the sum of these weighted scores is larger for the automatic instrument than for the manual instrument, it is likely that the automatic instrument is causing the break, and the automatic time series has to be homogenized (and vice versa). If the scores of the automatic and the manual instrument are equal, it is not possible to conclude which instrument is responsible for the break in the difference series, and therefore no homogenization is done. Figure 3 summarizes the procedure of breakpoint detection and identification of the inhomogeneous time series.
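The scoring can be sketched as follows, assuming each comparison has already been reduced to one number per instrument. The metadata score counts entries within the window and adds 0.5 for the entry with the smallest lag. How the second and third comparison are converted into per-instrument scores is left to the caller. Names and the example dates are hypothetical.

```python
from datetime import date

INSTRUMENTS = ("automatic", "manual")


def metadata_score(break_date, events, window_days):
    """First comparison: count metadata entries per instrument within
    +/- window_days; the entry with the smallest lag adds 0.5.
    events is a list of (instrument, date) tuples."""
    score = {inst: 0.0 for inst in INSTRUMENTS}
    nearby = [(abs((d - break_date).days), inst) for inst, d in events
              if abs((d - break_date).days) <= window_days]
    for _, inst in nearby:
        score[inst] += 1.0
    if nearby:
        score[min(nearby)[1]] += 0.5  # equal lags: alphabetical order decides
    return score


def attribute_break(meta, ref, related, weights=(4, 2, 1)):
    """Weight the three comparisons 4:2:1; return the instrument with the
    larger total, or None (no homogenization) if the totals are equal."""
    totals = {inst: weights[0] * meta[inst] + weights[1] * ref[inst]
              + weights[2] * related[inst] for inst in INSTRUMENTS}
    if totals["automatic"] == totals["manual"]:
        return None, totals
    return max(totals, key=totals.get), totals


events = [("automatic", date(2012, 3, 16)), ("manual", date(2011, 12, 1))]
meta = metadata_score(date(2012, 3, 20), events, window_days=30)
zero = {"automatic": 0.0, "manual": 0.0}
related = {"automatic": 1.0, "manual": 0.0}
print(attribute_break(meta, zero, related))
# ('automatic', {'automatic': 7.0, 'manual': 0.0})
```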
A final comparison is carried out at the end of the procedure to determine the break date as accurately as possible. First, the instrument with the highest total score is identified. Afterwards, it is checked whether metadata information for this instrument exists within the given time range. If so, the date of the metadata entry with the smallest time lag (in days) to the detected break (within the given time range) is used as the break date instead of the break date detected by 'uniseg'.
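This date refinement reduces to a nearest-date lookup. A minimal sketch, with a hypothetical function name and example dates:

```python
from datetime import date


def final_break_date(detected, instrument, events, window_days):
    """Replace the detected break date by the closest metadata date of the
    responsible instrument within +/- window_days, if one exists."""
    candidates = [(abs((d - detected).days), d) for inst, d in events
                  if inst == instrument and abs((d - detected).days) <= window_days]
    return min(candidates)[1] if candidates else detected


events = [("automatic", date(2012, 3, 16)), ("automatic", date(2012, 5, 1)),
          ("manual", date(2012, 3, 19))]
print(final_break_date(date(2012, 3, 20), "automatic", events, window_days=60))
# 2012-03-16
```

Metadata of the other instrument are ignored, even if they lie closer to the detected date.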

| Homogenization
With the detected breakpoints and the information about the inhomogeneous time series (manual or automatic), the data can be homogenized. The first step is to divide the time series into segments whose boundaries are defined by the breakpoints. The most recent segment is used as the training period, and the other segments are adjusted to it. Different segments are used for each time series (depending on their break dates), but the training period is the same for both series. The segments are adjusted starting with the oldest segment. In the end, all segments are adjusted to the training period using the difference series of automatic minus manual observations.
Three different methods are used to homogenize the data. The first method is called Linear Scaling (similar to Vincent et al., 2002). For this method, monthly correction factors are estimated as the difference between the mean differences (candidate minus reference) in the break period and in the training period.
For example, data are available from January 1, 2008 to January 1, 2015 and the automatic time series has a break on May 1, 2010. To calculate a correction factor for January, the mean value of the difference series of all January values in the period January 1, 2011 to January 1, 2015 (training period) is calculated. This value is compared to the mean difference of all January values in the period January 1, 2008 to January 31, 2010 (break period). The difference of these two mean values is the correction factor for January. To correct January values in the period January 1, 2008 to January 31, 2010, the January correction factor is subtracted from the automatic observations. The same approach is repeated for each month.
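The worked example can be reproduced with a short Python sketch for a single break. It assumes that the automatic series is the inhomogeneous one, as in the example, and that the factor is defined as break-period mean minus training-period mean, so that subtracting it removes the offset. Multi-segment handling (oldest segment first) is omitted, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta


def monthly_factors(days, diffs, break_date):
    """Factor per calendar month: mean difference (candidate minus reference)
    in the break period minus mean difference in the training period."""
    training, broken = defaultdict(list), defaultdict(list)
    for d, v in zip(days, diffs):
        (training if d >= break_date else broken)[d.month].append(v)
    return {m: statistics.fmean(broken[m]) - statistics.fmean(training[m])
            for m in range(1, 13) if broken[m] and training[m]}


def linear_scaling(days, series, factors, break_date):
    """Subtract the monthly factor from the inhomogeneous series before the break."""
    return [v - factors.get(d.month, 0.0) if d < break_date else v
            for d, v in zip(days, series)]


# Synthetic data following the dates in the text: break on May 1, 2010
start, end, brk = date(2008, 1, 1), date(2015, 1, 1), date(2010, 5, 1)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
manual = [10.0] * len(days)
automatic = [10.2 if d < brk else 10.0 for d in days]  # +0.2 K before the break
diffs = [a - m for a, m in zip(automatic, manual)]
factors = monthly_factors(days, diffs, brk)
corrected = linear_scaling(days, automatic, factors, brk)
print(round(factors[1], 3), round(corrected[0], 3))  # 0.2 10.0
```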
With these monthly factors, monthly data can be corrected. To homogenize daily data, the monthly factors are smoothed using a spline, so that every day of the year has its own correction factor. This method corrects only the mean, not the higher-order moments: if a break also affects the SD of the time series, the method cannot correct this feature.
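The text does not specify the spline type. As an illustration only, the sketch below uses a periodic Catmull-Rom cubic spline, which passes through the 12 monthly values placed at approximately mid-month and wraps around the year. A smoothing spline, as the text may intend, would not necessarily pass exactly through the monthly values. The function name is hypothetical.

```python
import math


def daily_factors(monthly, days_in_year=365):
    """Interpolate 12 monthly factors (Jan..Dec, placed at approximately
    mid-month) to one factor per day with a periodic Catmull-Rom spline."""
    step = days_in_year / 12.0
    daily = []
    for day in range(days_in_year):
        u = (day + 0.5) / step - 0.5  # month node m (0-based) sits at u = m
        i = math.floor(u)
        f = u - i
        # Four neighbouring monthly values, wrapping around the year
        p0, p1, p2, p3 = (monthly[(i + k) % 12] for k in (-1, 0, 1, 2))
        daily.append(0.5 * (2 * p1 + (p2 - p0) * f
                            + (2 * p0 - 5 * p1 + 4 * p2 - p3) * f ** 2
                            + (3 * p1 - p0 - 3 * p2 + p3) * f ** 3))
    return daily


monthly = [0.1, 0.1, 0.05, 0.0, -0.05, -0.1, -0.1, -0.05, 0.0, 0.05, 0.1, 0.1]
daily = daily_factors(monthly)
print(len(daily))  # 365
```

The periodic wrap avoids a jump between December 31 and January 1, which an ordinary non-periodic spline fitted to January to December could introduce.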
The second method to homogenize daily data was suggested by Della-Marta and Wanner (2006) (called HOM). This method uses quantile mapping to adjust the data. The distribution of the differences (candidate minus reference series) during the break period is compared to that of the training period and adjusted such that, after the correction, the two distributions are more consistent with each other. The adjustments are applied separately for each season and segment; for example, to adjust winter values of the break segment, only winter data of the training and break periods are used.
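A simplified empirical quantile mapping of the difference series, done per meteorological season, illustrates the idea. This is not the full HOM procedure of Della-Marta and Wanner (2006), which involves additional modelling steps not reproduced here. The season definition (DJF, MAM, JJA, SON) and all names are assumptions of this sketch. The returned adjustments would be added to the inhomogeneous automatic series, or subtracted from the manual one.

```python
import bisect

SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def empirical_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_map(break_vals, train_vals):
    """Map each break-period value (at least two values required) to the
    training-period value at the same empirical probability."""
    b_sorted, t_sorted = sorted(break_vals), sorted(train_vals)
    n = len(b_sorted)
    return [empirical_quantile(t_sorted, bisect.bisect_left(b_sorted, v) / (n - 1))
            for v in break_vals]


def seasonal_adjustments(months_break, diff_break, months_train, diff_train):
    """Per-day adjustments (mapped minus original difference) for the break
    period, estimated separately for each season."""
    adj = [0.0] * len(diff_break)
    for season in ("DJF", "MAM", "JJA", "SON"):
        idx = [i for i, m in enumerate(months_break) if SEASON[m] == season]
        train = [d for d, m in zip(diff_train, months_train) if SEASON[m] == season]
        if len(idx) < 2 or len(train) < 2:
            continue  # too little data in this season: leave unadjusted
        mapped = quantile_map([diff_break[i] for i in idx], train)
        for i, new in zip(idx, mapped):
            adj[i] = new - diff_break[i]
    return adj


adj = seasonal_adjustments([1, 1, 1], [0.2, 0.4, 0.6], [1, 1, 1], [0.0, 0.2, 0.4])
print([round(a, 3) for a in adj])  # [-0.2, -0.2, -0.2]
```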
The last method is based on SPLIDHOM. Using an indirect nonlinear regression method with cubic smoothing splines, the data of the break period are adjusted to the training period (Mestre et al., 2011). The adjustments are done separately for each month and each segment (similar to the training and break periods described for Linear Scaling). These three methods are examples of homogenization methods that can be used for daily data. Other homogenization procedures exist, but in most cases they are comparable to one of the three methods described here.
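The indirect-regression idea can be illustrated in a deliberately simplified form. Linear least-squares fits replace the cubic smoothing splines of SPLIDHOM, so this is a swapped-in stand-in, not the SPLIDHOM algorithm. The candidate is regressed on the reference in the break and in the training period. Each break-period candidate value is then mapped through the inverse of the break-period fit and through the training-period fit. The caller would apply this per month and per segment. Function names are hypothetical.

```python
def ols(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def indirect_linear_adjust(cand_break, ref_break, cand_train, ref_train):
    """Linear stand-in for SPLIDHOM's indirect regression: invert the
    break-period fit cand = a1 + b1 * ref (b1 must be non-zero), then apply
    the training-period fit cand = a2 + b2 * ref."""
    a1, b1 = ols(ref_break, cand_break)
    a2, b2 = ols(ref_train, cand_train)
    return [a2 + b2 * (y - a1) / b1 for y in cand_break]


ref = [0.0, 1.0, 2.0, 3.0, 4.0]
cand_break = [2 * r + 1 for r in ref]  # break changes offset and scale
cand_train = [r + 0.5 for r in ref]
print(indirect_linear_adjust(cand_break, ref, cand_train, ref))
```

Because the slope can differ between periods, even this linear version adjusts the spread as well as the mean, unlike Linear Scaling. The spline version extends this to nonlinear relationships.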
| EVALUATION

| Results of breakpoint detection
Table 2 summarizes the results of the breakpoint detection. At four of the 13 stations, no breaks are detected; at nine stations, at least two breaks are detected. Usually, the break size (in terms of the difference in the mean value) is small. The mean SNR is 0.46, and in most cases the automatically measured time series is the inhomogeneous one. Potential reasons for the breaks are replacements of the automatic instrument or modifications of the instrument position inside the lamellar shelter (type LAM 630). A replacement of an instrument (done at regular intervals) can affect the homogeneity of the time series. For example, the uncertainty of the automatic instrument is 0.1 K (checked in the calibration laboratory). Accordingly, the combined calibration uncertainty of two instruments is 0.14 K (JCGM J, 2008).
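If the calibration uncertainties $u_1$ and $u_2$ of the old and the new instrument are independent, the 0.14 K quoted above follows from combining them as a root sum of squares:

$$u_c = \sqrt{u_1^2 + u_2^2} = \sqrt{(0.1\,\mathrm{K})^2 + (0.1\,\mathrm{K})^2} = 0.1\sqrt{2}\,\mathrm{K} \approx 0.14\,\mathrm{K}$$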
TABLE 2 Station name, date of breakpoint (detected by 'uniseg'), signal-to-noise ratio (SNR), total number of the first comparison (with metadata), total number of the second comparison (with reference series), total number of the third comparison (with related parameters, daily maximum and minimum temperature), and total score for each instrument (manual or automatic)

The first break at Fichtelberg is caused by a calibration of the manual instrument, and the other breaks can be related to modifications of the Stevenson screen (not specified in detail). The most successful identification methods are the comparison with metadata information ('metadata score', first comparison) and the comparison of related parameters (third comparison). The comparison with nearby stations (second comparison) is less successful, probably because the break size is too small and the difference series of manual/automatic minus reference time series is too noisy, resulting in a small SNR. At the stations Brocken, Helgoland and Hohenpeißenberg, no reference series can be calculated because the correlation coefficients between the manual/automatic time series and the series of nearby stations are too small; these three stations are located on a mountain top (Brocken and Hohenpeißenberg) or on an island (Helgoland). Only in one case (station Hamburg) can the break in the difference series (automatic minus manual) also be found in the differences of automatic minus reference series (second comparison). In two cases (stations Schleswig and Hamburg), it is not possible to identify the inhomogeneous time series using the three comparisons.
• The difference series and the break (detected and identified) for the station Schleswig are shown in Figure 4. On March 16, 2012, the PT100 instrument was replaced by a new one. After the detected break, the difference series shows a linear trend. One reason for a trend in the difference series can be a drift of the instrument. This trend period affects the results of the breakpoint detection method ('uniseg'). The second break (detected but not identified) may be an artefact of the detection method dealing with the linear trend, which would also explain why no metadata information is available for that time period.
• At the station Hamburg, the two detected breaks are small (within the uncertainty of the instruments), and metadata information exists only for the second break. For the first break, there is no metadata information (first comparison), no break in the difference series between the measurements of Hamburg and nearby stations (second comparison), and no break in the difference series of the related parameters daily maximum and daily minimum temperature (third comparison). The total scores of both the manual and the automatic instrument are therefore zero, so the detected break cannot be assigned to one instrument (manual or automatic) and no homogenization is done.
The series of Hohenpeißenberg has breaks with large SNR (compared to the other breaks in Table 2) and metadata information is available (see Figure 5, top). Additionally, the break can be detected in the difference series (automatic minus manual observations) of the parameters daily mean temperature, daily maximum temperature and daily minimum temperature (third comparison). This indicates a break in the automatically measured time series.

| Results of homogenization
All homogenization methods used here are based on the idea of using a training period (including the most recent measurements) and adjusting the data of the break period to the training period. Table 3 summarizes the mean and SD of the differences between automatic and manual observations for the complete time series, for the training period, and for the complete time series after homogenization using Linear Scaling, SPLIDHOM or HOM. The differences between the homogenization methods are small, and all methods are able to adjust the data to the training period. The reason for these small differences is that, in most cases, the breaks in this study affect the mean but not the SD of the individual segments of the difference series. Figure 5 shows the difference series of automatic minus manual observations before and after homogenization with all three methods for the station Hohenpeißenberg; the differences between the methods are very small. After homogenization, no further breaks are detected (i.e., the homogenization was successful). The changes in the distribution before and after homogenization are small (see Figure 6): the distribution of the automatic measurements is shifted to the right, towards the manual distribution.
In a few cases, the procedure of breakpoint detection, identification and homogenization failed. For the series of Hamburg, Helgoland and Schleswig, breaks are detected in the difference series after the homogenization. At these stations, the result is independent of the homogenization method.
• In Helgoland, the detected break has a time lag to the metadata information (Figure 7, first row). One possible explanation is that the metadata information is not related to the break and the wrong time series is adjusted. Another possible explanation is that the metadata information has a wrong or shifted date. If metadata is available, the date of the metadata information is used instead of the detected break date. Therefore, an incorrect date in the metadata will result in an incorrect break date and the homogenization is influenced.
• In Schleswig, the last period, which contains the linear trend in the difference series, is used as the training period. The homogenization methods have problems with this linear trend (Figure 7, second row).
• In Hamburg, the breakpoint is detected at the same position as before the homogenization, because it was not possible to identify the inhomogeneous time series (automatic or manual) and no homogenization is done for that break/segment (Figure 7, third row). This affected the homogenization results of the complete time series.
• For the series Brocken, a break is detected after homogenization using HOM at a similar position as in the raw data (Figure 7, bottom row). The break in the difference series of the homogenized data is smaller than in the raw data, so there is an improvement. Using the other two homogenization methods (SPLIDHOM and Linear Scaling), no breaks are detected.

TABLE 3 Mean values and standard deviation of difference series (automatic minus manual observations) before homogenization, in the training period and after homogenization using the methods Linear Scaling, SPLIDHOM, and HOM

FIGURE 7 Difference series of raw data (left) and homogenized data (right) with Linear Scaling at station Helgoland (top row), difference series of homogenized data with SPLIDHOM at station Schleswig (second row), difference series of homogenized data with Linear Scaling at station Hamburg (third row), and difference series of homogenized data with HOM at station Brocken (fourth row). The grey areas in the bottom part of the plots on the right represent the different segments of the time series separated by the detected and identified breaks. The pink points in the bottom part of the plots show the breaks detected with 'uniseg'. The vertical lines show the dates of metadata information, the detected and identified breaks (orange for automatic and blue for manual) and the breaks detected in the difference series after homogenization (dark pink line)

| Comparison of raw data and homogenized data
After homogenization, the differences between the homogenized data and the raw data (which have only been controlled for outliers) are analyzed. Figure 8 shows histograms of the differences between automatic and manual measurements for the original data without outliers and for the data after homogenization with Linear Scaling, SPLIDHOM or HOM. The mean value of all differences remains almost unchanged (−0.03 K); that is, the breaks in the time series compensate each other. Some breaks are associated with higher temperatures from the automatic instrument and some with lower temperatures. On average, the breaks have no effect on the mean difference between automatic and manual daily mean temperature values, and the mean difference between the two measurement systems (manual and automatic) is close to zero. Therefore, no break related to the automatization is expected in long time series of daily mean temperature in Germany (at least for stations with the same measurement conditions as at the German climate reference stations). Station relocations or environmental changes may have a stronger influence on long time series than the automatization. The SD of the differences between automatic and manual measurements after homogenization differs only slightly (by 0.01 K) from that of the original data. As generally expected, the SD is smaller after homogenization because the breaks increase the SD of the original data.

FIGURE 9 Monthly boxplots of differences in K. Top row: raw data; second row: homogenized data (Linear Scaling); third row: homogenized data (SPLIDHOM); fourth row: homogenized data (HOM). The mean difference and the standard deviation (SD) of the differences are given below the monthly boxplots
As shown in Figure 9, there is no annual cycle in the differences of manual and automatic daily mean temperature observations of the raw data and after homogenization.

| SUMMARY, CONCLUSIONS AND OUTLOOK
In this study, differences between automatic and manual daily mean temperature measurements at 13 stations are analyzed, including an outlier control, breakpoint detection and homogenization of the time series. The mean (−0.02 K) and the SD (0.14 K) of the differences between automatic and manual measurements of daily mean temperature before homogenization are small, in agreement with the results of Baciu et al. (2005). Given these values, no break caused by the transition from manual to automatic measurement instruments is expected in long time series of daily mean temperature (calculated with the traditional equation). To study the effects of breaks, the time series are analyzed for breakpoints and homogenized with three methods (Linear Scaling, SPLIDHOM and HOM), after which the differences are analyzed again. The mean difference between the two observing techniques (manual and automatic) remains almost constant. In most cases, the break size is below the instrument calibration uncertainty. Homogenization has only a small effect, although the SD of the differences is even smaller afterwards. The largest breaks were found for the automatic instrument at the stations Brocken and Hohenpeißenberg, where the break size exceeds the instrument calibration uncertainty.
The analysis of German climate reference stations has shown that, for Germany, homogenization is of only minor relevance for analyzing the impact of automatization on long time series of daily mean temperature. The breaks in the time series (during the period of parallel measurements) are small and compensate each other. In some cases, replacements of PT100 instruments cause small breaks due to the instrument calibration uncertainty (resulting in a potential offset within a range of 0.14 K). The maintenance intervals of the instruments are short enough to detect instrument problems early, ensuring that data quality is not strongly affected by breaks. Replacements of instruments and calibration dates are well documented, so that breaks can be identified easily in most cases.
The results of this study can be summarized as follows:
• The mean differences between manual and automatic daily mean temperature values are small. Therefore, it can be concluded that the automatization of temperature measurements did not cause relevant breaks in the German time series of daily mean temperature.
• The differences between the results of the three homogenization methods SPLIDHOM, HOM and Linear Scaling are small. All three methods are able to homogenize the breaks, for example the breaks in the time series of Hohenpeißenberg.
• The detected breaks in the time series of daily mean temperature (within the period of parallel measurements) are small, indicating consistent data quality and sufficiently short maintenance intervals.
Parallel measurements of other meteorological parameters (e.g., precipitation, daily sunshine duration, relative humidity, and wind speed) are also performed at German climate reference stations. The impact of changing measurement systems on the homogeneity of long time series of these parameters will be the subject of future studies.

daily temperature series and its relevance for climate change analysis. Journal of Climate, 23(19), 5325-5331.
Vincent, L.A., Zhang, X., Bonsal, B. and Hogg, W. (2002)