Random trend errors in climate station data due to inhomogeneities

Inhomogeneities in station series are a large part of the uncertainty budget of long‐term temperature trend estimates. This article introduces two analytical equations for the dependence of the station trend uncertainty on the statistical properties of the inhomogeneities. One equation is for inhomogeneities that act as random deviations from a fixed baseline, where the deviation levels are random and independent. The second equation is for inhomogeneities that behave like Brownian motion (BM), where not the levels themselves but the jumps between them are independent. It shows that BM‐type breaks introduce much larger trend errors, growing linearly with the number of breaks. Using the information about type, strength, and frequency of the breaks for the United States and Germany, the random trend errors for these two countries are calculated for the period 1901–2000. An alternative and independent estimate is obtained by an empirical approach, exploiting the distance dependence of the trend variability for neighbouring station pairs. Both methods (empirical and analytical) find that the station trend uncertainty is larger in the United States (0.71 and 0.82°C per century, respectively) than in Germany (0.50 and 0.58°C per century, respectively). The good agreement of the analytical and the empirical estimates gives confidence in the methods to assess trend uncertainties, as well as in the method to determine the statistical properties of the break inhomogeneities.


| INTRODUCTION
Trends in climate records are affected by inhomogeneities, which are mostly caused by relocations of the stations or changes in the operational measuring techniques. These observational changes introduce a break signal into the climate record, which is modelled as a step function consisting of homogeneous subperiods interrupted by sudden jumps. The sizes of these breaks are expected to follow a normal distribution (Menne and Williams, 2005), which means that series also contain many small breaks that are too small to be detected but that nevertheless influence the homogenization and can still produce considerable trend errors. Based on detected breaks and expert judgement, the standard deviation of the jump size distribution is estimated to be about 0.8 K for European networks (Auer et al., 2007; Brunetti et al., 2006; Caussinus and Mestre, 2004; Della-Marta et al., 2004; Venema et al., 2012). In the United States, jumps are thought to be somewhat smaller.
A few of these spurious jumps can be sufficient to perturb the actual trend considerably. Changes in the immediate or wider surroundings of the station may also produce sudden jumps or more gradual increases. The latter can be modelled by multiple small jumps (Venema et al., 2012), and this study will only consider jumps.
Homogenization algorithms are applied to detect these inhomogeneities and to eliminate their influence as much as possible in a correction step. In most cases, relative homogenization techniques are applied, where the difference time series of a considered candidate station with its neighbouring stations is analysed to reduce the dominating regional climate signal, so that the inhomogeneity signal becomes more easily detectable (Conrad and Pollak, 1950). After the removal of the complicated climate signal, the difference time series is well modelled as a step function representing the inhomogeneities plus noise due to weather differences and observational errors. The signal-to-noise ratio (SNR) can then be defined as the square root of the variance of the break signal divided by the variance of the noise signal. Besides the above-mentioned break sizes, the correlations between stations are thus also crucial. Lindau and Venema (2016) concentrated on the detection of breaks and showed that the uncertainty of the detected break positions can be approximated by a Brownian motion (BM) with drift, where the SNR defines the drift size. This fits qualitatively to earlier numerical results on position errors and to the relationship between the hit rate and the SNR found by, for example, Wang (2008). Lindau and Venema (2018a) showed that, for SNR < 1, problems may occur in the detection part. The problem is not simply that the detection itself may become difficult. More fundamental is that breakpoints explaining only the noise are erroneously assessed to be significant, because they also explain a large part of the break variance at the same time. In this way, completely false break positions could be the outcome. Lindau and Venema (2018b) examined the correction part of homogenization algorithms and concentrated on the trends.
They distinguished between biased and unbiased breaks, where the latter are characterized by an average jump size of zero and thus, on average, do not change the trends. Again, the SNR was found to be crucial for the performance of the trend correction. They showed that the trend error of unbiased breaks could only be reduced if the number of breaks divided by the SNR is smaller than 6, even if the detection of breaks is perfect. Thus, for more than six breaks in each time series and SNR = 1, the trend error of the homogenized station series is larger than for the original data. Biased breaks additionally introduce a trend bias, which can be entirely corrected only if the break detection is perfect. However, in realistic cases, where the detection of the break positions is imperfect, the bias is only partly corrected. Lindau and Venema (2019) pointed out that it is further important to distinguish between two types of inhomogeneities. On the one hand, there are random deviation (RD)-type breaks, where the levels between two adjacent breakpoints are random deviations from a fixed baseline; on the other hand, there are BM-type breaks, where the jumps between the levels are random, so that the inhomogeneity signal has the characteristics of a BM. Consequently, BM levels have a memory and cannot be modelled as single random numbers but as the sum of n random numbers, where n is the number of all previous jumps. In their study, they presented a method to provide an average quantification of both break types in a given dataset. For RD-type breaks, the strength σ_δ² and the frequency p_δ could be given separately, whereas for BM-type breaks, only the combined parameter p_β σ_β², the product of strength and frequency, could be derived.
These break parameters were determined for the contiguous United States, where a mixture of RD- and BM-type breaks was detected, and for Germany, where fewer and smaller breaks were found. An important difference was that German data contain only RD breaks and no BM breaks. Dunn et al. (2014) pointed out that there is a lack of small break sizes in the distribution of detected breaks, which was confirmed by Yosef et al. (2018) and Trewin (2013). Lindau and Venema (2019) derived the number of breaks without applying homogenization algorithms and found more than double the number of breaks compared with these traditional estimates. However, this is plausible because their estimate also comprises the large number of small breaks, which are not detected by the homogenization algorithms.
In this article, we concentrate on the station trend errors as they are contained in the original data before any homogenization. Besides homogenization, quality control is an important issue in data processing, especially for the determination of extremes. However, Hunziker et al. (2018) showed that the error of mean temperature trends depends strongly on whether the data are homogenized or not, whereas the kind of applied quality control is less important. As the aim is to derive trend errors of the raw data, neither homogenization nor specific quality control is applied.
In Section 2, we derive the expected trend uncertainty caused by the two inhomogeneity types (RD and BM) as a function of the strength and frequency of the breaks. Using the derived parameters from Lindau and Venema (2019), an error estimate of the station trends in the United States and in Germany can be given; see Section 3. These estimates are then compared in Section 4 to the trend differences of neighbouring stations, which provide a second and independent measure of the trend uncertainty. Thus, an indirect method that derives the expected trend variance from the known break characteristics is compared with the results directly calculated from the data.
Please note that only the random part of the uncertainty is considered, because both Lindau and Venema (2019) and the empirical method applied in Section 4 use difference time series of neighbouring stations. This is necessary to eliminate the dominating climate signal, but it removes at the same time any possible mean trend bias caused by inhomogeneities.

| DERIVATION OF THE EXPECTED ERROR VARIANCE OF TRENDS
In this section, we determine the theoretical trend variance caused by inhomogeneities that is expected for both break types. These relationships are compared with numerical results based on simulated station series containing inhomogeneities. For this numerical part, we simulate the inhomogeneity signal by a step function of length l = 100 years containing k breakpoints at random positions τ_i, i = 1 to k. In these time series, breaks occur stochastically with a probability p, so that the number of breaks k varies between the different time series. The breaks divide the time series into k + 1 homogeneous subperiods (HSPs) of length l_i. Within a HSP, the step function remains at a constant level. For BM-type breaks, the jumps between the levels are random, independent, and normally distributed with zero mean. For RD-type breaks, the same is true for the levels themselves.
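The simulation setup described above can be sketched in a few lines (a minimal illustration, not the authors' code; the function name and defaults are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def step_function(l=100, p=0.05, sigma2=0.5, kind="RD"):
    """Simulate an inhomogeneity signal of length l years.

    Breaks occur with probability p per year boundary; sigma2 is the
    variance of the HSP levels (RD) or of the jumps between levels (BM).
    """
    breaks = rng.random(l - 1) < p                      # break between year t and t+1?
    hsp_id = np.concatenate(([0], np.cumsum(breaks)))   # HSP index for every year
    k = int(hsp_id[-1])                                 # realized number of breaks
    draws = rng.normal(0.0, np.sqrt(sigma2), k + 1)
    if kind == "RD":
        levels = draws              # independent deviations from a fixed baseline
    else:                           # "BM": independent jumps, cumulated levels
        draws[0] = 0.0              # the series starts at level zero
        levels = np.cumsum(draws)
    return levels[hsp_id]
```

The only difference between the two break types is the `np.cumsum` over the random draws, which gives the BM levels their memory.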

| RD-type breaks
We created an ensemble of 10,000 series with a variance of the HSP levels of σ_δ² = 0.5 K². As each jump between two adjacent levels consists of two such level deviations, the variance of the jumps is equal to 1 K². Then, we calculated the trend of each time series with a least-squares linear regression. Note that the expected mean trend of all step functions is zero; the crucial parameter is the variance of the obtained trends: it represents the uncertainty caused by the inserted inhomogeneities. Figure 1 shows the obtained trend variance (given by the crosses) for 200 different break rates p between 0.1 and 20 breaks per century (cty). For break rates well below one, the variance increases steeply with a slope of about 1.2 K² cty⁻¹, as given by the straight line. The maximum is reached at a rate of about 2-3 breaks cty⁻¹.
To compare our numerical result with the theoretical expectation, we start with an approximation for the relative trend variance of entire networks, f², given by Lindau and Venema (2018b, their eq. (17)). It is given as the quotient of the variance of the temperature change, Var_RDNet(lT), and the variance of the inhomogeneity levels σ_δ². Here, T is the temperature trend and l the length of the series, so that their product gives the temperature change. The relative trend variance f² depends on the number of breaks k in a station and the number of stations n within a network. To convert the variance of the temperature change into the variance of the trend, we have to divide by the squared length of the time series l². Furthermore, for the station trends, the variance is n times larger, so the division by n in Equation (1) has to be omitted. Solving for the variance of the station trends yields Equation (2). This theoretical approximation is given as the thick continuous curve in Figure 1, which reaches its maximum at k = 2. That a maximum of the variance must exist at moderate break rates is qualitatively plausible: RD breaks cause a random fluctuation of the HSP levels, which is comparable to the effect of noise, but on the longer time scale of the HSPs. Consequently, the variance of the trends converges to zero with an increasing number of breaks, as more and more independent values underlie the trend estimate. However, for zero breaks, the variance is not maximal but, on the contrary, equal to zero. Thus, a maximum must exist somewhere in between, which is obviously reached at k = 2.
FIGURE 1: Variance of the trend as a function of break rate for RD-type breaks. The crosses denote the result for modelled data; the solid line gives an approximation for constant break numbers. For the transfer to varying break numbers, a weighted average using the Poisson distribution is calculated, given by the thin curve. The thick straight line denotes a slope of 1.2 K² cty⁻¹, which fits well to the data at small break rates.

The approximation given in Equation (2) is valid for k being a constant number of breaks in every time series. Here, however, we consider stochastically occurring breaks, so that the number of breaks varies according to the Poisson distribution; this reduces the height of the peak occurring in the numerical result. Thus, to obtain the results for mean break rates, a weighted average with the corresponding Poisson weights,

w(k) = (λ^k / k!) e^(−λ),

is necessary,
where λ denotes the average and k the actual number of breaks. The thin curve in Figure 1 shows the result of this weighted averaging. It leads to a reduction especially around the maximum at p = 2 cty⁻¹, so that the averaged curve matches the data much better. At a mean rate of 5 cty⁻¹, the thick and thin curves cross, as the curvature (second derivative) changes its sign, so that for larger break rates the averaging leads to a slight increase compared with the original curve. Apart from these details, the approximation given in Equation (2) fits the modelled data reasonably well for break rates larger than about 4 cty⁻¹.
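The Monte Carlo experiment of this subsection can be sketched as follows (our own minimal re-implementation, with breaks drawn per year so that the break number varies as in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def rd_trend_variance(p, l=100, sigma2=0.5, n_series=10000):
    """Monte Carlo estimate of the trend variance (K^2 cty^-2) caused by
    RD-type breaks occurring with probability p per year."""
    t = np.arange(l)
    trends = np.empty(n_series)
    for i in range(n_series):
        breaks = rng.random(l - 1) < p
        hsp_id = np.concatenate(([0], np.cumsum(breaks)))
        levels = rng.normal(0.0, np.sqrt(sigma2), hsp_id[-1] + 1)
        # least-squares slope in K per year, converted to K per century
        trends[i] = np.polyfit(t, levels[hsp_id], 1)[0] * 100
    return trends.var()
```

Evaluated over a range of break rates, the variance rises to a maximum near 2-3 breaks per century and decays for higher rates, reproducing the qualitative behaviour of Figure 1.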

| BM-type breaks
To create BM-type breaks, we choose a variance of the jumps between two adjacent HSPs of σ_β² = 1 K². Formally, σ_β² is twice as large as σ_δ². In this way, the size of the jumps is equal for both break types, because two deviations of the RD signal produce one jump. Figure 2 shows that the trend variance caused by BM-type breaks quickly becomes much larger than the corresponding variance for RD-type breaks. It grows linearly with the number of breaks instead of decreasing, as we found for RD breaks. Lindau and Venema (2019) already expected a linear increase of the BM trend variance and gave an approximation based on the following consideration. The slope of the line connecting the first and the last value of a BM is a rough estimate for the trend. The last value is equal to the sum of k random numbers, so that its variance is kσ_β². As the first value of a BM is constant and zero, the variance of the temperature change is also kσ_β². However, for the linear trends, we empirically found 1.2 kσ_β², thus a 20% larger variance. That a discrepancy between approximation and truth exists is not surprising, but the intriguing factor of 1.2 asks for an explanation. This is given in detail in the Appendix, where the factor 6/5 is confirmed. Thus, the theory also yields:

Var(T) = (6/5) p_β σ_β² / l,    (4)

where p_β denotes the frequency of BM-type breaks and l the length of the considered time series. Thus, for BM-type breaks, the variance of the trend grows linearly with the break rate, and the slope is 6/5 K² cty⁻¹. The same slope was found for RD breaks at very small break rates (compare the thick straight line in Figure 1). This shows that the two break types are difficult to distinguish for small break rates. This is plausible, as it is impossible to assign a single break in a time series to one of the two break types. At least two breaks are necessary to test the independence of consecutive jump heights, as required for the proof of BM-type breaks. For small break rates, this is important to bear in mind.
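A corresponding Monte Carlo sketch for BM-type breaks (again our own illustration, not the authors' code) confirms the linear growth and the slope of Equation (4):

```python
import numpy as np

rng = np.random.default_rng(2)

def bm_trend_variance(p, l=100, sigma2=1.0, n_series=10000):
    """Monte Carlo estimate of the trend variance (K^2 cty^-2) caused by
    BM-type breaks: independent jumps, cumulated levels."""
    t = np.arange(l)
    trends = np.empty(n_series)
    for i in range(n_series):
        breaks = rng.random(l - 1) < p
        hsp_id = np.concatenate(([0], np.cumsum(breaks)))
        jumps = rng.normal(0.0, np.sqrt(sigma2), hsp_id[-1] + 1)
        jumps[0] = 0.0                   # the series starts at level zero
        trends[i] = np.polyfit(t, np.cumsum(jumps)[hsp_id], 1)[0] * 100
    return trends.var()

# Equation (4) predicts Var(T) = (6/5) p_beta sigma_beta^2 / l, i.e.
# roughly 6 K^2 cty^-2 at a rate of 5 breaks per century with sigma2 = 1.
```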

| INDIRECT CALCULATION OF TREND UNCERTAINTY BY USING THE KNOWN BREAK PARAMETERS
Lindau and Venema (2019) were able to separate the effects of RD- and BM-type breaks by considering the variance of the difference time series Var(D(L)) between two neighbouring climate stations for different time lags L. RD breaks produce a saturating exponential signal, whereas BM breaks cause a linear increase of the variance as a function of lag. The noise reflecting the weather differences between the two stations adds a lag-independent offset that depends only on the noise variance σ_ε². Summing these three terms gives the variance of the difference as a function of the time lag L (Equation (5)). Fitting Equation (5) to the data, Lindau and Venema (2019) obtained break parameters for United States and German stations, which are given in Table 1.

FIGURE 2: As Figure 1, but for modelled data with BM-type breaks (O symbols). The result for RD-type breaks (crosses) from Figure 1 is much smaller and given for comparison.

In the United States, a mean rate of p_δ = 17.1 RD breaks per century is found with a size of σ_δ² = 0.1175 K². When we insert these values into Equation (2), we obtain a trend variance caused by RD breaks of 0.132 K² cty⁻² (Equation (6a)). The combined BM parameter is 0.45 K² cty⁻¹; inserted into Equation (4), it yields 0.54 K² cty⁻². As the two break signals are assumed to be independent, their variances can be summed to obtain the total error variance (0.672 K² cty⁻²). The square root yields the standard deviation of the trend in the United States due to inhomogeneities, 0.82 K cty⁻¹. In Germany, they found only RD-type breaks, with a frequency of 4.1 breaks cty⁻¹ and a size of 0.1275 K², resulting in an error variance of 0.336 K² cty⁻² (Equation (8)). This is considerably larger than the error variance for RD breaks in the United States (compare Equations (6a) and (8)). The main reason is that the break frequency in Germany is about four times lower. This means fewer independent values, which increases the random scatter. That fewer breaks cause larger errors may seem counterintuitive, but it is indeed true for the unbiased (zero-mean) breaks assumed here.
Applying the square root to Equation (8) yields again the standard deviation, now for Germany: 0.58 K cty⁻¹.
Thus, the overall effect of inhomogeneities is larger in the United States, caused by the strong impact of BM-type breaks, which are not found in Germany.
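The arithmetic of this section can be verified in a few lines (a back-of-envelope sketch; the RD variances are entered as the values implied by Equations (6a) and (8), since the closed form of Equation (2) is not reproduced here):

```python
import math

# United States (break parameters from Lindau and Venema, 2019, Table 1):
var_rd_us = 0.132             # K^2 cty^-2, RD contribution (Equation (6a))
bm_param = 0.45               # K^2 cty^-1, combined BM parameter p_beta * sigma_beta^2
var_bm_us = 6 / 5 * bm_param  # K^2 cty^-2, Equation (4) with l = 1 century
sd_us = math.sqrt(var_rd_us + var_bm_us)  # total 0.672 K^2 cty^-2 -> ~0.82 K cty^-1

# Germany (RD-type breaks only):
var_rd_de = 0.336             # K^2 cty^-2 (Equation (8))
sd_de = math.sqrt(var_rd_de)  # ~0.58 K cty^-1
```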

| DIRECT CALCULATION OF THE TREND UNCERTAINTY BY USING THE OBSERVED TREND VARIABILITY
In this section, trends are calculated directly from real observations. The variance of the trend difference between two stations is considered as a function of station distance. The effect of the inhomogeneities is independent of distance and appears as an offset readable at zero distance, whereas with increasing distance between two stations, the variance grows due to real differences in the regional climate.
We used monthly data of the International Surface Temperature Initiative (Rennie et al., 2014) and extracted time series from the United States and Germany with at least 80 years of data in the entire period 1901-2000 and at least 20 years in the baseline period 1961-1990. Then we subtracted the monthly station means for the years 1961 to 1990 to remove the variance of the annual cycle. From these data, least-squares linear trends are calculated for the period 1901-2000. In the next step, pairs of these station trends are built and sorted into 12 distance classes, from 0-100 km up to 1,100-1,200 km. All possible pairs of stations are formed, so that each station occurs n times if n partner stations are found. Within these classes, two parameters are calculated: the variance of the difference (i.e., of the trend differences of the station pairs) and the sum of the variances (i.e., the variance of all contributing station pairs). The latter fluctuates weakly with distance (Figure 3, uppermost thin curve), but this is only caused by the slightly different selection of the underlying stations; a station will not occur in a distance class if no partner is found at that specific distance. Essentially, however, the sum of the variances is a constant describing the (double) variance of the trends in the whole area considered, for example, the contiguous United States.
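The pairing-and-binning procedure can be sketched on synthetic data as follows (a minimal illustration with invented function and variable names, using flat-earth distances in km rather than great-circle distances):

```python
import numpy as np

def trend_diff_variance_by_distance(trends, coords, bin_km=100, max_km=1200):
    """Variance of pairwise trend differences, binned by station distance.

    trends: (n,) station trends in K per century
    coords: (n, 2) station coordinates in km
    Returns (variance per bin, pair count per bin).
    """
    n = len(trends)
    nbins = max_km // bin_km
    sums = np.zeros(nbins)
    squares = np.zeros(nbins)
    counts = np.zeros(nbins)
    for i in range(n):                  # all possible station pairs
        for j in range(i + 1, n):
            d = np.hypot(*(coords[i] - coords[j]))
            b = int(d // bin_km)
            if b < nbins:
                diff = trends[i] - trends[j]
                sums[b] += diff
                squares[b] += diff * diff
                counts[b] += 1
    mean = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    var = np.where(counts > 1, squares / np.maximum(counts, 1) - mean**2, np.nan)
    return var, counts
```

For independent station errors, the variance at small distances approaches twice the single-station error variance, which is the zero-distance intercept exploited in this section.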
The variance of the trend difference (bold curves) consists of two components: a constant offset describing the variance caused by the inhomogeneities and a second component, caused by the different climate zones, which grows with increasing distance. Thus, the inhomogeneity part, that is, the uncertainty of trends caused by the breaks, can be read at the (marginally extrapolated) axis intercept of the lower curve.
For the upper bold curve, representing the United States, the intercept is equal to about 1 (K cty⁻¹)². From this value, the standard deviation for a single station is obtained by dividing by 2 (because two stations contribute) and taking the square root, which yields 0.71 K cty⁻¹. The difference between the thick and the thin curves at zero distance, equal to about 0.6 K² cty⁻² for the United States, can be attributed to real climatic differences of the trends. This additional component nearly vanishes for distances larger than 1,200 km. Thus, with 0.6 K² cty⁻², we additionally have an estimate for the magnitude of real trend variations on large spatial scales.

TABLE 1: Characteristic break parameters for United States and German station data from Lindau and Venema (2019). The variances σ_β² and σ_δ² denote the strengths of the BM and RD breaks, respectively, p_β and p_δ their frequencies, and σ_ε² the noise variance.
An analogous test, but on a finer scale for distances below 120 km, shows that both curves remain rather constant (Figure 4). Only for the shortest distance class of 5 km does the lower curve decrease abruptly. However, not only does the trend correlation change; the correlation of the time series themselves also jumps from 0.935 to 0.984. This suggests that undetected identical data segments from different data sources, rather than real climate signals, are the reason.
The procedure is repeated for Germany, where far fewer data are available, so that the two curves are characterized by larger scatter. Nevertheless, clear differences compared with the U.S. results are obvious. The first is that the thick and thin curves are nearly equal and more or less constant over all distance classes. This indicates that climatic differences in the trends play almost no role, although over similar distances there is an effect in the United States. The trends of all stations throughout Germany are similar; only the inhomogeneities impose a constant variance offset. A second difference from the U.S. results is that this level is much lower, at about 0.5 K² cty⁻². The square root of half of this value yields 0.5 K cty⁻¹. However, due to the sparse underlying data, this value has a larger confidence interval.
In Germany, a reliable analysis of the variance for short distances similar to Figure 4 is difficult, as only a few pairs are available for these distance classes. Consequently, the result is dominated by large scatter and therefore not shown.
For U.S. data, the trend variability caused by inhomogeneities is shown to be as large as 1 K² cty⁻² for nearby station pairs, which corresponds to a standard deviation of about 0.71 K cty⁻¹ for a single station. For Germany, the corresponding value is about 0.5 K cty⁻¹. These values are slightly lower than, but nevertheless in good agreement with, the indirectly derived estimates given in Equations (7) and (9).
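The conversion from the zero-distance intercept to the single-station standard deviation is simple arithmetic (values as read from the figures above):

```python
import math

# Zero-distance intercepts of the pairwise trend-difference variance:
intercept_us = 1.0   # (K cty^-1)^2, read from Figure 3
intercept_de = 0.5   # (K cty^-1)^2

# Two independent stations contribute to each difference, hence the factor 1/2.
sd_us = math.sqrt(intercept_us / 2)   # ~0.71 K cty^-1
sd_de = math.sqrt(intercept_de / 2)   # = 0.50 K cty^-1
```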

| DISCUSSION AND CONCLUSIONS
We have developed an analytical and an empirical method to estimate the uncertainty of secular station trends due to inhomogeneities. The empirical method found, for the contiguous United States and Germany, a random trend uncertainty of about 0.71 and 0.50 K cty⁻¹, respectively. The alternative method, using the analytically derived Equations (2) and (4) together with the statistical properties from Lindau and Venema (2019), yielded slightly larger values of 0.82 and 0.58 K cty⁻¹, respectively.
Trends are certainly less sensitive to missing data than, for example, extremes. However, the effect of missing data is included in the trend uncertainty, slightly increasing the estimate of the empirical method. The analytical method, in contrast, is based on lag differences; data gaps make the correct estimation of the parameters more difficult but do not lead to systematically larger trend uncertainties. Consequently, missing data can make the estimates of the empirical method somewhat higher than those of the analytical method. This effect seems to be limited, as the analytical estimates are even slightly larger than the empirically obtained trend uncertainties.
The random trend error is a large part of the station trend error budget, but for some regions considerable network-wide trend biases have been found (Begert et al., 2005; Brunetti et al., 2006; Böhm et al., 2010; Menne et al., 2010), which are smaller but comparable in size and should thus be part of a complete trend uncertainty budget.
The large-scale bias introduced by the breaks is difficult to determine directly from the data. In single-station series, this component cannot be separated from the true climate trend; and in difference time series, actually used to eliminate the climate signal, this component is lost, too. Thus, we are restricted to indirect methods that compare the trends before and after applying a homogenization algorithm to the data. How accurately these are able to estimate the full trend bias should be tested in the future.
We can compare our estimates of the trend errors with those in the inhomogeneities of the HOME benchmark, which are based on a community consensus of typical European inhomogeneities and amount to 1.2 K cty⁻¹ (Venema et al., 2012). However, the breaks in the HOME benchmark were √2 times too large; taking this into account, the station trend errors would have been 0.85 K cty⁻¹. In the present study, we found errors between 0.5 and 0.82 K cty⁻¹, depending on country and method. However, these comprise only random errors and no biases, so that they may underestimate the full uncertainty. Taking this into account, the values match reasonably well.
Random errors between 0.5 and 0.82 K cty⁻¹, as we found for individual stations, will also cause random errors in the network mean trend. Assuming 10 independent stations in a network, the error will be reduced by a factor of √10, to 0.16 to 0.26 K cty⁻¹. This is still not a bias but solely the consequence of the underlying random errors. Such an uncertainty is smaller than the observed climate trends but not negligible. Furthermore, this error estimate is a lower bound for the uncertainty of network mean trends: any dependence between the break signals of the stations will further increase it. The same is true for possibly occurring trend biases, which are not considered in this study.
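The reduction for a network mean can be sketched as follows (assuming, as above, 10 independent stations):

```python
import math

station_errors = (0.5, 0.82)    # K cty^-1, range found for single stations
n_stations = 10                 # assumed number of independent stations
network_errors = [e / math.sqrt(n_stations) for e in station_errors]
# network_errors spans roughly 0.16 to 0.26 K cty^-1
```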
The above estimate is valid for the situation before homogenization. However, Lindau and Venema (2018b) showed for realistic scenarios (but assuming a perfect detection of the breaks) that the correction step is responsible for producing dependent solutions for the stations. Consequently, similar trend errors have to be expected in homogenized data as well.
Even ignoring network-wide trend biases, inhomogeneities can be very complicated: the break frequency does not have to be constant, nor do the break positions need to be independent in time or between stations. RD and BM are idealizations; clearly, BM must be an approximation, as its variance grows unbounded in time. Thus, that the analytical and the empirical estimation found about the same station trend uncertainties gives some confidence that the description of inhomogeneities as a superposition of an RD and a BM signal is sufficiently realistic, and that such complications are at least not too important for trend errors due to inhomogeneities.

| APPENDIX

We consider a set of Brownian motion step functions y_t of length n containing a number of breakpoints at random timings τ_i, i = 1 to k. Breaks occur stochastically with a probability p, so that the number of breaks k varies between the different time series. The breaks divide the time series into k + 1 homogeneous subperiods (HSPs) of length l_i. Within a HSP, the step function remains at a constant level z_i. The levels z_i form a Brownian motion in time with:

z_i = Σ_{j=1}^{i} x_j.

The increments x_i are random, independent, and normally distributed with zero mean.
The correlation r between the data y and the time steps t defines the trend. The linear trend T is given by the slope of the regression line, where the time is treated as the independent and the data as the dependent parameter:

T = r (σ_y / σ_t) = Cov(y, t) / Var(t),

where σ_y and σ_t denote the standard deviations of data and time, respectively. The covariance between data and time is denoted by Cov(y, t) and the variance of the time by Var(t). The sum over the time steps and the sum over the squared time steps, which occur in Equation (A2), are given by:

Σ_{t=1}^{n} t = (1/2) n (n + 1),

Σ_{t=1}^{n} t² = (1/6) (2n + 1)(n + 1) n.

The sum over the data y_t defines the numerator C of the trend, while the denominator V = Σ_{t=1}^{n} (t − t̄)² is a constant for all time series and depends only on the length n. It can be rewritten to:

V = (n³ − n) / 12.

Now the trend T for each time series could be obtained by dividing C by V. However, we are interested in the variance of the trend of many such time series. As V is a constant depending only on n, it is sufficient to calculate the variance of C and divide by V². Furthermore, the expected value of C is zero, so that the variance is equal to the mean squared C, denoted by square brackets.

Equation (A11) shows that C is obtained by multiplying a time-dependent coefficient a_i by a random number x_i, summing this product over all HSPs i, and dividing by 2. Thus, denoting the coefficients a_i accordingly, we can rewrite Equation (A11) in the concise form:

C = (1/2) Σ_i a_i x_i.

For the mean squared C, it follows:

[C²] = (σ_β²/4) [Σ_i a_i²].

Equation (A17) is valid for two reasons. First, all mixed terms that arise when the sum is squared become zero, because the x_i are independent random variables. Second, the mean squared x_i is always equal to σ_β², independently of the HSP. In the last transformation, the order of summation and averaging is again inverted. Thus, the average of the sum over a_i² is the crucial term.
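The sum formulas and the closed form of the regression denominator V can be checked numerically (a quick sketch):

```python
import numpy as np

n = 100
t = np.arange(1, n + 1)

S1 = t.sum()                       # n(n+1)/2
S2 = (t**2).sum()                  # (1/6)(2n+1)(n+1)n
V = ((t - t.mean()) ** 2).sum()    # regression denominator
V_closed = (n**3 - n) / 12         # closed form, here 83,325
```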
Equation (A11) shows that C is obtained by multiplying a time-depended coefficient a i by a random number x i , summing up this product over all HSPs i and dividing by 2. Thus, denoting: we can rewrite Equation (A11) in a concise form: For the mean squared C, it follows: Equation (A17) is valid for two reasons. First, all mixed terms that arise when a sum is squared becomes zero because x i is a random variable. The second is that the mean squared x i is always equal to σ β 2 independently from the HSP. In the last transformation, the order of summation and averaging is again reinverted. Thus, the average of the sum over a i 2 is the crucial term.