Volume 43, Issue 11 p. 4771-4789
Open Access

CADTEP: A new daily quality-controlled and homogenized climate database for Catalonia (1950–2021)

Marc Prohom

Corresponding Author

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain


Marc Prohom, Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Dr. Roux, 80 E-08017 Barcelona, Catalonia, Spain.

Email: [email protected]

Contribution: Conceptualization, Investigation, Funding acquisition, Writing - original draft, Methodology, Supervision, Project administration, Writing - review & editing

Peter Domonkos

Independent Researcher, Tortosa, Spain

Contribution: Investigation, Writing - original draft, Methodology, Writing - review & editing, Software, Conceptualization

Jordi Cunillera

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Conceptualization, Investigation, Methodology, Visualization, Writing - review & editing, Data curation, Software

Antoni Barrera-Escoda

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Investigation, Methodology, Writing - review & editing, Data curation

Montserrat Busto

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Investigation, Writing - review & editing, Validation

Mònica Herrero-Anaya

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Investigation, Writing - review & editing, Validation

Albert Aparicio

Department of Climatology, Meteorological Service of Catalonia, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Visualization, Investigation

Jaume Reynés

General Direction of Environmental Quality and Climate Change, Autonomous Government of Catalonia, Barcelona, Spain

Contribution: Investigation, Methodology

First published: 24 May 2023


A new database of daily minimum temperature (TN), maximum temperature (TX) and precipitation amount (PPT), CADTEP (CAtalan Daily TEmperature and Precipitation data set), has been developed for Catalonia, Spain. The source dataset consists of the available climate records from Catalonia provided by the State Meteorology Agency of Spain (AEMET) and the Meteorological Service of Catalonia (SMC). We selected the long and fairly complete records and created further long series by merging the records of nearby stations. Finally, 26 TN, 26 TX and 72 PPT time series covering the period 1950–2021 were accepted for CADTEP out of several hundred available, but mostly much shorter, time series. The database is dense enough to yield accurate estimates of regional climate variability. The CADTEP time series were subjected to thorough quality control and homogenization. Many shorter series were also used in the quality control. In both quality control and homogenization, several series from outside Catalonia but close to its borders were also considered, to achieve higher data accuracy in CADTEP. The homogenization was performed with ACMANTv5.1. In this new version of ACMANT, metadata dates pointing to likely occurrences of nonclimatic changes (breaks) are treated in the homogenization as certain break positions. Finally, we examined the trend and variability of TN, TX and PPT over 1950–2021 using CADTEP. We also checked the differences between the use of homogenized and raw data, as well as between homogenization with and without metadata. The most important conclusions concern the trend analysis: summer TX is warming faster than TX in other seasons, faster than TN in general, and at three times the mean global warming rate.


Monitoring observed climate change and variability demands high-quality, long-term climate series, in order to properly track the magnitude, speed and signal of change and to analyse its impact on human and natural systems. The Mediterranean basin is identified as a “hot spot” of climate change, as warming there is occurring at a much faster rate than on the global scale (Cramer et al., 2018; IPCC, 2021; Lionello & Scarascia, 2018; MedECC, 2020). As a result, several climate extremes are now more frequent, more intense and longer-lasting, as is the case for heat waves (Efthymiadis et al., 2011) and droughts (Caloiero et al., 2018; Longobardi & Villani, 2010; Luppichini et al., 2022). Nevertheless, the complex geography of the region and the presence of a warm and closed sea result in high spatial variability, especially in the behaviour of extreme rainfall events. Catalonia, located in the northeastern corner of Iberia and on the western edge of the Mediterranean basin, fully shares this complexity, which is further increased by the presence of the Pyrenees in the northernmost sector. Consequently, the availability of quality-controlled and homogeneous climate series with adequate spatial and temporal coverage is crucial in the region.

In the last two decades, several projects have been undertaken to improve the availability of reliable climate data for the Iberian Peninsula, incorporating recommendations of several international initiatives to improve the skills of quality control methods and homogenization of climate series. The first remarkable effort to provide a quality-controlled and homogenized dataset was made by Brunet et al. (2006), the SDATS dataset, containing 22 Spanish daily air temperature series from 1850 to 2005, and updated in 2014. Several years later, Luna et al. (2012) built a monthly precipitation dataset, which was homogenized with the Climatol method (Guijarro, 2013a), and the same method was used in Guijarro (2013b) to homogenize monthly air temperature series of Spain. For the whole Pyrenean region, Cuadrat et al. (2013) provided a monthly homogenized air temperature dataset for the period 1950–2010, using the HOMER software (Mestre et al., 2013). In some other studies, the quality control and homogenization of daily data have appeared as preliminary steps for the generation of datasets in gridded form. Thus, Vicente-Serrano et al. (2010) produced a daily precipitation database for the northeastern corner of Iberia, where the homogenization was checked using the Standard Normal Homogeneity Test (SNHT; Alexandersson, 1986). In the same area, El Kenawy et al. (2011) compiled daily maximum and minimum temperature data, and the series homogeneity was assessed by three different tests. Serrano-Notivoli et al. (2017a) released SPREAD, a new high-resolution daily gridded precipitation dataset for the whole Spanish Iberian Peninsula, which included a novel precipitation quality control method as a first step. This research was followed by a similar project, but for daily temperature, releasing the STEAD data base and encompassing a wider period, 1901–2014 (Serrano-Notivoli et al., 2019). 
These remarkable efforts have provided deeper insight into the nature of the variability and trends of recent climate in the southwestern corner of Europe.

In Catalonia, the Meteorological Service of Catalonia (SMC), a public institute of the Catalan Government, is in charge of providing data and information to monitor climate variability and climate change in the region. In this context, one of the main duties since 2003 is to collect, preserve and unify observed data into a single database, using all the available climate records for the region. This strategy has promoted several data rescue (DARE) campaigns, and many unknown or incomplete series have been saved, improving the availability of data and knowledge of climate evolution (Chimani et al., 2022). In addition, quality control procedures have been designed at a daily resolution, based on a semiautomatic method. In the last two decades, new techniques and approaches have appeared for the detection and adjustment of inhomogeneities in climate series. The European ES0601 Cost Action (HOME) brought a breakthrough, and since then several new and effective homogenization methods have been developed, namely HOMER (Mestre et al., 2013), ACMANT (Domonkos, 2015, 2020) and CLIMATOL (Guijarro, 2018). The SMC has participated in some of these initiatives, gathering information on the most suitable procedures, and supporting the development of new versions, as is the case with ACMANTv5 (Domonkos, 2021).

This paper describes the steps that have been taken to create CADTEP (CAtalan Daily TEmperature and Precipitation data set), a new daily climate database covering the period from 1950 to the present. Our proposal intends to improve the data already available for the region in two main respects: (1) the building of a unified database with data collected from all the public institutions that run or have run observational networks in Catalonia, for improved quality control and homogeneity analysis, and (2) the presentation of the new version of the ACMANT homogenization method (ACMANTv5.1), which takes advantage of the information in available metadata more efficiently than earlier versions. Thus, section 2 describes the sources of data and metadata, while section 3 describes the quality control procedure applied to the daily series. Section 4 reports the homogenization analysis (ACMANTv5.1), with a comparison of the homogenization results obtained with and without metadata use. Section 5 shows the mean linear trends detected in the annual and seasonal averages of the new temperature and precipitation datasets, while section 6 is reserved for discussion and conclusions. The study is supplemented by Appendix A, which presents the main properties of the ACMANT homogenization method.


2.1 Description

The database used in this study includes, for the first time, climate series from the two main official weather station networks in Catalonia: one run by the State Meteorology Agency (AEMET) and the other by the SMC. The unification of data was made possible by an agreement between the two institutions for sharing meteorological data. AEMET contributed long series of climate records covering most of the 20th and early 21st centuries, while the SMC provided data from stations managed by the Catalan Government since 1988. In addition, SMC data rescue initiatives improved the temporal coverage of the series initially provided by AEMET, and helped to find and save unpublished climate records. Data from several weather stations located in southeast France, Andorra and the eastern fringe of Aragon, as well as from the northernmost part of the Valencian Country, were also added to the database.

Stations with daily maximum (TX) and minimum (TN) air temperature and precipitation (PPT) amount series covering the period 1950–2021 were selected as target stations for climate trend analyses. Note that daily mean air temperature has not been treated specifically, as it can be approximated by the arithmetic mean of TN and TX. All the stations, both manned and automatic and regardless of temporal coverage, were used at a first level to carry out the quality control analysis. Table 1 shows the main characteristics of the available data. Figure 1 shows the availability of stations for each year and variable, while Figure 2 shows the same information for different altitude bands. For daily PPT, the 24-h period selected was the “meteorological day,” that is, from 0800 to 0800 UTC, as it is the most common period in the database (manned stations operate in this manner). As can be seen from Table 1, the amount of PPT data clearly exceeds that of the TX and TN data. Data availability increases over time for both temperature and precipitation, and the initial bias towards a much larger number of rain gauge stations has partly been balanced. The maximum availability of data occurs in the first half of the 2010s, with nearly 570 active rainfall stations and 500 thermometric stations (Figure 1). Regarding the temporal coverage within the study period, the mean series length is about 18 years for temperature and about 20 years for precipitation (Table 1), and the availability of long series (>30 years) is clearly higher for precipitation than for temperature. The data availability decreases sharply for heights above 1000 m and, especially, above 2000 m (Figure 2). This is associated with several factors: the relatively small surface area above these heights in Catalonia, rural depopulation and the automation of hydroelectric plants in the Pyrenees.
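As an illustration of the 0800–0800 UTC “meteorological day,” the following minimal Python sketch aggregates sub-daily precipitation records (such as those from an automatic station) into daily totals. It is not the procedure used for CADTEP: the function names are hypothetical, each record is assumed to be stamped with the end of its accumulation period, and the total read at 0800 UTC is assumed to be dated to the previous calendar day.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

DAY_START_HOUR_UTC = 8  # boundary of the "meteorological day" given in the text


def meteorological_day(period_end_utc: datetime) -> date:
    """Return the meteorological day an accumulation period belongs to.

    Assumption (not from the paper): records carry the END time of their
    accumulation period in UTC, and the interval ending at 0800 UTC is
    dated to the previous calendar day.
    """
    shifted = period_end_utc - timedelta(hours=DAY_START_HOUR_UTC, microseconds=1)
    return shifted.date()


def daily_totals(records):
    """Sum (period_end_utc, amount_mm) records per meteorological day."""
    totals = defaultdict(float)
    for period_end_utc, amount_mm in records:
        totals[meteorological_day(period_end_utc)] += amount_mm
    return dict(totals)


# Hypothetical hourly records
hourly = [
    (datetime(2020, 1, 1, 9, 0), 1.0),  # 0800-0900 on 1 Jan -> 1 Jan
    (datetime(2020, 1, 2, 8, 0), 2.5),  # 0700-0800 on 2 Jan -> still 1 Jan
    (datetime(2020, 1, 2, 9, 0), 4.0),  # 0800-0900 on 2 Jan -> 2 Jan
]
totals = daily_totals(hourly)
```

Subtracting one extra microsecond keeps the reading taken exactly at 0800 UTC inside the day that is ending; a different dating convention would only require changing that shift.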

TABLE 1. Main characteristics of the available temperature and precipitation series for the period 1950–2021 over the domain

| Variable | Number of series | Mean temporal coverage (years) | Stations <5 years (%) | Stations >10 years (%) | Stations >30 years (%) | Stations >50 years (%) |
|---|---|---|---|---|---|---|
| TX | 983 | 17.9 | 133 (13.5) | 493 (50.1) | 149 (15.2) | 26 (2.6) |
| TN | 986 | 17.8 | 135 (13.7) | 492 (49.9) | 148 (15.0) | 27 (2.7) |
| PPT | 1392 | 20.3 | 219 (15.7) | 439 (31.5) | 316 (22.7) | 90 (6.4) |

FIGURE 1. Time evolution of the number of meteorological stations with available data
FIGURE 2. The same as Figure 1, but for some selected altitudinal bands

2.2 Metadata

For all the stations, georeferencing data (latitude, longitude and height) are available and have been checked in the initial database. Additionally, it was possible to retrieve information from the SMC on the history of individual stations; thus, a metadata database called METADEM was created (Prohom & Herrero, 2008). Whenever possible, reported changes in instrumentation, in the shelter (for temperature) or in the observation methodology were saved. A change in location is recorded as the start of a new station, which is given a new code and station name. The availability of metadata is higher for the stations managed by the SMC; it is more limited for the stations managed by the AEMET, as most of that information is not yet digitized.

2.3 Series composition

To study long-term climatic trends, sufficiently long time series are needed, preferably with few gaps. Unfortunately, as shown in section 2.1, such series are not very frequent in the real world; therefore, long series need to be built by merging series of observed data when geographical proximity and climatic similarity allow it. This is also known as blending or composition of series (Klein Tank et al., 2002; Squintu et al., 2020).

For the composition of series, it is necessary to establish some minimum criteria. At the SMC, following the recommendations of previous studies (e.g., Squintu et al., 2020), the criteria are:
  1. Identification of the candidate series: The selection process begins with the identification of series linked to weather stations that are currently in operation and have guarantees of future continuity. Among these, the series that meet the following criteria are selected first: (a) temporal coverage >10 years, (b) ratio of gaps <5%, and (c) amount of invalid data identified during the QC process <5%.
  2. Selection of neighbouring past series: In a retrospective process, series that can be associated with the candidate series are identified. They must (a) lie within a radius of 10 km of the candidate series and (b) differ in height from it by no more than 50 m. The similarity of the geographical setting (valley, basin, plateau) is also considered, especially in places with complex orography.
  3. Incorporation of the rest of the series: All the series that have not been used in the composition process, and the fragments of series rejected in the previous step, are preserved, since they are used in the quality control.
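
The numerical thresholds of the first two criteria can be sketched in Python as follows. This is only an illustrative sketch, not the SMC implementation: the `Series` class, the function names and the example stations are hypothetical, distances are computed with the haversine formula as an assumption, and the qualitative check of the geographical setting is left to the analyst.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Series:
    name: str
    lat: float            # degrees north
    lon: float            # degrees east
    height_m: float
    years_covered: float
    gap_ratio: float      # fraction of missing days
    invalid_ratio: float  # fraction of values rejected by QC


def distance_km(a: Series, b: Series) -> float:
    """Great-circle distance (haversine, mean Earth radius 6371 km)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def is_candidate(s: Series) -> bool:
    """Criterion 1: >10 years of data, <5% gaps, <5% invalid values."""
    return s.years_covered > 10 and s.gap_ratio < 0.05 and s.invalid_ratio < 0.05


def can_merge(candidate: Series, past: Series) -> bool:
    """Criterion 2: within 10 km and at most 50 m of height difference."""
    return (distance_km(candidate, past) <= 10.0
            and abs(candidate.height_m - past.height_m) <= 50.0)


# Hypothetical stations, for illustration only
candidate = Series("current", 41.39, 2.16, 100.0, 25.0, 0.01, 0.02)
past_series = [
    Series("near_old", 41.42, 2.12, 130.0, 30.0, 0.03, 0.01),  # about 4.7 km away
    Series("too_far", 41.60, 2.16, 110.0, 40.0, 0.02, 0.01),   # about 23 km away
    Series("too_high", 41.42, 2.12, 220.0, 35.0, 0.02, 0.01),  # 120 m higher
]
mergeable = [s.name for s in past_series if is_candidate(candidate) and can_merge(candidate, s)]
```

Series that fail these checks are not discarded; following the third criterion, they remain available for the quality control.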

Following this process, we built 26 TX series, 26 TN series and 72 PPT series, in all cases covering the period 1950–2021 with less than 5% data gaps. Data gaps were infilled during the homogenization procedure (section 4). Figure 3 shows the spatial distribution of the selected series for temperature and precipitation, together with all the series available in the initial database. The selected series are well distributed in space and cover the various morphoclimatic units of the domain.

FIGURE 3. Geographic location of temperature (left) and precipitation (right) series from the CADTEP database (1950–2021). The selected series are shown in red in both panels, all the available temperature series in green, and all the available precipitation series in blue


Quality control (QC) is a process that detects and labels suspicious or potentially erroneous values. This step is necessary to avoid inserting erroneous data into the database, which could compromise the results of subsequent climate analyses, such as the homogenization of the series or trend analysis. The QC procedure applied here is a two-step process. The first step is a semi-automatic process based on absolute tests, which identifies suspicious or erroneous values by their large absolute bias. For this purpose, we used the EXTRAQC routine written in R (Aguilar & Prohom, 2010), which has recently been included in the CLIMPACT open-source software (https://climpact-sci.org). The software checks the data for possible errors related to lack of internal coherence, duplicate dates, incorrect rounding, excessive temperature jumps between two consecutive days, flatlines or outliers. Gross errors in daily TX, TN and PPT series can be detected and removed at this step. The second step consists of visual inspection and expert evaluation of the daily series. First, the time series were grouped into geographically homogeneous regions (mostly identical to the counties of Catalonia) and then displayed graphically. An expert climatologist analysed the display, detected the suspicious or erroneous data and either labelled them as “no data” or modified the value, depending on the case (whenever possible, the original source was consulted). The most common errors are:
  • Data displacement: Series or parts of series can be affected by this problem only in periods of manual operation; in our case it especially affected PPT and TX. In these cases, PPT and TX values were erroneously assigned to the day on which the observation was made (usually in the morning) and not to the previous day, which would have been the correct dating. This kind of error is often systematic and its duration can be identified, so such displacements are relatively simple to correct.
  • TX/TN values clearly diverging from neighbouring series (spatial outliers): In manned weather stations it is common to find a temperature record that is clearly anomalous in relation to the surrounding stations, with bias magnitudes that may reach 5–10°C. This may happen, for instance, if an observer or digitizer writes an unnecessary negative sign in front of the temperature value or, conversely, omits a necessary negative sign. Such cases can be detected as errors by comparison with the observed data of surrounding stations. Sometimes the erroneous recording persists for several days. The correct adjustment of such errors is not always straightforward, and spatial interpolation for infilling data gaps or substituting erroneous values is not recommended before homogenization (Domonkos et al., 2022).
  • TX or TN value repeated systematically: A long streak of repeated recordings of the same TX or TN value, producing a completely flat line graph, indicates an error. Such errors may come from instrument failure or observer negligence. Streaks of repeated temperature records longer than 3 days are considered erroneous. Some of these errors were detected by one of the tests included in the RCLIMDEX-extraqc software, and they were corrected.
  • False zero precipitation: Sometimes a weather station records zero PPT while the neighbouring stations record precipitation. When this anomaly occurs repeatedly, it may indicate that no observation was made and that zeros were erroneously entered instead of the missing data code. This problem is usually checked visually, although software-based quality control options also exist (see Serrano-Notivoli et al., 2017a, 2017b). For the period 2003–2020, the incorporation of radar images into the QC process has increased the ability to detect isolated or intermittent occurrences of false zero precipitation.
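
Several of the checks described above can be sketched in a few lines of Python. The sketch below is not the EXTRAQC code (which is written in R) and does not reproduce its interface: the function names are hypothetical, and the jump threshold and the 5 mm neighbour threshold are illustrative assumptions. Only the 3-day flatline limit comes from the text. Missing values are represented by `None`.

```python
from collections import Counter
from itertools import groupby


def incoherent_days(tx, tn):
    """Internal coherence: indices where TX is lower than TN."""
    return [i for i, (x, n) in enumerate(zip(tx, tn))
            if x is not None and n is not None and x < n]


def duplicate_dates(dates):
    """Dates that appear more than once in a series."""
    return sorted(d for d, count in Counter(dates).items() if count > 1)


def flatline_days(values, max_run=3):
    """Indices belonging to runs of identical values longer than max_run days."""
    flagged, start = [], 0
    for value, group in groupby(values):
        run = len(list(group))
        if value is not None and run > max_run:
            flagged.extend(range(start, start + run))
        start += run
    return flagged


def large_jumps(values, max_jump=20.0):
    """Indices where the change from the previous day exceeds max_jump (assumed threshold)."""
    return [i for i in range(1, len(values))
            if values[i] is not None and values[i - 1] is not None
            and abs(values[i] - values[i - 1]) > max_jump]


def suspicious_zeros(target_ppt, neighbour_ppt, wet_mm=5.0):
    """Indices where the target reports 0 mm while every neighbour with data
    reports at least wet_mm (assumed threshold)."""
    flagged = []
    for i, amount in enumerate(target_ppt):
        nearby = [series[i] for series in neighbour_ppt if series[i] is not None]
        if amount == 0 and nearby and min(nearby) >= wet_mm:
            flagged.append(i)
    return flagged


# Hypothetical daily values
tx = [12.0, 11.5, 3.0, 14.0, 14.0, 14.0, 14.0, 15.2]
tn = [5.0, 6.0, 4.0, 7.0, 6.5, 7.1, 6.8, 8.0]
coherence_flags = incoherent_days(tx, tn)
flat_flags = flatline_days(tx)
jump_flags = large_jumps(tx, max_jump=10.0)
zero_flags = suspicious_zeros([0.0, 0.0, 3.0], [[12.0, 0.0, 2.0], [8.0, 0.2, None]])
```

Such tests only flag values; as described above, the decision to invalidate or modify a value remains with the expert reviewing the series.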

At the end of the quality control process, each value is assigned to one of three categories (valid without modification, valid with modification, invalid), and a report on the results of the QC analysis is released for each county (see Figure 4). In general, the ratio of invalid data was around 1%–2%.

FIGURE 4. Example of a section of the QC report that is released for each Catalan county (in this case “La Selva,” and for the international period 1991–2020). Information on the QC results for each variable and for each weather station is provided (https://infogram.com/selva-eng-1h7v4pw5zlj786k?live)


This section describes the homogenization performed with ACMANTv5.1. Two versions of homogenized data were produced, that is, the homogenization was performed both with and without metadata use. First, section 4.1 describes the selection of the series included in the homogenization procedure, and presents the mean spatial correlations between time series. A brief characterization of ACMANTv5.1 is given in section 4.2, followed by a discussion of the correct selection of programs from the ACMANTv5.1 software package (section 4.3). In section 4.4, we discuss the selection of useful metadata from the stock of available metadata through the joint analysis of metadata lists and break detection statistics.

4.1 Selection of time series

Homogenization of time series is needed especially for those series that are long enough to be used in climate trend and climate variability analysis. In our case, the target dataset consists of 26 TX, 26 TN and 72 PPT series (section 2.3). However, to achieve higher accuracy, shorter and less complete time series are often also used as neighbour series. In our case, we supplemented the target dataset with time series of nearby observations from outside Catalonia (7 TX, 7 TN and 8 PPT series) in order to improve the homogenization accuracy of time series in near-border areas. The check of spatial correlations (see Figure 5) then indicated that our dataset is dense and sufficiently well correlated spatially, and that no further time series are needed for the homogenization.

FIGURE 5. Ordered spatial correlations between each candidate series and its neighbour series, from the highest to the 30th highest. TEMP: averages for 33 TN and 33 TX series, PPT: averages for 77 PPT series

Spatial correlations were calculated with the ACMANTv5 software, which gives correlations based on deseasonalized monthly values of the increment series after the first round of homogenization. Differences between the correlations for TX and TN are very small and therefore only their averages are presented. Figure 5 shows that the spatial correlations are generally higher for temperature data than for PPT, and most of them are above 0.8 (0.65) for temperature (precipitation) series. Note that we excluded 3 PPT series of high-mountain stations from the correlation analysis.
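
To make the idea of correlating deseasonalized monthly increment series concrete, a minimal Python sketch follows. It is not the ACMANT code: it assumes that “increment series” means the month-to-month first differences, it requires complete series starting in January, and it omits the preceding homogenization round. All names and data are hypothetical.

```python
from math import sqrt


def deseasonalize(monthly):
    """Subtract from each value the mean of its calendar month.

    Assumes a gap-free monthly series whose first value is a January.
    """
    means = []
    for month in range(12):
        values = monthly[month::12]
        means.append(sum(values) / len(values))
    return [value - means[i % 12] for i, value in enumerate(monthly)]


def increments(values):
    """First differences between consecutive months."""
    return [b - a for a, b in zip(values, values[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def increment_correlation(series_a, series_b):
    """Correlation of the deseasonalized increment series of two stations."""
    return pearson(increments(deseasonalize(series_a)),
                   increments(deseasonalize(series_b)))


# Hypothetical data: two stations with different seasonal cycles that share
# the same month-to-month anomalies (scaled differently), over 3 years.
cycle = [8, 9, 12, 14, 18, 22, 25, 25, 21, 17, 12, 9]
anomaly = [(4 * i) % 11 - 5 for i in range(36)]
coast = [cycle[i % 12] + 0.5 * anomaly[i] for i in range(36)]
inland = [0.6 * cycle[i % 12] - 3.0 + 1.0 * anomaly[i] for i in range(36)]
r = increment_correlation(coast, inland)
```

Because the seasonal cycle is removed before differencing, two stations with very different climatologies can still correlate strongly when their anomalies vary together, which is the property that matters when choosing neighbour series.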

4.2 ACMANTv5.1

The development of the ACMANT method (Adapted Caussinus-Mestre Algorithm for the homogenization of Networks of climate Time series) started during the European project COST ES0601 (“HOME”; Venema et al., 2012). ACMANT adopted the optimal step function fitting and the ANOVA correction model from its successful predecessor, PRODIGE (Caussinus & Mestre, 2004). The theoretical properties and practical performance of these modern and highly effective homogenization tools have been analysed in several studies (Domonkos, 2017; Domonkos et al., 2022; Lindau & Venema, 2016, 2018a, 2018b), and we consider them to be among the best homogenization tools. After HOME, ACMANT was gradually developed further with the inclusion of bivariate detection, ensemble homogenization, a weighted ANOVA model and combined time series comparison (Domonkos, 2020, 2021), among other features. The most recent version, ACMANTv5.1, can be used for the homogenization of several climate variables at either daily or monthly time resolution. The efficiency of the automatic ACMANTv4 was extensively tested by the Spanish MULTITEST project, in which ACMANT was found to be generally more accurate than any other tested method (Domonkos et al., 2021; Guijarro et al., 2017).

ACMANTv5 can be used in either automatic or interactive mode, and it can take advantage of metadata. A recent study demonstrated the advantage of permissive metadata use (Domonkos, 2022), and ACMANTv5.1 already includes the related development. Readers can find more information about ACMANT in the Appendix, and we also recommend a recent book on state-of-the-art homogenization tools (Domonkos et al., 2022).

4.3 Program selection from the ACMANTv5.1 software package

Although developers intend to automate more and more steps of time series homogenization, some problems still need human intervention. The choice of program depends partly, but not entirely, on the climate variable. Temperature can be homogenized with three different models of the dominant annual cycle of inhomogeneity bias size, and users must choose the most appropriate one. For temperature data of midlatitude regions, the sinusoid annual cycle model is generally recommended for the homogenization of TX, and the irregular cycle model for the homogenization of TN. This recommendation is also valid in Catalonia.

For precipitation homogenization, users must define the duration of the snowy season wherever the dominant precipitation form is snow during part of the year. In most of the study dataset, the dominant precipitation form is rain throughout the year; the only exceptions are two high-mountain stations (Ransol and Turó de l'Home), where the dominant precipitation form is snow between December and March. Large inhomogeneities are generally more frequent in snow data than in rain data. A further difficulty is that the seasonal division of the year must be set for a whole network whose data are homogenized together, and it cannot vary between stations within a network. Therefore, we first checked whether any large inhomogeneity occurs in the snow precipitation data of the mountain stations, homogenizing them with an assumed snowy season of December to March for a whole network. Then, having seen that such inhomogeneities do not affect our data, we used the mountain station data without seasonal division, that is, in the same way as all the other PPT time series.

4.4 Selection of useful metadata

Permissive metadata use means that the metadata dates are introduced into the homogenization procedure as valid break dates, without examining their statistical significance (Domonkos, 2022). However, related tests with synthetic data showed that the expected positive effects on the homogenization results may cease when more than 40%–50% of the introduced dates do not point to a true change. Therefore, to decide on the usefulness of metadata, we examined whether ACMANTv5.1 can detect breaks around metadata dates when the metadata are not used in the homogenization. First, we present some general break detection statistics (Table 2), and then we evaluate the importance of metadata using the data in Table 3.

TABLE 2. Some statistics of the homogenization with ACMANTv5.1 without metadata use
Number of time series 33 33 80
Mean length of time series (years) 70.5 70.5 70.6
Detected breaks per 100 years 6.75 6.10 1.23
Detected outlier periods per 100 years 1.07 0.95
Mean length of homogeneous sections (years) 12.3 13.4 38.4
TABLE 3. Number of known relocations/merging dates, shelter changes and sensor changes from metadata (left panel) and ratio of detected breaks at metadata dates (right panel)
Number of metadata issues Ratio of detected issues (%)
Relocation/merging dates 59 59 116 59.3 57.6 11.2
Shelters 11 11 45.5 27.3
Sensors 30 27 10 26.7 25.9 0.0

Table 2 shows that the frequency of detected breaks in the studied TN and TX series is fairly typical. The literature suggests that the typical mean length of homogeneous sections between two adjacent breaks is about 15–20 years (e.g., Lindau & Venema, 2018a), and ACMANT often produces a slightly elevated detected break frequency (Killick, 2016), which has a minor effect on the accuracy of the homogenized time series (Coll et al., 2020; Domonkos, 2022). Most of the detected breaks are smaller than 1°C in magnitude, or they are part of relatively short, platform-shaped inhomogeneities.
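
As a rough plausibility check of Table 2, the mean length of homogeneous sections can be estimated from the break frequency and the mean series length, since a series with k breaks is split into k + 1 sections. The Python sketch below is a back-of-the-envelope calculation on the network means, not the statistic computed by ACMANT, so small differences from the reported values are expected; the function name is hypothetical.

```python
def mean_section_length(series_years, breaks_per_100_years):
    """Series length divided by (mean number of breaks per series + 1)."""
    breaks_per_series = breaks_per_100_years / 100.0 * series_years
    return series_years / (breaks_per_series + 1.0)


# (mean series length, detected breaks per 100 years, reported mean section
# length), taken column by column from Table 2
table2_columns = [
    (70.5, 6.75, 12.3),
    (70.5, 6.10, 13.4),
    (70.6, 1.23, 38.4),
]
estimates = [round(mean_section_length(years, breaks), 1)
             for years, breaks, _ in table2_columns]
```

The estimates (about 12.2, 13.3 and 37.8 years) are close to the reported 12.3, 13.4 and 38.4 years, which suggests that the rows of Table 2 are mutually consistent.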

Table 2 also shows that the detected break frequency is much lower in precipitation series than in temperature series. This finding is not specific either to ACMANT or to the Catalan dataset. For PPT we generally find fewer inhomogeneities, since PPT data (particularly rainfall data) are less sensitive to possible micro-environmental changes, and because the signal-to-noise ratio during homogenization is generally lower for PPT than for many other climate variables (Domonkos, 2015; Spinoni et al., 2015). In our case, no break was detected in 42 series (53%) of the PPT dataset.

We sorted the known metadata into relocation, shelter change and sensor change groups. When more than one change occurred at the same time, the event was classified according to the following order of importance: relocation, shelter change, sensor change. The change from Manned Weather Stations (MWS) to Automatic Weather Stations (AWS) took place together with a relocation in most cases; when it did not, it was considered a shelter change for TN and TX and a sensor change for PPT. In Table 3, the left panel shows the number of known metadata events grouped by climate variable and metadata type, while the right panel shows the ratio of breaks correctly detected by ACMANTv5.1 when the statistical procedure was performed without metadata use. The detection of a metadata event was considered correct when the absolute difference between the detected break date and the metadata date was less than 24 months. Table 3 shows that the highest ratio of correct detection occurs for relocation events, but in PPT homogenization the ratios are low for all kinds of inhomogeneities. Note that the 25%–27% ratio of correct detection of thermometer change events is not higher than what would be expected for randomly selected dates.
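
The matching rule and the comparison with randomly selected dates can be illustrated with a short Python sketch. The function names and example dates are hypothetical, and the estimate for random dates assumes that breaks occur as a Poisson process at the mean detected frequency, which is a simplification not taken from the paper.

```python
from math import exp


def month_index(year, month):
    """Number of months since year 0, so that dates can be subtracted."""
    return year * 12 + (month - 1)


def detection_ratio(metadata_months, break_months, tolerance_months=24):
    """Share of metadata dates with a detected break closer than the tolerance.

    Both lists must refer to the same time series.
    """
    if not metadata_months:
        return 0.0
    hits = sum(1 for m in metadata_months
               if any(abs(m - b) < tolerance_months for b in break_months))
    return hits / len(metadata_months)


def random_hit_probability(breaks_per_100_years, tolerance_months=24):
    """Chance that a random date has a break within the tolerance window,
    assuming Poisson-distributed breaks (illustrative assumption)."""
    window_years = 2 * tolerance_months / 12.0
    rate_per_year = breaks_per_100_years / 100.0
    return 1.0 - exp(-rate_per_year * window_years)


# Hypothetical series: three documented events, two detected breaks
metadata = [month_index(1975, 6), month_index(1992, 1), month_index(2008, 10)]
breaks = [month_index(1976, 3), month_index(1995, 1)]
ratio = detection_ratio(metadata, breaks)   # only the 1975 event is matched
chance = random_hit_probability(6.75)       # break frequency from Table 2
```

Under these assumptions, a random date has roughly a 22%–24% chance of lying within 24 months of a detected break at the temperature break frequencies of Table 2, which is of the same order as the 25%–27% detection ratio reported for sensor changes.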

Although the presented analysis provides useful information about the importance of different types of technical changes in the observations, any decision on the inclusion or exclusion of pieces of metadata in the homogenization with ACMANTv5.1 is essentially subjective, since a lack of statistical significance or a low precision of the statistical detection results does not prove that the events related to such metadata did not affect the observed data. For TN and TX, we decided to include relocation and/or merging dates and shelter change dates, and to exclude metadata indicating only thermometer changes. For PPT homogenization, we used only relocation and/or merging dates. Even among these, we excluded metadata that led to very small breaks (<2% change) or to small and short-term biases (<6% bias lasting less than 6 years) in a homogenization experiment.
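The exclusion rule for PPT metadata can be written as a small predicate. This is only a sketch: the function and parameter names are hypothetical, and it assumes that the break size and the bias size are the same quantity, expressed in percent.

```python
def keep_ppt_metadata(change_pct: float, duration_years: float) -> bool:
    """Return True when a PPT metadata date is retained for homogenization."""
    size = abs(change_pct)
    if size < 2.0:                           # very small break
        return False
    if size < 6.0 and duration_years < 6.0:  # small and short-term bias
        return False
    return True

# Hypothetical examples (not CADTEP values)
decisions = [
    keep_ppt_metadata(1.5, 20.0),   # excluded: break smaller than 2%
    keep_ppt_metadata(4.0, 3.0),    # excluded: small and short-lived
    keep_ppt_metadata(4.0, 15.0),   # kept: small but long-lasting
    keep_ppt_metadata(-8.0, 3.0),   # kept: short but large
]
```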

Finally, in the homogenization with metadata we used 70 pieces of metadata for the homogenization of TN and TX, and 82 pieces of metadata for the homogenization of PPT.


Here we examine the influence of the performed homogenization on the temporal evolution of temperature and precipitation data. We use the homogenization results obtained with metadata use, unless otherwise stated. In section 5.1 we compare the temporal variability of the data before and after homogenization. In section 5.2 we extend this examination to the calculated linear trends; there the trends themselves are of most interest, because they are often significant, while the area mean differences between homogenized and nonhomogenized trends are relatively small.

5.1 Impact of homogenization

The area average temporal evolution of Catalan temperature data before and after homogenization can be compared in Figure 6. Homogenization made TN values (Figure 6, left panel) cooler throughout the period before 2010. The size of the negative adjustments is 0.2–0.4°C, except around 1995, when they are slightly larger for a few years. The TX graphs (Figure 6, right panel) show that negative adjustments were again more frequent than positive adjustments for this variable, but adjustment sizes reaching or exceeding 0.2°C occurred only between 1993 and 2010. Homogenization did not cause perceptible changes in the area mean PPT values (not shown).

FIGURE 6. Area mean temporal evolution of TX (left) and TN (right) in Catalonia 1950–2021, based on quality-controlled but nonhomogenized data (dashed black line) and homogenized data (solid blue line)

Although the magnitude of homogenization adjustments is generally small in the area average values, this is not always true for individual station series. Figure 7 shows the homogenization of the TX, TN and PPT series of four stations located in four different geographical areas: Vielha (the Pyrenees), Girona (interior plain), Flix-Vinebre (southernmost sector) and Barcelona Airport (central coast). For the early sections of the TN series in Girona, Flix-Vinebre and Barcelona Airport, negative corrections as large as approximately 2°C had to be applied, while only much smaller inhomogeneity biases were found for the TX and PPT series, and only for relatively short periods. The case of Vielha differs: there the adjustments are larger for TX and PPT than for TN, and they are most noticeable at the beginning of the series.

FIGURE 7. Temporal evolution of TX (left), TN (central) and PPT (right) in four series located in different geographical areas, based on quality-controlled but nonhomogenized (dashed black line) and homogenized (solid blue line) series

Regarding the causes of the large inhomogeneity biases in the selected TN series, in Girona a large detected break (−1.7°C) coincides with a station relocation. Interestingly, the same relocation caused only a very small change in the TX series (+0.15°C). In Flix-Vinebre, the large bias in the raw TN series is due to the accumulated effect of two station relocations, in 1987 (−0.52°C) and 2008 (−0.92°C). In this case, notable break sizes, but with shifts of opposite signs, were detected in the TX series. Finally, the causes of the large biases in the temperature series of Barcelona Airport are less clear. A large break was detected in August 2002 in both the TN and TX series, but the metadata do not indicate any change at that time. These results demonstrate the importance of the joint use of metadata and appropriate statistical methods.

5.2 Climate trend analysis

Linear trends were fitted to the database by least squares. The statistical significance of the trends was assessed with the Mann–Kendall test (Kendall, 1975; Mann, 1945). The calculations were performed both for the homogenized and nonhomogenized databases, and both for homogenization results with metadata use and those without metadata use. Results with confidence levels of >95% were considered to be statistically significant. We calculated monthly, seasonal and annual trend values for all daily series of TN, TX and PPT, and also for the mean values for Catalonia. Table 4 shows the mean trends for the whole domain.

TABLE 4. Mean trend values in Catalonia for TX, TN and PPT considering raw quality-controlled series (Raw-QCd), homogenization without metadata and homogenization with metadata (period 1950–2021)
            TX (°C·decade−1)                 TN (°C·decade−1)                 PPT (%·decade−1)
            Raw-QCd    Without    With       Raw-QCd    Without    With       Raw-QCd    Without    With
                       metadata   metadata              metadata   metadata              metadata   metadata
JAN         +0.25*     +0.23*     +0.21*     +0.15      +0.20*     +0.18*     +4.9       +5.4       +5.2
FEB         +0.29*     +0.27*     +0.25*     +0.17      +0.22*     +0.20*     −2.5       −2.2       −2.3
MAR         +0.29*     +0.28*     +0.26*     +0.10      +0.14*     +0.12      −4.1       −3.8       −3.9
APR         +0.29*     +0.29*     +0.27*     +0.19*     +0.21*     +0.21*     +5.2       +5.5       +5.3
MAY         +0.29*     +0.30*     +0.28*     +0.14*     +0.16*     +0.15*     −2.4       −2.1       −2.2
JUN         +0.48*     +0.50*     +0.48*     +0.22*     +0.25*     +0.25*     −6.6*      −6.2       −6.3
JUL         +0.36*     +0.37*     +0.35*     +0.21*     +0.23*     +0.23*     −2.7       −2.2       −2.3
AUG         +0.48*     +0.48*     +0.46*     +0.26*     +0.27*     +0.28*     −5.3*      −5.0       −5.1
SEP         +0.25*     +0.24*     +0.22*     +0.09      +0.09      +0.09      −4.8       −4.5       −4.6
OCT         +0.34*     +0.33*     +0.31*     +0.22*     +0.23*     +0.23*     −0.8       −0.5       −0.6
NOV         +0.19*     +0.16*     +0.14*     +0.13      +0.16      +0.15      +4.3       +4.6       +4.5
DEC         +0.25*     +0.22*     +0.20*     +0.07      +0.12      +0.10      −9.1       −8.8       −8.9
ANNUAL      +0.31*     +0.31*     +0.29*     +0.16*     +0.19*     +0.18*     −1.8       −1.5       −1.6
WIN (DJF)   +0.27*     +0.25*     +0.23*     +0.13*     +0.18*     +0.16*     −2.7       −2.3       −2.4
SPR (MAM)   +0.29*     +0.29*     +0.27*     +0.14*     +0.17*     +0.16*     −0.3       0.0        −0.1
SUM (JJA)   +0.44*     +0.45*     +0.43*     +0.23*     +0.25*     +0.25*     −5.2*      −4.8*      −4.9*
AUT (SON)   +0.26*     +0.24*     +0.22*     +0.15*     +0.16*     +0.16*     −0.5       −0.2       −0.3
  • Note: Asterisk (*) indicates statistical significance at the 95% confidence level (Mann–Kendall test).
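As a minimal sketch of the kind of calculation behind Table 4, the following Python code fits a least-squares linear trend and evaluates the Mann–Kendall test with the usual normal approximation. It is not the authors' code: the function names are hypothetical, ties are not corrected for, and the input series is synthetic.

```python
import math
import numpy as np

def linear_trend_per_decade(years, values):
    """Least-squares slope of values against years, expressed per decade."""
    years = np.asarray(years, dtype=float)
    slope, _intercept = np.polyfit(years - years.mean(), values, 1)
    return 10.0 * slope

def mann_kendall_p(values):
    """Two-sided p-value of the Mann-Kendall test (normal approximation, no tie correction)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2.0))

# Synthetic annual series for 1950-2021 with an imposed trend of 0.3 per decade
rng = np.random.default_rng(0)
years = np.arange(1950, 2022)
values = 14.0 + 0.03 * (years - 1950) + rng.normal(0.0, 0.5, years.size)

trend = linear_trend_per_decade(years, values)
significant = mann_kendall_p(values) < 0.05   # 95% confidence level
```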

The examined regional mean temperature values have significant increasing trends all year round; the only exceptions are some monthly TN trends. By contrast, none of the PPT trends are significant, with the sole exception of summer (JJA) PPT (−4.9%·decade−1).

Homogenization resulted in small differences in the calculated area mean trends (Table 4). With homogenized data, the warming trend of annual mean TN is stronger by 0.02°C·decade−1 than with raw data, while the same comparison for annual mean TX trends shows the opposite effect, that is, the warming in the homogenized data is slower by 0.02°C·decade−1. Somewhat larger differences occur in monthly TX trends: for November and December, the TX trends with homogenized data are weaker by 0.05°C·decade−1 than when raw data are used. For PPT, the decreasing trend with homogenized data is slightly weaker than with raw data; the difference is 0.3%·decade−1 for the annual totals.

The inclusion or exclusion of metadata use influenced the homogenization results, but the differences in the calculated trends for this reason are very small, and they never exceed 0.02°C·decade−1 for temperature trends and 0.5%·decade−1 for PPT trends.

Table 4 also shows that differences between using homogenized or raw data, or those between the inclusion or exclusion of metadata use rarely influenced the statistical significance of trends. The warming trend is generally fast, and more intense for TX than for TN. The largest temperature increase for the study period took place in summer TX. Among the monthly trends, the largest warming of TX (TN) is 0.48°C (0.27°C)·decade−1 for June (August). The summer warming was accompanied by significantly decreasing precipitation. The decrease of PPT is clearer for early summer months (May, June) and late summer months (August, September) than for July when the monthly precipitation totals are generally lower and more variable than in other summer months. Note, however, that none of the monthly PPT trends are statistically significant. The enhanced degree of summer climate change in comparison with that of the other seasons is also indicated by the differences between TX trends and TN trends, which are the largest for the summer season.


A dense and spatially sufficiently correlating daily temperature and precipitation database, CADTEP, has been built for Catalonia, covering the period 1950–2021. The included climate data have been subjected to thorough data quality control and a sophisticated homogenization procedure, which ensure the high quality of CADTEP. We used this new database to assess climate change in Catalonia over the period covered by CADTEP.

We have found significant warming trends both for TN and TX and for all seasons of the year. The detected warming is generally similar to the mean global land surface temperature increase between 1960 and 2020, which is approximately 1.5°C (IPCC, 2021). However, the warming of summer TX is approximately double the mean temperature increase in the other seasons.
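As a rough worked example, the decadal trends of Table 4 (homogenization with metadata) can be converted to total changes over the 72 years (7.2 decades) of 1950–2021. This assumes that the linear trend is simply multiplied by the period length, which is our illustration and not a calculation reported by the authors.

```latex
\begin{align*}
\Delta T_{\mathrm{TX,\ annual}} &\approx 0.29 \times 7.2 \approx 2.1\,^{\circ}\mathrm{C}, \\
\Delta T_{\mathrm{TN,\ annual}} &\approx 0.18 \times 7.2 \approx 1.3\,^{\circ}\mathrm{C}, \\
\Delta T_{\mathrm{TX,\ summer}} &\approx 0.43 \times 7.2 \approx 3.1\,^{\circ}\mathrm{C}, \\
\frac{b_{\mathrm{TX,\ summer}}}{\tfrac{1}{3}\left(b_{\mathrm{TX,\ winter}} + b_{\mathrm{TX,\ spring}} + b_{\mathrm{TX,\ autumn}}\right)}
  &= \frac{0.43}{\tfrac{1}{3}(0.23 + 0.27 + 0.22)} \approx 1.8 .
\end{align*}
```

The last ratio is consistent with the statement that summer TX warming is approximately double that of the other seasons.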

Figure 8 visualizes the changes that occurred in the annual cycles of TN, TX and PPT between an early subperiod (1951–1980) and a late subperiod (1991–2020) of CADTEP data. The most marked thermal increase occurred in the summer months, especially for the maximum temperature. With regard to precipitation, a marked decrease in the warm half of the year (May to September) is evident in the most recent subperiod, which implies an extension of the summer drought towards spring and autumn. The decline of precipitation in June is noteworthy (more than 15 mm on average between the two periods); it has been reported by other authors in Iberia (Del Río et al., 2011; González-Hidalgo et al., 2011), and is in accordance with the poleward propagation of the northern Hadley circulation (IPCC, 2021). The summer thermal increase, combined with a decrease in precipitation in this period of the year, can intensify atmospheric evaporative demand, leading to stress in both natural and hydrological systems (Vicente-Serrano et al., 2014, 2018).

FIGURE 8. Area mean annual cycles of TX (left), TN (central) and PPT (right) based on the homogenized series of Catalonia in two periods, 1951–1980 (dashed black line) and 1991–2020 (solid blue line)

Regarding the applied homogenization process, metadata use had a relatively small effect on the calculated trends. This is because the database has high spatial density and high spatial correlations (see also Domonkos, 2022), and no synchronous technical change occurred in the Catalan observing network (according to the metadata information and to our knowledge). Nevertheless, the best practice is to check metadata first for possible occurrences of coincidental inhomogeneities. Even when the likely effect of metadata use is small (in homogenizing dense networks), metadata use is always recommended (O'Neill et al., 2022; Venema et al., 2020). Climate change is a very serious issue, and our responsibility is to provide data that are as accurate as possible for the correct monitoring of ongoing changes in climate variables.

To conclude, the main findings of this study are as follows:
  • We have built a spatially dense and sufficiently correlating daily climate database for the area of Catalonia, CADTEP, whose data cover the period 1950–2021. CADTEP includes quality-controlled and homogenized data, hence the climate change and climate variability that took place in the period of CADTEP can be determined with high reliability and accuracy.
  • Surface air temperature in Catalonia increased significantly over the period 1950–2021. The increase was larger for daily maximum temperature (TX) than for daily minimum temperature (TN). For most of the year, the rate of warming is similar to that of the global mean land surface warming.
  • Summer TX increased with enhanced intensity. The mean TX change in Catalonia over the study period is approximately triple the global mean surface temperature increase (including both land and ocean), and double the global mean land surface temperature increase.
  • Precipitation totals do not show significant trends, except for summer. Summer PPT has a significant decreasing trend, which likely contributes to the enhanced increasing trend of summer TX.
  • ACMANTv5.1 is a user-friendly and effective homogenization method. Metadata can be used with ACMANTv5.1 in automatic mode and for datasets of any size.


Marc Prohom: Conceptualization; investigation; funding acquisition; writing – original draft; methodology; supervision; project administration; writing – review and editing. Peter Domonkos: Investigation; writing – original draft; methodology; writing – review and editing; software; conceptualization. Jordi Cunillera: Conceptualization; investigation; methodology; visualization; writing – review and editing; data curation; software. Antoni Barrera-Escoda: Investigation; methodology; writing – review and editing; data curation. Montserrat Busto: Investigation; writing – review and editing; validation. Mònica Herrero-Anaya: Investigation; writing – review and editing; validation. Albert Aparicio: Visualization; investigation. Jaume Reynés: Investigation; methodology.


We would like to thank AEMET, Météo-France, the Meteorological Service of Andorra, and Roberto Serrano-Notivoli for providing the initial data for CADTEP generation. We also appreciate the comments of the reviewers, which have improved the final quality of the paper.


    The full description of ACMANTv4 is published by Domonkos (2020), while the developments for version 5 are published by Domonkos et al. (2021, 2022).

    A.1. Relative homogenization method

    ACMANT uses a group of time series of the same climatic region. According to its base model, any time series (X) consists of the climate signal (U), station effect (V) including the mean local deviation from the area mean values plus possible inhomogeneities, and weather dependent noise (ɛ), according to Equation (A1),
    $$X = U + V + \varepsilon, \qquad X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}, \quad i = 1, 2, \ldots, n. \tag{A1}$$
    Equation (A1) shows an additive relation between climate signal and station effect, and it is applicable for most climate variables. In treating precipitation amounts, ACMANT converts the values by a semi-logarithmic transformation before homogenization, and then the additive model is still applicable. For distinguishing inhomogeneities from the temporal variation of climate signal, the differences between a candidate series ($X_G \equiv G$) and its neighbour series ($X_F \equiv F$) are examined. These difference series (T) are called relative time series (Equation (A2)),
    $$T = G - F = V_G - V_F + \varepsilon. \tag{A2}$$

    We use relative time series in spite of the disturbing effect of the station effect of neighbour series ($V_F$), since the climate variation disappears from the calculations. To attenuate the impact of $V_F$ on the homogenization results, composite reference series are used instead of individual neighbour series, or multiple time series comparisons can be applied. In ACMANT, at least three neighbour series are needed to perform the homogenization of a candidate series.
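    The effect of Equation (A2) can be illustrated with synthetic data. In the sketch below, the composite reference is an equal-weight mean of three neighbours, which is a simplifying assumption and not necessarily the weighting used in ACMANT; all numbers are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 72
climate = 0.03 * np.arange(n) + rng.normal(0.0, 0.6, n)      # shared climate signal U

# Candidate: constant station effect plus a -1.0 bias before index 40
station_effect = 0.5 + np.where(np.arange(n) < 40, -1.0, 0.0)
candidate = climate + station_effect + rng.normal(0.0, 0.15, n)

# Three homogeneous neighbours with their own constant station effects
neighbours = np.stack(
    [climate + offset + rng.normal(0.0, 0.15, n) for offset in (0.2, -0.3, 0.1)]
)
reference = neighbours.mean(axis=0)        # equal-weight composite reference (assumption)

relative = candidate - reference           # T = G - F, the climate signal cancels
step = relative[40:].mean() - relative[:40].mean()   # estimate of the imposed 1.0 shift
```

    The variability of `relative` is much smaller than that of `candidate`, which is why the shift stands out in the difference series although it is partly hidden by climate variability in the raw series.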

    A.2. Optimal step function fitting for break detection

    $$S_K = \min_{j_1, j_2, \ldots, j_K} \sum_{k=0}^{K} \sum_{i=j_k+1}^{j_{k+1}} \left(t_i - \bar{T}_k\right)^2. \tag{A3}$$

    In Equation (A3), K stands for the number of steps, while $j_k$ denotes break positions or the ends of the time series; $j_0 = 0$ and $j_{K+1} = n$ by definition. In PRODIGE, K is optimized by the semi-empirical Caussinus–Lyazrhi criterion (Caussinus & Lyazrhi, 1997), and a slightly modified version of it is used in ACMANT.
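    For a fixed K, the minimization in Equation (A3) can be solved exactly by dynamic programming. The following Python sketch shows one such solution; it is not the ACMANT implementation, the function name is hypothetical, and the choice of K (the Caussinus–Lyazrhi criterion) is not included.

```python
import numpy as np

def optimal_breaks(t, K):
    """Return (S_K, break positions j_1..j_K) minimising Equation (A3) for K breaks."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    c1 = np.concatenate(([0.0], np.cumsum(t)))
    c2 = np.concatenate(([0.0], np.cumsum(t * t)))

    def sse(a, b):
        """Squared error of the segment i = a+1..b around its own mean."""
        s = c1[b] - c1[a]
        return (c2[b] - c2[a]) - s * s / (b - a)

    # cost[k, b]: minimal error when the first b values are split into k + 1 segments
    cost = np.full((K + 1, n + 1), np.inf)
    prev = np.zeros((K + 1, n + 1), dtype=int)
    for b in range(1, n + 1):
        cost[0, b] = sse(0, b)
    for k in range(1, K + 1):
        for b in range(k + 1, n + 1):
            for a in range(k, b):
                c = cost[k - 1, a] + sse(a, b)
                if c < cost[k, b]:
                    cost[k, b] = c
                    prev[k, b] = a

    # Backtrack from the end of the series to recover j_K, ..., j_1
    breaks, b = [], n
    for k in range(K, 0, -1):
        b = int(prev[k, b])
        breaks.append(b)
    return float(cost[K, n]), sorted(breaks)

# Noise-free example with shifts after the 5th and the 10th value
series = [0.0] * 5 + [2.0] * 5 + [-1.0] * 5
s_k, positions = optimal_breaks(series, K=2)
```

    The triple loop costs on the order of K·n² operations, which is affordable for annual or monthly relative time series.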

    A.3. Joint correction of inhomogeneities by the ANOVA correction model

    This model allows the joint calculation of all adjustment terms for all time series homogenized together. In the basic model (common ANOVA model), the climate is presumed to be spatially constant for all the time series s (s = 1,2,…N) of the examined network. Equations (A4) and (A5) show the solution when the total number of breaks for the network is K,
    $$\frac{1}{j_{s,k+1} - j_{s,k}} \sum_{i=j_{s,k}+1}^{j_{s,k+1}} \hat{u}_i + \hat{v}_{s,k} = \bar{x}_{s,k}, \tag{A4}$$
    $$\hat{u}_i + \frac{1}{N} \sum_{s=1}^{N} \hat{v}_{s,k_i} = \frac{1}{N} \sum_{s=1}^{N} x_{s,i}. \tag{A5}$$

    In Equations (A4) and (A5), x denotes the nonhomogenized values of the input time series, while an overbar and a hat (symbol ^) above a variable denote section mean and estimated value, respectively. Equation (A4) is applied to each homogeneous segment of each time series, while Equation (A5) is applied to each time point (i). The ANOVA model family also includes a weighted ANOVA version, in which the spatial change of climate is taken into account. Following the model established by Szentimrey (2010), ACMANT applies the weighted ANOVA model (Domonkos, 2020) for the calculation of the final correction terms.
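    Equations (A4) and (A5) are the normal equations of the least-squares fit of $x_{s,i} = u_i + v_{s,k_i}$, so the common (unweighted) ANOVA model can be sketched by solving that least-squares problem directly. The Python code below is an illustration under simplifying assumptions (complete data, known break positions, adjustment to the last segment of each series); it is not the weighted model used in ACMANT, and the function name is hypothetical.

```python
import numpy as np

def anova_adjust(x, breaks):
    """x: (N, n) array of series; breaks[s]: sorted break positions of series s.
    Returns the series adjusted to their last homogeneous segment."""
    x = np.asarray(x, dtype=float)
    N, n = x.shape
    # Segment index of every value, numbered consecutively over all series
    seg = np.zeros((N, n), dtype=int)
    offset = 0
    for s in range(N):
        seg[s] = offset + np.searchsorted(np.asarray(breaks[s]), np.arange(n), side="right")
        offset = int(seg[s, -1]) + 1
    n_seg = offset

    # Design matrix: one column per climate term u_i, one per station effect v_{s,k}
    rows = np.arange(N * n)
    A = np.zeros((N * n, n + n_seg))
    A[rows, np.tile(np.arange(n), N)] = 1.0
    A[rows, n + seg.ravel()] = 1.0
    sol, *_ = np.linalg.lstsq(A, x.ravel(), rcond=None)

    v = sol[n:]                         # determined up to a common constant
    last = seg[:, -1]                   # last segment of each series
    return x - (v[seg] - v[last][:, None])

# Synthetic network: four series, one with a -1.0 bias in its first 30 values
rng = np.random.default_rng(1)
climate = rng.normal(0.0, 1.0, 60)
x = climate + np.array([0.5, -0.2, 0.3, 0.0])[:, None]
x[0, :30] -= 1.0
adjusted = anova_adjust(x, [[30], [], [], []])
```

    The system has one free constant (adding it to every $u_i$ and subtracting it from every $v_{s,k}$ changes nothing), but the adjustment terms are differences of station effects within a series, so they do not depend on it.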

    A.4. Automatic networking in the homogenization of large datasets

    Datasets covering very large areas or highly varied climates must be divided into smaller networks. In ACMANT, the 30 best correlating neighbour series are selected for each candidate series when the division into networks is justified by dataset size or climatic variety, and the station density is sufficient to find 30 sufficiently correlating series. The number of selected neighbour series can be even larger when some of the neighbour series do not cover the full period of the candidate series.
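    A neighbour selection of this kind can be sketched as follows. The details are assumptions made for the illustration: correlations are computed between first-difference series of complete data, and `min_corr` is a hypothetical threshold standing in for "sufficiently correlating"; the source does not specify either.

```python
import numpy as np

def best_neighbours(data, candidate, n_neighbours=30, min_corr=0.0):
    """Indices of the best correlating series for one candidate, best first."""
    data = np.asarray(data, dtype=float)
    increments = np.diff(data, axis=1)            # first-difference series (assumption)
    corr = np.corrcoef(increments)[candidate]
    ranked = [
        int(s) for s in np.argsort(-corr)
        if s != candidate and corr[s] >= min_corr
    ]
    return ranked[:n_neighbours]

# Synthetic example: series 1 follows the candidate exactly, series 2 is
# anti-correlated, series 3 follows the candidate with added noise
rng = np.random.default_rng(7)
base = rng.normal(0.0, 1.0, 120)
data = np.stack([base, base + 3.0, -base, base + rng.normal(0.0, 0.5, 120)])
selected = best_neighbours(data, candidate=0, n_neighbours=2)
```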

    A.5. Combined time series comparison

    In ACMANT, both composite reference series and pairwise comparisons are applied. More specifically, the so-called combined time series comparison is applied in an early phase of the homogenization procedure. This method unifies the advantages of pairwise comparisons and the use of composite reference series.

    A.6. Separated treatment of long-lasting inhomogeneities and short-term data quality issues

    In the main steps of break detection, the shortest possible distance between two consecutive breaks is 3 years. This is because the distinction between noise and inhomogeneities is highly uncertain for short periods. ACMANT then refines the dates of the preliminary break positions and examines the possible occurrence of short-term, platform-shaped biases of significant magnitude. For these reasons, break dates can be much closer than 3 years in the final break detection results.

    A.7. Multiphase homogenization

    Before homogenization, any time series might include large inhomogeneity biases; therefore, three cycles of break detection and inhomogeneity removal are performed in an ACMANT procedure. However, in iterative homogenization, error propagation might occur, resulting in a convergence of the homogenized time series to a common time evolution that differs from the true climate signal (see Domonkos et al., 2022, sec. 6.2). To attenuate possible error propagation, several safeguards are included, the most important of which is ensemble homogenization. In the second iteration cycle, the uncertainty of the homogenization results is monitored by performing the homogenization with varied composite reference series. In the last iteration cycle, an ensemble homogenization is performed, in which the variation of the reference series properties between ensemble members reflects the previously estimated uncertainty range.

    A.8. Several models for the annual cycle of inhomogeneity bias

    In ACMANT, users can choose the appropriate model of seasonality of inhomogeneity biases from a set of available models.

    A.8.1. Sinusoid cycle with modes at the solstices

    This model is recommended for climate variables whose station effects are often affected by radiation changes (temperature, sunshine duration, relative humidity), for datasets of extratropical regions. When the annual cycle of radiation differs moderately from the regular harmonic cycle, the use of this model is still beneficial. In this model, two annual variables are homogenized together: the annual mean and the summer–winter difference of the studied climate variable. Common break dates for the two variables are searched for by a bivariate extension of A.2. In the inhomogeneity removal, distinct ANOVA model calculations are performed for the two variables.

    A.8.2. Irregular shaped annual cycle

    This model is recommended in most cases when the conditions for using model (A.8.1) are not given, but is not recommended when the signal-to-noise ratio is known to be much lower than usual. In this model, the break detection results for the annual mean series are taken, and distinct adjustment terms are calculated by the ANOVA model for each calendar month. Before applying the monthly adjustments, a smoothing is applied between the adjustment terms of adjacent calendar months.
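    The smoothing between adjacent calendar months can be sketched with a circular filter, so that December and January are treated as neighbours. The 1-2-1 weights are an assumption for illustration; the source does not state which smoothing ACMANT applies.

```python
import numpy as np

def smooth_monthly(adjustments, weights=(0.25, 0.5, 0.25)):
    """Circular three-point smoothing of 12 monthly adjustment terms."""
    a = np.asarray(adjustments, dtype=float)
    w_prev, w_self, w_next = weights
    # np.roll(a, 1)[m] is the previous month, np.roll(a, -1)[m] the next one
    return w_prev * np.roll(a, 1) + w_self * a + w_next * np.roll(a, -1)

# Hypothetical monthly adjustment terms (degrees C), January to December
raw_terms = [0.1, 0.3, 0.2, 0.5, 0.6, 0.9, 1.0, 0.8, 0.4, 0.5, 0.1, 0.2]
smoothed = smooth_monthly(raw_terms)
```

    With weights that sum to one, the filter leaves the annual mean adjustment unchanged and only redistributes it between neighbouring months.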

    A.8.3. Flat annual cycle

    This model is recommended when the signal-to-noise ratio is low, either because of the general characteristics of the homogenization task or because of the lack of significant seasonal changes in the studied climate variable. For rain precipitation this model is the only option. In this model, the annual mean adjustment terms are evenly applied to each month and day throughout the year.

    A.8.4. Division of the year into rainy season and snowy season

    This model is applied only in precipitation homogenization and only when snow precipitation dominates in a part of the year. Users must specify the first and last calendar months of the snowy season based on climatological knowledge. In this model, bivariate homogenization is performed where one variable is the annual precipitation total of the rainy season and the other variable is the annual precipitation total of the snowy season.

    A.9. Homogenization both for monthly and daily datasets

    Either daily or monthly datasets can be homogenized by ACMANT, but note that only section mean values are homogenized, and the higher moments of the probability distribution are not examined.

    A.10. High missing data tolerance

    ACMANT can be used in the homogenization of time series with high missing data ratio, and time series covering varied time periods can be homogenized together. However, taking the benefit of short time series in the homogenization of longer series needs further methodological developments.

    A.11. Automatic and interactive options

    ACMANT has both automatic and interactive versions, and metadata can be used in both versions. The most recent development in ACMANTv5.1 is the permissive metadata use (Domonkos, 2022), which is applicable both in automatic and interactive modes. In permissive metadata use, metadata dates are treated as detected break positions in the setting of the ANOVA correction model. This metadata use improves homogenization accuracy when more than 50% of the metadata dates point to changes that truly occurred in the technical conditions of the climate observation. No statistical significance is required for its application.
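    In its simplest reading, permissive metadata use means that the break list passed to the correction model is the union of the statistically detected breaks and the metadata dates. The sketch below shows only this merging step; the function name is hypothetical, positions are given as indices of the time series, and any special handling of nearly coincident dates in ACMANTv5.1 is not represented.

```python
def permissive_break_positions(detected, metadata):
    """Sorted union of statistically detected and metadata break positions."""
    return sorted(set(detected) | set(metadata))

# Hypothetical positions (month indices within a series)
positions = permissive_break_positions([120, 455], [455, 610])
```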


    CADTEP data are available on request from the Catalan Meteorological Service.