What global reanalysis best represents near‐surface winds?

Since global reanalysis datasets first appeared in the 1990s, they have become an essential tool to understand the climate of the past. The wind power industry uses those products extensively for wind resource assessment, while several climate services for energy rely on them as well. Nowadays various datasets coexist, which complicates the selection of the most suitable source for each purpose. In an effort to identify the products that best represent the wind speed features at turbine hub heights, five state‐of‐the‐art global reanalyses have been analysed: ERA5, ERA‐Interim, the Japanese 55‐year Reanalysis (JRA55), the Modern Era Retrospective Analysis for Research and Applications‐2 (MERRA2), and the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis 1 (R1). A multi‐reanalysis ensemble approach is used to explore the main differences amongst these datasets in terms of surface wind characteristics. Then, the quality of the surface and near‐surface winds is evaluated with a set of 77 instrumented tall towers. Results reveal that important discrepancies exist in terms of boreal winter seasonal means, interannual variability (IAV), and decadal linear trends. The differences in the computation of these parameters, which are mainly concentrated inland, reach up to the order of magnitude of the parameters themselves. Comparison with in situ observations shows that the ERA5 surface winds offer the best agreement, correlating and reproducing the observed variability better than a multi‐reanalysis mean in 35.1% of the tall tower sites on a daily time‐scale. However, none of the reanalyses stands out from the others when comparing seasonal mean winds. Regarding the IAV, near‐surface winds from ERA5 offer the values closest to the observed IAV.

variables where in situ observations are lacking. However, these products have inherent uncertainties that derive from model simplifications, observational uncertainties, and the data assimilation procedure. Although there are several ongoing initiatives in the community to produce reanalysis datasets (e.g., Copernicus Climate Change Service (C3S), 2017), only a few efforts have been devoted to comparing their quality in a coordinated way (Fujiwara et al., 2017). It is true, however, that several studies have focused on verifying reanalysis data on a regional scale (Kumar and Hu, 2012;Alvarez et al., 2014;Kaiser-Weiss et al., 2015;Sharp et al., 2015;Olauson, 2018;Rehman Tahir et al., 2018;Uotila et al., 2018). Others intended to cover the whole world by using interpolated observational datasets (Donat et al., 2014) or employing some stations distributed worldwide (Decker et al., 2012). A more comprehensive verification would be beneficial, especially for socio-economic sectors that employ reanalysis datasets.
The energy industry is one of the biggest user groups of reanalysis datasets . Specifically for the wind industry, the lack of long and homogeneous records of wind speed observations has favoured the adoption of reanalyses for assessing wind resources (Cannon et al., 2015). Preconstructive wind resource assessment studies need to determine long-term mean wind speed (Tammelin et al., 2013) and its probability distribution accurately at each turbine location to estimate wind power generation and revenues. The characterization of year-to-year variations of the wind speed is crucial to understanding the risk of a wind farm project (Pryor et al., 2006;Brower et al., 2013) and also estimating the amount of debt that banks can finance. Information on long-term trends can also be used to foresee changes in the performance of wind farms during their lifetime, which typically comprises between 20 and 30 years (Jude and Leseney, 2017). All these evaluations require long records, which typically are not available at the location where the farm is planned to be built, which has led wind energy users to rely on reanalysis products.
Another frequent usage of reanalyses for already operating wind farms is to understand the causes of anomalous wind speed episodes that impacted generation and revenues on monthly or seasonal time-scales. In recent years, the climate prediction community has started to unveil the climate drivers of anomalous events such as wind droughts (e.g., Lledó et al., 2018) and to produce seasonal forecasts (Doblas-Reyes et al., 2013;Clark et al., 2017) to anticipate those wind speed anomalies that can have an impact on wind energy activity. In this climate prediction framework, reanalyses are also used as observational reference for the adjustment of systematic errors affecting these predictions (Torralba et al., 2017b) and to assess the forecast quality (Jolliffe and Stephenson, 2012).
The different applications that use-either directly or indirectly-reanalyses require a scientific evaluation of the quality of the different available reanalysis datasets. An assessment is needed especially in those cases in which one single product is employed and commonly chosen ad hoc without taking into account its related uncertainty.
In this work, we aim to shed light on the uncertainty associated with observational references to help the reanalysis user decide about the suitability of a reanalysis dataset. The assessment will be done by comparing surface wind speeds from different global reanalysis datasets and verifying them further with in situ observations to eventually select the most accurate sources. The analysis focuses on three different aspects of the datasets: mean wind speeds, variability, and trends. Section 2 describes the reanalysis and observational datasets employed. The methodology is described in section 3. An intercomparison of the reanalysis set is presented in section 4 and the verification results are presented in section 5, including an interpretation in both sections. Discussion and conclusions follow in section 6.

Reanalysis datasets
Five widely used state-of-the-art global reanalyses are considered in this study. They include the two European Centre for Medium-Range Weather Forecasts (ECMWF) reanalyses ERA5 (Copernicus Climate Change Service (C3S), 2017) and ERA-Interim (Dee et al., 2011b), the Japanese 55-year Reanalysis (JRA55: Kobayashi et al., 2015), NASA's Modern Era Retrospective Analysis for Research and Applications-2 (MERRA2: Molod et al., 2015), and National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis 1 (R1: Kalnay et al., 1996). Table 1 lists the main characteristics concerning the time periods, data assimilation schemes, spatiotemporal resolution, single levels employed in this study, and references. Other reanalysis products were considered preliminarily to be included in this work, but have been disregarded for several reasons. The climate forecast system reanalysis (CFSR) dataset (Saha et al., 2010), which is published by NCEP, has been produced with two different model configurations before and after March 31, 2011, including different spectral resolutions. 1 Since this discontinuity produces detectable changes in mean wind speeds, the CFSR dataset has not been considered for the present study. Previous versions of the selected datasets such as JRA-25, ERA40, and MERRA have been discarded too, as they have been discontinued and superseded by newer products released by the respective centres. The case of NCEP/NCAR R1 is an exception. After its first publication in 1996, the National Oceanic and Atmospheric Administration (NOAA)/NCEP and NCAR released a newer reanalysis 1 https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/climateforecast-system-version2-cfsv2 T A B L E 1 Summary of the basic characteristics of the five reanalysis datasets used in this study (ordered alphabetically)  Kalnay et al. (1996) Expected to be discontinued at the end of August 2019 Expected to be expanded back to 1950 by the end of 2019 in 2002, namely the NCEP/DOE Reanalysis AMIP-II or R2 (Kanamitsu et al., 2002). Nonetheless, some studies (e.g., Lucio-Eceiza et al., 2018) have revealed its poor performance in estimating surface wind speeds, which is worse than its predecessor R1. Many regional reanalyses with higher spatial resolutions exist as well, but they do not cover the whole globe, and their usage may be limited in some applications, such as verification and bias adjustment of subseasonal to seasonal predictions made at the global scale. Moreover, all regional reanalyses are refinements of a global reanalysis produced through dynamical downscaling and data assimilation techniques (Mesinger et al., 2006;Gleeson et al., 2017). Therefore, their value largely depends on the quality of the driving reanalysis, so that they have not been included here. The whole set of reanalyses used in this study assimilate both conventional data (such as surface synoptic (SYNOP) ocean stations, radiosonde profiles, or aircraft records) and satellite measurements (provided mainly by scatterometers and the Special Sensor Microwave Imager (SSMI)). All of the selected datasets include some observations of surface oceanic winds and all of them, except R1, assimilate satellite observations of ocean surface winds (Fujiwara et al., 2017). It is worth mentioning that none of the reanalyses ingests surface winds from land stations. This is known to be problematic, especially over areas with inhomogeneous terrain, where stations experience strong local influences and do not represent the grid-area winds appropriately. Instead, the 10-m wind speeds, hereafter referred to as surface wind speeds, are parametrized in planetary boundary layer schemes from surface characteristics, stability indices, and the lowest native model winds. For instance, ERA5 and ERA-Interim use Monin-Obukhov theory and an open-terrain roughness to make the field more comparable with SYNOP stations. MERRA2 also uses a scheme based on the Monin-Obukhov similarity theory (Helfand and Schubert, 1995), which includes the effect of a viscous sublayer for heat and moisture transport over all surfaces except land (Molod et al., 2015). The NCEP/NCAR R1 uses a bulk model (see Stull (2012), section 11.2.3) with an aerodynamic formula that considers the fluxes of sensible flux, heat, and momentum proportional to the difference between values at the surface and the adjacent atmosphere (Kanamitsu, 1989). The proportionality constants depend on the wind speed as well as the static stability of the surface layer and are based on the work of Businger et al. (1971), and subsequently the Monin-Obukhov similarity theory. According to Torralba et al. (2017a), the extrapolation of surface winds in the JRA55 reanalysis from the lowest model level is carried out assuming neutral stability. In areas where this level is located too high above the surface (such as forested zones), this assumption may lead to a significant reduction of wind speeds after extrapolation to the surface level.
Some of the most recent reanalyses (e.g., MERRA2 and ERA5) have started to provide winds at turbine hub heights, in reaction to the demands of the wind-power industry. Those winds can describe the vertical wind profile better within the first 100 m of the atmosphere. In particular, ERA5 supplies wind speeds at 100 m above the ground, while MERRA2 offers wind speeds computed at the 50-m level. In this article, they will be referred to as ERA5 and MERRA2 near-surface winds, respectively.

Tall tower observations
The accuracy of the reanalysis in the representation of both surface and hub-height wind speed is assessed in this study through a comparison with hub-height wind-speed observations measured in tall meteorological towers (also known as met masts) distributed worldwide. This type of structure F I G U R E 1 Global distribution of the 77 tall towers. Colours indicate, for each tall tower, the height of the measuring level employed in this study allows the placement of several sensors along a high vertical tower, the altitude of which commonly varies from 10 to more than 100 m (e.g., Van Ulden and Wieringa, 1996;Kouwenhoven, 2007). Tall tower data are widely used within the wind power industry to characterize the wind-flow fine features at turbine hub heights, that is, between 80 and 120 m for modern turbines, as well as monitoring the current weather situation in wind farms. Thus, these measurements are preferred over those taken at surface level, which are usually disturbed by the surrounding elements. The maintenance of these structures, however, is much more expensive than for surface stations, and therefore these data appear sparsely and are usually difficult to find for public usage.
A new unique dataset containing quality-controlled wind observations plus other climate observations such as temperature or relative humidity from 222 different observational sites has recently been created under the name of the Tall Tower Dataset (Ramon and Lledó, 2019). The reporting time sampling of these masts ranges from 10 min to 1 hr. Due to the high tower maintenance costs mentioned above, the record length of these time series represents a major handicap in this study, as most tall towers do not have a lifespan of more than three years. Moreover, these series often contain interruptions in the measuring process, most likely produced by sensor failures or tower maintenance tasks, so that the net record length available can be shortened even more. To reach a balance between quality and amount of data, we have selected those towers containing at least three years of data. Although this threshold has been chosen somewhat arbitrarily, we considered it optimal for the purposes of the present work. Hence, the total number of towers employed in this study is eventually set to 77. Their spatial distribution is presented in Figure 1 and the main meta information is specified in the Supporting Information in Table S1. Even though the masts appear over all continents, their distribution is highly heterogeneous, and most of the locations appear clustered in Europe and North America, which are areas with a large deployment of wind energy. For each of the towers, we have selected the anemometer that is measuring at the closest level to 100 m (see Figure 1 and Table S1), which is the average of modern turbine hub heights. Note that in some towers only one anemometer is installed at the top of the mast, and it is sometimes placed far from a 100 m height.
As far as the authors know, the Tall Tower Dataset constitutes an independent set of observations that has not been assimilated previously by any reanalysis, thus assuring the statistical independence with the reanalysis products. This is important to ensure a fair comparison, since a verification with observations employed in the assimilation of a reanalysis would lead to biased scores.

Reanalysis intercomparison
To establish the main differences and agreements amongst the five global reanalyses, the common period 1980-2017 has been selected. Firstly, the surface wind speeds from all reanalyses have been computed from the corresponding zonal and meridional components. Then, seasonal averages have been calculated from 6-hr (in the case of ERA-Interim, JRA55, and R1 datasets) or hourly (for ERA5 and MERRA2 reanalyses) instantaneous values. Results have been produced for each season: December-January-February (DJF), March-April-May (MAM), June-July-August (JJA) and September-October-November (SON). As the five reanalyses use different grids (Table 1), an interpolation to a common grid is needed for comparison. All grids have been interpolated to the F47(T62) horizontal grid, which is R1's grid and is the coarsest out of the five reanalysis grids. A conservative interpolation method from the R package s2dverification (Manubens et al., 2018) has been employed to this end. The differences in terms of seasonal mean wind speeds, the interannual variability (IAV)-computed separately for each season as the standard deviation of the seasonal means within the 1980-2017 period and normalized by the seasonal climatology of all years under consideration-and the long-term trends have been assessed. A multi-reanalysis approach, which is analogous to the often applied multi-model approach in climate modelling, has been used for mean winds and IAV. Firstly, the seasonal mean and IAV are computed for each of the reanalyses. Then, the multi-reanalysis mean (MR) is computed at each grid point with a nonweighted mean. Together with the MR, the spread is computed as the range of values (i.e., the difference between the absolute maximum and minimum values) at each grid point. This parameter highlights the areas where different datasets are in disagreement. Finally, the difference between each reanalysis and the MR is computed (also referred to as the departure) in order to quantify the contribution of each of the reanalyses to the overall uncertainty.
Long-term linear trends have been computed using a simple linear regression equation, where the independent variable is time and the dependent variable is wind speed. Thus, the slope represents the rate of change in wind speed over time. This parameter has subsequently been normalized by the seasonal mean-wind speed value and unit-converted to be expressed as a percentage change per decade, which can also be referred to as a decadal trend. We also test the significance of the decadal trends being different from zero by applying a Student's -test. This methodology followed in the computation of decadal trends is identical to the approach used in Torralba et al. (2017a) to facilitate the comparison of results.

Verification with tall tower observations
Tall tower observations are originally available at hourly or subhourly time resolution. Daily and seasonal mean values have been computed, so they can be compared against reanalysis daily and seasonal means (which were obtained from hourly or 6-hr model outputs). Although one might argue that reanalysis products with hourly resolution have an advantage in the verification, some tests (not shown) revealed that this factor does not change any of the conclusions. Moreover, a user-oriented verification has to highlight the quality of the entire product available, including all its features, such as better spatial or time resolution.
To verify reanalysis datasets with in situ observations, the gridded data have to be interpolated horizontally at each tower location. Three horizontal interpolation methods have been tested in order to select the most appropriate one for the objectives of this work.
• Nearest-neighbour (i.e., takes the closest grid point to the location of interest). This has been considered in the literature as the most suitable method for variables with high spatiotemporal variability like precipitation (Accadia et al., 2003). Additionally, this approach has also been employed in wind-speed variability studies such as that of Lucio-Eceiza et al. (2018). • Nine-point average (i.e., averaging the nine nearest grid points). For variables with high spatial variability, this method produces smoother results. • Bilinear interpolation (Press et al., 2006, equation 3.6.5).
To compare reanalysis surface and near-surface data against each tall tower's winds, a vertical extrapolation of reanalysis wind speeds ( ) to the tower measuring height closest to 100 m (ℎ) is performed using a power law: where ℎ ref is the reference height of the reanalysis field (i.e., 10, 50, or 100 m). Alpha ( ) is a non-dimensional wind-shear exponent. A value of 0.143 has been used for land (Touma, 1977), while 0.11 has been employed over water bodies (Hsu et al., 1994). We note that both exponents assume neutral stability. According to the previous two references, whereas neutral stability prevails in oceanic areas, the onshore shear exponent is more dependent on stability conditions. This approximation is fair enough when considering long-period averages (such as seasonal means), but might add some uncertainty when considering daily averages (Kelly, 2016). Daily averages at each location have been verified in terms of spatiotemporal correlation, standard deviation, and centred root-mean-squared error (CRMSE: Jolliffe and Stephenson, 2012, equation 6.2). A Taylor diagram (Taylor, 2001) is used here to visualize these three statistical parameters from multiple reanalyses in one single graph. In addition, two tests of significance have been performed to evaluate the statistical significance of the change in correlation and standard deviation when comparing the results from two different datasets. For the correlation, the Williams test (Williams, 1959) has been employed to assess the difference between two dependent correlations that share one variable. Analogously, the statistical significance of the change in standard deviation has been analysed using the -test (von Storch and Zwiers, 1999). Finally, reanalysis seasonal means and IAVs are compared against those computed using the observational dataset.

REANALYSIS INTERCOMPARISON
A comparison of surface wind speeds from five global reanalysis datasets is presented here for DJF. Similar results are obtained for MAM, JJA, and SON, so only the comparison in the boreal summer (JJA) is included in Figures S1, S2, and S3 within the Supporting Information to avoid repetition. Agreements and differences in terms of climatology, variability, and trends at surface level are noted and described in the following subsections.

Differences in mean wind speed
The differences between reanalyses in terms of seasonal mean wind speeds in DJF are shown in Figure 2. Although the strongest winds occur over extratropical oceans (Figure 2a) with values exceeding 10 m/s, the spread derived from the five mean wind speed fields (Figure 2b) reveals that the main differences lie inland. These discrepancies can be partly explained by differences in the land-surface roughness (which is derived from land cover and vegetation databases) and elevation representation. Indeed, the grid resolution of the reanalysis models has a strong influence on the final roughness and elevation representation. It can be noted that those products with coarser resolution (i.e., JRA55 and R1, Figure 2e and g, respectively) tend to show the highest wind speeds over high elevated mountain ranges. The JRA55 (Figure 2e) also shows a systematic negative departure pattern over continental regions, which is particularly extended over Eurasia. Over the ocean, the main differences are noticed by R1 in the eastern Pacific and at high latitudes ( Figure 2g). This disparity maybe explained by the fact that R1 does not assimilate ocean surface winds from satellite observations (Fujiwara et al., 2017).

Differences in variability
The discrepancies in the year-to-year variations of the surface wind speeds in the different reanalyses have been explored through the IAV (expressed as a percentage of the mean F I G U R E 3 Same as in Figure 2, but for the IAV wind speed). The MR for the IAV (Figure 3a) shows that the highest IAVs are encountered over oceanic regions along the Equator, especially in the maritime continent (Indonesia) and Pacific, where values that exceed 16% of the climatology are found. Furthermore, we also notice spots of high IAV in the Iberian and Scandinavian peninsulas, as well as northern Asia. Figure 3b informs us that the highest spread is observed inland. Indeed, the disagreements between the five reanalyses in the computation of the IAV value reach 10 percentage points, which is actually of the same order of magnitude as the IAV itself. The areas with the highest spread values often match densely vegetated regions. A detailed look at the departures of each reanalysis from the MR reveals that JRA55 (Figure 3e) concentrates the highest IAVs in the aforementioned regions and is the main contributor to the high spread. The other products offer negative departures over continental areas (Figure 3c,d,f,g), with the exception of some scattered spots of high IAV displayed by MERRA2 ( Figure 3f) and R1 (Figure 3g).

Differences in long-term trends
Linear trends are presented in Figure 4 as However, the NCEP/NCAR reanalysis (Figure 4e) shows different behaviour for the two hemispheres. For this product, decadal trends are systematically stronger in the Southern Hemisphere and the intertropical area, probably induced by the different amount of observations-which is considerably larger north of the Equator-assimilated by the R1 reanalysis in the two hemispheres (Kalnay et al., 1996). In addition, since R1 does not ingest any ocean surface record from satellites, this difference between hemispheres is more appreciable.
Several causes can produce observed trends in reanalysis surface wind speeds (Thorne and Vose, 2010;Vautard et al., 2010;Dee et al., 2011a;Wohland et al., 2019): • decadal variations in the atmospheric circulation (e.g., changes in the location or intensity of the storm tracks), period. An asterisk indicates that the trends are significant at the 95% confidence level: no asterisk indicates that the trends are not significant. One asterisk ( * ) means that only one of the reanalyses has significant trends, two asterisks ( * * ) inform us that two reanalyses have significant trends, and so on (adapted from Torralba et al., 2017a) • changes in surface roughness due to land cover and vegetation changes, • spurious trends in assimilated observations due to instrumental drift, measuring errors, or station relocations that produce inhomogeneities, or • the growing number of observations available for assimilation in more recent periods.
The role of each driver, however, is still uncertain. Although separating spurious from real causes in the observed trends might be challenging, some remarks can still be made.
Firstly, changes in climate dynamics, such as the strengthening of the Walker circulation due to climate change and the subsequent reinforcement of the Hadley cell, can increase the trade winds within the intertropical area (L' Heureux et al., 2013). This change in the dynamic of the tropical atmosphere is well observed in the multi-reanalysis agreement map (Figure 4f), where all five reanalyses depict a significant trend over several marine tropical areas in the Southern Hemisphere. This has been documented in previous studies, such as Wu et al. (2018). Similarly, a significant strengthening of wind speeds is observed in the Southern Hemisphere storm track (Figure 4f). Lee and Feldstein (2013) studied a shift of westerly winds polewards in the Southern Hemisphere and attributed this fluctuation to either an increase of greenhouse gas concentrations or a decrease of stratospheric ozone.
Secondly, surface roughness can vary substantially over vegetated regions, due to changes in the forest canopy. A statistically significant diminution of wind speeds is noticed over an extended area in Eurasia (Figure 4f), possibly mirroring the wind-stilling effect already reported by Vautard et al. (2010). Vautard et al. (2010) attributed the stilling by alluding to an increase of the surface roughness produced partly by the growth of the forest extension in the last 30 years. A reduction of winds is observed in India too, but no particular driver has been suggested so far.
Concerning the last two items, one might expect that artefacts originated by the assimilation systems and available observations are partially compensated after considering several reanalysis products, since they employ diverging data assimilation schemes and ingest different amounts of records. For this reason, and as recommended in Wohland et al. (2019) for 20th century reanalyses, the use of more than one reanalysis to derive a single trends map, as in Figure 4f, can provide more robust conclusions than any single product.

VERIFICATION WITH TALL TOWER OBSERVATIONS
The reanalysis intercomparison has revealed that several agreements and discrepancies do exist between these products. However, at this point it cannot be concluded which of the sources represents each surface wind speed feature better. A further comparison of the reanalysis output (hereafter referred to as modelled winds) against wind-speed in situ measurements (from now on referred to as observed winds) is needed to evaluate the uncertainty of the reanalyses in reproducing the previously described wind speed characteristics. Furthermore, this comparison can also be helpful to identify the reanalysis that represents the observed winds better within the first 100 m of the atmosphere.
In the following subsections, reanalysis surface and near-surface winds are extrapolated vertically to the selected tower measuring height (for each tower location, this is the measuring level closest to 100 m, see Table S1) and all the comparisons with tall tower data are made at that level. The set of 77 tall towers distributed worldwide is employed (see section 2.2).

Comparison of different horizontal interpolations
The first comparison of observed and modelled daily wind speeds is presented by means of a Taylor diagram. We note that only the vertically extrapolated surface winds from the five reanalysis are utilized here. Three different interpolation methodologies (see section 3 for details) have been applied over the wind speeds from the reanalysis to produce wind speed values at the locations of the 77 tall towers. Results are presented in Figure 5. Generally, the bilinear method provides the highest correlation coefficients out of the three interpolation approaches. A maximum correlation coefficient of 0.82 is obtained for MERRA2 when F I G U R E 5 Taylor diagram of the pairwise observed and modelled daily-averaged surface wind speeds for the 77 tall towers and the five reanalyses. Three different interpolation methods are tested here: bilinear (bil), nearest-neighbour (nea) and nine-point average (9pt). The radial dimension represents the model standard deviations normalized by the observations. Pearson correlation coefficients are represented in angular coordinates, whereas the arcs show the CRMSEs using this method, followed closely by ERA5 with 0.81. The lowest correlations (i.e., below 0.76) are obtained from those reanalyses with the coarsest grids, which are R1 and JRA55. The bilinear method, however, still stands as the best interpolation method in these cases. In terms of variability, both the nearest-neighbour and bilinear methods are approaches that better represent the variability of the wind speed from the tall towers. The nine-point average smooths the reanalysis winds, significantly decreasing the variability compared with the other methods. It is also worth noting that JRA55 displays a substantial overestimation of the variability for the nearest-neighbour method. This result matches the observed increase of IAV with respect to the other reanalyses in Figure 3e. The best results in terms of CRMSEs are also observed for the bilinear method; for that reason, this method has been employed for spatial interpolation of the reanalysis data to the tall tower locations for the rest of the study.

Verification of daily-averaged surface winds
The quality of the vertically extrapolated reanalysis surface winds is assessed now in terms of correlation, standard deviation, and CRMSE. Results are presented by comparing each reanalysis product with the MR obtained from the averaging of the five reanalyses ( Figure 6). Arrows within the Taylor diagrams depict the change of points from the Blue arrows indicate that the MR provides better correlations and standard deviations closer to the observed than the reanalyses individually and that both changes are significant at the 95% confidence level. Red arrows imply that the MR improves neither the correlations nor the standard deviations, and that both changes are significant at the 95% confidence level. Grey arrows represent all other possible cases reanalysis (beginning of the arrow) to the MR (arrowhead) produced by the difference in correlation, standard deviation, and CRMSE, compared with the in situ data set. Only ERA5 (Figure 6b), which contains 1-hr data, presents a definite improvement with respect to the MR, showing statistically significant improvements in correlation as well as standard deviation in 35.1% of the masts (i.e., a total of 27 towers). Nevertheless, the MR still improves the performance of ERA5 in 5.2% of the masts. MERRA2 (Figure 6d) also provides 1-hr data, but NASA's reanalysis does not show an overall improvement or worsening, as most of the sites (i.e., 79.2%) do not present either significant results or improvements in both correlation and standard deviation parameters. The reanalyses with 6-hr values, ERA-Interim, JRA55, and R1 (Figure 6a,c,e, respectively) show poor performance compared with the MR. Neither JRA55 nor R1 provides better (a) (b) F I G U R E 7 Box plots summarizing the differences between observed and modelled seasonal climatologies for 77 tall towers in (a) December-January-February and (b) June-July-August results than the MR in any of the tall towers, and the MR presents a statistically significant improvement in half of the total tall tower set (with percentages of 53.2% and 42.9%, respectively).

Verification of surface and near-surface winds on a seasonal time-scale
In this subsection, we assess the ERA5 and MERRA2 near-surface winds, together with surface winds from the five reanalysis products, in terms of seasonal mean winds and IAV. The MR mean, which has been computed using only surface wind fields, is also included. Both surface and near-surface winds are adjusted vertically to the closest 100-m tower heights.
Seasonal climatologies have been computed from both tall tower observations and reanalysis datasets and their differences are plotted in Figure 7 by means of box plots. In general terms, reanalysis datasets tend to show weaker seasonal mean winds than observed in both DJF and JJA, JRA55 being the dataset that provides the widest range of values, as well as the biggest underestimation, out of the five reanalyses plus the MR. This negative bias was also observed in Figure 2e extended over continental areas and is now confirmed by the observed wind data. Conversely, none of the other reanalyses stands out from the others as the best predictor of seasonal climatologies, since all of them offer quite similar results. With respect to near-surface datasets, no improvement is noticed for MERRA2 near surface winds compared with surface winds in DJF (Figure 7a), whereas, in JJA (Figure 7b), the near-surface winds tend to adjust the observed mean winds better by shifting the median of the differences to the zero value. Likewise, in the case of ERA5 it is observed that the near-surface winds reduce the spread of differences in JJA compared with the ERA5 surface winds (Figure 7b).
Analogously to Figure 7, the observed and modelled IAVs have been computed, and their differences are plotted in Figure 8. We note that the value of the IAV, as defined here, is not affected by the vertical extrapolation, as normalization by the seasonal means cancels the correction factor. As seen in Figure 8, reanalyses tend to underestimate the observed IAV, particularly in DJF (Figure 8a). This is a result somewhat expected, since grid values in a reanalysis represent an average within a cell of hundreds of square kilometres, thus smoothing the modelled variability. The wider range of values noted in DJF (Figure 8a) illustrates the complications of the reanalysis in reproducing the observed IAV in that particular season. These difficulties are related to the high variability of the wind speed in the Northern Hemisphere, where most of the towers are located. The widest spread of difference values is noticed for JRA55 in winter (Figure 8a). We note that JRA55 is the only dataset that overestimates the IAV by 5 or more percentage points in more than one tower location. These sites correspond to the Juelich, Puijo, Sodankyla, and WM09 tall towers, which are located mainly near forested areas in Europe and South Africa, which confirms the results presented in section 4.2 concerning overestimation of the variability over vegetated areas. The near-surface winds provided by ERA5 show a slight improvement in both seasons, presenting a narrower range of values oscillating around zero. In the case of NASA's reanalysis hub-height winds, only a minor enhancement of quality is found in DJF.

DISCUSSION AND CONCLUSIONS
Climate researchers and wind energy users often rely on reanalysis datasets for different activities such as the assessment of the wind energy resources in a particular region. However, the coexistence of several reanalysis products challenges the choice of source that characterizes the hub-height wind speed features most accurately. Indeed, disagreements in the representation of these near-surface winds have been encountered after analysing five global state-of-the-art reanalyses, demonstrating the unavoidable degree of uncertainty affecting these datasets. In this work, we describe and quantify this uncertainty in order to inform the reanalysis user about the product that describes the hub-height winds best.
Differences in DJF mean wind speed, year-to-year variability, and long-term trends have been spotted between the five global reanalyses. In particular, the most significant disagreements are encountered within continental areas. Mean wind-speed differences can be partly explained by different representations of land-surface roughness and elevation at the various grid resolutions employed in the reanalysis models. The ability of the assimilation methods to nudge fields towards observed values can also have an impact, as well as the density and quality of assimilated data or the boundary layer parametrization schemes in each model. Similarly, the main discrepancies in the IAV are seen inland. The strong IAV displayed by JRA55 is in opposition to the variability depicted by the other reanalyses. These high IAV values, as well as the systematic negative biases in the seasonal mean winds, may result from a deficiency in deriving the surface winds, as reported by Torralba et al. (2017a). In addition, the different amounts of available observations over time and the possible quality defects within these time series may affect the derived wind speeds. A comparison of linear trends (following Torralba et al., 2017a) confirms the exceptional nature of the strong trends observed in JRA55, which are not displayed by any of the other datasets.
The multi-reanalysis agreement map reveals that, with some exceptions, significant positive and negative trends are displayed over marine areas and continental regions, respectively. All reanalyses coincide with signalling an increase of wind speeds in the Southern Hemisphere along the storm track. Taking into account the results obtained by Lee and Feldstein (2013), we point out that this strengthening of winds mirrors the shift of westerlies towards the South Pole. It is worth noting that some trends observed in only some of the reanalyses may be generated by artefacts in the assimilation process, due mainly to the abrupt evolution of the availability of observations. To overcome this handicap, we emphasize the advantages of using a multi-reanalysis approach, because most of these spurious trends can compensate each other or even cancel.
After intercomparing the set of five reanalyses, the quality of these datasets has been assessed by means of a comparison with point data measured at 77 different tall-tower sites. The interpolation of the gridded data to each tower location is carried out using the bilinear method, which has been proven to offer a slight improvement in terms of daily correlation compared with the other two approaches tested (i.e., nearest-neighbour and nine-point average). To extrapolate the surface and near-surface reanalysis data vertically to each tower measuring level, a power-law equation has been used. Even though this vertical extrapolation is very simple and assumes neutral stability, the differences between extrapolated reanalysis winds and observed values are centred around zero. However, representativeness errors (single location versus grid area) make the distribution of errors quite large.
The verification process offered insights into which reanalyses perform better than others on daily and seasonal time-scales. Amongst all five surface wind datasets plus the MR, ERA5 shows the best results in terms of correlation and standard deviation. Indeed, the improvement in both correlation and variability with respect to the MR is statistically significant in 35% of tall tower sites, which is considerably high when compared with other reanalyses such as MERRA2 (9.1%) and ERA-Interim (1.3%). Neither JRA55 nor R1 beats the MR in any of the tower locations. Near-surface winds from ERA5 and MERRA2 have been included in some of the comparisons, particularly those made at seasonal scale. Results show a slight improvement of these variables over surface winds from the corresponding reanalyses. Notably, this enhancement is more evident for the IAV than for the seasonal means. For the latter, the high uncertainty derived from the comparison with tall tower seasonal averages leads us to discourage the use of global reanalyses to estimate mean winds. In spite of that, ERA5 outperforms all other reanalysis datasets plus the MR.
All in all, we conclude that the newly produced ERA5 near-surface wind dataset offers the best estimates of mean wind speed and variability at turbine hub heights. This is actually a crucial result, especially if we take into account that ERA5 is a reanalysis that will be updated operationally, providing useful climate information in near-real time that can be easily integrated into different decision-making processes. Moreover, many wind energy applications (e.g., the study of sudden wind changes that may lead to sudden energy ramps) can also benefit from the 1-hr data available from ERA5. Nevertheless, when it comes to the analysis of trends, we recommend a multi-reanalysis approach instead of using one single product.
Unfortunately, it has been difficult to ensure the validity of the linear trends in the five reanalyses by comparison with observations at the tall tower sites, since long and homogeneous records are not generally available. We intended to filter those tall towers within the Tall Tower Dataset, containing data spanning more than 20 years and without homogeneity issues. The experiment left only four tall towers that fulfilled this condition (Barrow, Cabauw, Mauna Loa, and National Wind Technology Center Mast 2), so that any conclusion can be derived from this exercise, due to the reduced sample of masts available for calculations. A more comprehensive description of the experiment design, as well as verification with the set of four masts, is detailed in section S3 and Figure  S4 in the Supporting Information.
Although reanalyses are intended to cover the absence of high-quality and long observational records, in situ data are still needed, not only to verify climate products, but also for other applications such as model configuration. For instance, in the creation of the New European Wind Atlas (NEWA, 2019), only eight meteorological masts placed in northern Europe were used to decide the optimum configuration of the Weather Research and Forecasting (WRF) model employed in the generation of the wind atlas (Witha et al., 2019), even though the domain covers the whole of Europe. In this regard, efforts should be made to facilitate the availability of hub-height wind measurements. New initiatives, such as the Tall Tower Dataset (Ramon and Lledó, 2019) within the context of the European project Integrated Approach for the development across Europe of user oriented climate indicators for GFCS high priority sectors: agriculture, disaster risk reduction, energy, health, water and tourism, are starting to appear to promote the usage of these data. Future work will be devoted to the study of the possible improvements that a vertical extrapolation that considers the stability of the planetary boundary layer could introduce, and their implications for the impact models currently employed in the wind energy sector to perform wind power estimations.

DATA AVAILABILITY STATEMENT
Data from 181 out of the 222 tall towers, within the Tall Tower Dataset, can be publicly accessed in the data repository EUDAT at https://doi.org/10.23728/b2share.0d3a99db75df 4238820ee548f35ee36b. Please refer to Ramon and Lledó (2019) for complete information on the data collection, formatting, and quality control.