WRF wind field assessment under multiple forcings using spatialized aircraft data

The performances of limited area weather models are affected by the choice of core solvers, domain resolutions, and initial and boundary conditions. To understand the extent of such differences on simulated wind fields, weather research and forecast (WRF) simulations initialized by different forcings were extensively compared with an aircraft‐derived high‐resolution data set. The two used forcings were the European Centre for Medium‐Range Weather Forecasts (ECMWF) ERA‐Interim reanalysis and the National Centers for Environmental Predictions (NCEP) Climate Forecast System Reanalysis (CFSR). The model domain covered a large portion of central western Italy (including part of the Tyrrhenian coast) encompassing the aircraft track and allowed the characterization of their performance across the simulation domain rather than a small set of point‐based observations. The WRF results show good agreement with the aircraft data across the whole flight track with both forcings (root mean square errors (RMSEs) < 2.3 m·s−1 and an average r2 = 0.7). Orography and coasts show an effect on simulated wind fields. The presence of a strong orography (which is smoothed by the model internal terrain elevation model) is associated with increased errors. Distance from the coast is also associated with a variation in RMSE (even if in a non‐straightforward manner) because of potential breeze effects. No forcing data set clearly outperforms the other, while the ECMWF has higher correlation co‐efficients when considering wind direction.


| INTRODUCTION
Mesoscale atmospheric models exhibit a growing interest both for scientific and operational use. Among them, the weather research and forecast (WRF) (Skamarock et al., 2008) is today the most adopted of the many different applications. It has been used for weather forecast, being officially adopted by the National Oceanic and Atmospheric Administration (NOAA), and for air quality modelling either when coupled to a chemistry model (Waked et al., 2013) or when integrated with the additional WRF-CHEM chemistry module (Grell et al., 2011;Saide et al., 2011). The WRF has been also used for various purposes, including wellness and health-related applications (Doherty et al., 2009), wind power potential mapping and forecast (Wharton et al., 2013;Giannaros et al., 2017), wind farm power output assessment (Yuan et al., 2017), and as a forcing for Lagrangian particle dispersion models to simulate the atmospheric transport of passive scalars (Nehrkorn et al., 2013) and particles (de Foy et al., 2011;Bei et al., 2013). All these applications are critically dependent on the capability of the WRF to simulate and reproduce the wind speed fields, among other variables, correctly.
State-of-the-art mesoscale models are complex frameworks implementing interrelations and feedbacks between the Earth's biosphere, marine surfaces and atmosphere (Grell et al., 1994;Skamarock et al., 2008). Simulation outputs strongly depend on model parametrizations and settings, in terms of the initialization data set (i.e. forcing), surface scheme, cloud scheme, land-use data, digital terrain model, and so on. The optimal choice of such parametrizations is not universal but depends on the variable of interest. Specifically, several components of the WRF are related to wind speed fields, such as the land surface scheme that contains the parametrization of mass and energy exchanges between the surface and the planetary boundary layer (PBL), and the PBL parametrization itself. Santos-Alamillos et al. (2013) reported that wind speed and direction estimates were mostly influenced by the PBL scheme and, particularly for wind direction, by orography. Giannaros et al. (2017) highlighted a tendency of the WRF to overestimate weak winds and to underestimate strong winds, with the highest biases being related to the parametrization of complex topography. Wharton et al. (2013) highlighted the importance of optimal surface and PBL schemes to improve the surface-atmosphere exchange simulations and therefore the wind speed vector estimation across the PBL.
The presence of complex orography is among the factors influencing the discrepancy between measured and simulated wind fields (e.g. Giannaros et al., 2017). Mesoscale models represent the land surface at a coarse resolution, which may lead to discrepancies between simulated wind speed and measurements from ground stations located at certain altitudes (Carvalho et al., 2014). Orographic-induced biases were also found by García-Díez et al. (2015) in a comparison experiment with the WRF forced by the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis, National Centers for Environmental Predictions (NCEP)-National Center for Atmospheric Research (NCAR) reanalysis, and the NCEP Global Forecast System (GFS) (Yang et al., 2006).
However, all these studies compared model data at the finest grid horizontal resolution, that is, typically between 1 and 5 km, with point-based ground observations close to the surface, and they used a single specific forcing data set as initial and boundary conditions Santos-Alamillos et al., 2013;Yang et al., 2013;Draxl et al., 2014). Moreover, point-based ground measurements, unless very dense networks are deployed, provide a limited assessment of wind spatial variability, which instead is very important when used to evaluate the model's capability to simulate atmospheric circulations.
Light aircraft (Gioli et al., 2006) and unmanned aerial vehicles  are important data sources for environmental and atmospheric studies, especially in sounding the surface-atmosphere interface and the PBL. These platforms allow researchers to cover horizontal scales from a few metres to hundreds of kilometres, and to perform vertical profiling from the surface up to several kilometres. The past 20 years have seen the development of high-sensitivity and high-frequency probes and their associated processing algorithms that give aerial platforms the capability to measure surface fluxes of momentum and passive scalars (Miglietta et al., 2007;Toscano et al., 2011), and the wind vector at high spatial resolution (Kocer et al., 2011;Mayer et al., 2012;Thomas et al., 2012;Bonin et al., 2013), providing novel data to assess the models' performance in both space and time.
This study is therefore focused on the assessment of the WRF's capability to simulate mesoscale wind speed and direction based on two different forcing data sets with different spatial resolutions. The two used forcings were the ECMWF ERA-Interim reanalysis (Simmons et al., 2007;Dee et al., 2011) and the NCEP Climate Forecast System Reanalysis (CFSR) (Saha et al., 2010). Such an assessment is made by comparing model outputs with extensive aircraft measurements over a regional domain. The study area covers a large portion of central western Italy (Tuscany) extending from the coastline to about 100 km inland, covering the main land-use classes and various orography conditions. The aircraft data set is rather unique because it offers repeated measurements over a span of two years, therefore covering seasonal variability. Model performance could be assessed: (1) in terms of temporal and spatial variability; (2) along terrain height variations that are critical in mesoscale atmospheric modelling (Carvalho et al., 2014);and (3) in terms of the capability to simulate local-scale circulations, such as the sea breeze that develops along the transition from the sea to inland areas.

| WRF model set-up
The WRF is a mesoscale numerical weather prediction system that employs a fully compressible, Euler nonhydrostatic equations set (Skamarock et al., 2008). The model uses a terrain-following, dry hydrostatic-pressure vertical co-ordinate and an Arakawa staggered C-grid as horizontal co-ordinates. The WRF system contains two dynamic solvers to perform time and space integration of the motion equations. Version 3.5.1 of the Advanced Research WRF (ARW) solver was used. The model was set up with the following parametrizations on the basis of Mohan and Bhati (2011) and Santos-Alamillos et al.  (Mlawer et al., 1997) for long wave radiative transfer; the Rapid Radiative Transfer Model for Global Circulation Models (RRTMG) scheme (Iacono et al., 2008) for short wave radiative transfer; the Revised MM5 Surface scheme (Jiménez et al., 2012) for surface parameterization; the Unified NOAH Land Surface Model for land surface parameterization (Ek et al., 2003); and the Yonsei University scheme (Hong et al., 2006) for PBL parameterization (Table 1).
Land use was provided by the moderate resolution imaging spectroradiometer (MODIS) classes implemented in the WRF (at 30 arcsec resolution) with the NOAH land surface model for surface parameterization. The MODIS-derived static land-use data set is classified into 20 land-use categories with a single category for the urban/built-up environment: one for croplands, one for a mosaic of cropland/natural vegetation, and the others divided between various types of natural environments (different types of forest, savanna, tundra, and so on). Two different sets of initial and boundary conditions were provided every 6 hr (at 0000, 0600, 1200 and 1800 UTC), characterized by different spatial resolutions: ECMWF ERA-Interim data have a resolution of 0.75 , while CFSR data have a resolution of 0.5 (pressure-level variables) and 0.3 (surface data). For further information about raw reanalysis data, see Supporting Information. The modelled domain was divided vertically into 35 eta levels. All simulations were run with a spin-up time of 24 hr. Three nested domains were simulated with a 3:1 ratio between them, reaching a resolution of 3 km in the innermost domain (85 × 74 grid points) covering the region spanned by the flights (Figure 1). The two WRF runs using the two forcing data sets are named throughout as WRF-ECMWF (the simulation forced by the ERA-Interim) and WRF-CFSR (the simulation forced by the NCEP-CFSR

| Aircraft platform
The Sky Arrow ERA platform was used for all flights in the study: it carried a mobile flux platform (MFP) capable of measuring three-dimensional wind and turbulence along with H 2 O and CO 2 densities as well as other atmospheric parameters (Gioli et al., 2006). Wind measurements were made with the Best Available Turbulence (BAT) probe (Crawford and Dobosy, 1992), whose nine-hole hemispheric pressure sphere measured the velocity of the air with Yonsei University scheme (Hong et al., 2006) respect to the aircraft. The actual wind components (u, v and w) were then evaluated by removing the threedimensional aircraft motion. The latter is measured by global positioning satellite and inertial navigation system units coupled to accelerometers covering both low-and highfrequency velocity variations over all the aircraft's six degrees of freedom. For a description of the MFP data processing and calibration procedure, see Vellinga et al. (2013). Finally, the aircraft also mounted a laser altimeter (LD-90, RIEGL Laser Measurement Systems, Horn, Austria) to measure the flight altitude above the ground.

| Study area and aircraft measurements
The study area covers parts of the Tuscany and Lazio regions, containing 240 km of flight tracks that were  designed to be representative of the whole area in terms of land use, orography and meteorological conditions ( Figure 1b). The region's climate is typically Mediterranean, with mean annual rainfall between 700 and 1,000 mm and average annual temperature around 14-16 C. According to the Corine Land Cover 2006 classification (ISPRA, 2010) the flight area is dominated by forest and agricultural land. The orography is generally flat on the coast and becomes more variable moving from the southwest to the northeast inner area, reaching a peak altitude of 600 masl, while the final inland part is moderately hilly (Gioli et al., 2014). The experimental plan for the flights, originally designed for measuring carbon fluxes at a regional scale for the CARBIUS project (Maselli et al., 2010), was based on a set of IOPs in different seasons of the year, over a repetitive path and in different times of the day, therefore sampling both spatial and temporal variability. The flights were made between July 2004 and December 2005 for a total of 16 IOPs (Table 2). Flights were made well within the PBL, with altitudes recorded by the laser altimeter from 59 to 346 m, with an average of 135 ± 40 m. The data set was partitioned by season and by land characteristics to assess the error sources between model and aircraft data across different temporal and spatial patterns. Three main flight sections were defined: (1) a coastal area (characterized mainly by flat agricultural areas and small spots of forested areas with a slightly higher elevation); (2) a more pronounced orography covered almost exclusively by forests and semi-natural lands; and (3) a section mainly characterized by both agricultural areas, forests and semi-natural areas, and a relevant orography ( Figure 1b).
The IOPs were also divided by season: winter (December-February), spring (March-June), summer (July-September) and autumn (October-November).

| Data: Model comparisons
Aircraft and model data were processed to account for the different spatial and temporal resolution using the following workflow.

| Aircraft and model data match
Aircraft wind data were collected at 50 Hz, which corresponds to a horizontal resolution of about 80 cm considering an average aircraft speed of 40 mÁs −1 . Data were spatially averaged into 1.5 km segments to be comparable with the resolution of the innermost WRF domain (3 km). Each flight segment was then associated with the closest WRF grid cell. Similarly, temporal alignment was obtained by matching the UTC time of each flight segment with the temporally closest WRF timestamp. The WRF altitude data were converted to geopotential height (height above mean sea level-AMSL) to be comparable with aircraft data.
Wind speed was calculated (for both the WRF and aircraft data) from u, v and w velocity components. Since the WRF employs a staggered grid where the u, v and w components are located on cells' borders, they were reported to the cells' centre point via a destaggering process. This allowed a unique position for each cell to be obtained, which was then used to perform the modelaircraft matching.

| Measures of goodness of fit
Several metrics were computed to assess the agreement between model and aircraft wind speed data (following Santos-Alamillos et al., 2013): Root mean square error RMSE ð Þ: Relative standard deviation RELSD ð Þ : Bias BIAS ð Þ: Relative bias RELBIAS ð Þ: where N is the number of values; and m i and o i are the modelled and observed wind speeds. Wind direction, which is a circular variable, was assessed by computing the Pearson correlation coefficient (5) between wind direction distribution frequencies: where ρ wd is the wind direction frequency correlation; and H m and H o are the wind direction frequencies for the model and observation, respectively. Wind direction histograms were computed using 36 bins, each 10 wide. Essentially, this represents a frequency correlation smoothed on 10 ´b ins. This frequency correlation allowed the strength of the relationship to be analysed in a trend-wise term when accounting for data circularity.

| Data overview and overall comparisons
An overview of the simulations' results over all the IOPs combined is presented in Figure 2 for both wind speed ( Figure 2a) and wind direction (Figure 2b). The average wind speed measured by the aircraft over all the IOPs is 4.18 ± 0.02 mÁs −1 (95% confidence interval of the mean), while both the WRF-ECMWF and the WRF-CFSR report comparable, but higher, means (4.51 ± 0.02 and 4.71 ± 0.02 mÁs −1 , respectively). Overall, the WRF outputs (for the innermost 3 km domain) show good agreement with the aircraft data (Table 3) for the estimation of wind speed (average RMSE with the two forcings = 2.22 mÁs −1 ). Frequency distributions of wind direction data reveal non-univocal differences between observations and model outputs. Aircraft wind direction (Figure 2b) shows three main peaks: one in the northnortheast sector (between 15 and 45 ), another in the southern sector (between 165 and 195 ), and finally a smaller one between 255 and 270 . Model data tend to follow the same peaks: northerly peaks are higher than the aircraft's (average relative frequency across models = 7.17% ± 0.05% versus 5.23% ± 0.65% for the aircraft), while the second peak has a more complex composition. Between 165 and 180 the WRF-ECMWF has higher frequencies than the aircraft (5.88% ± 0.70% versus 5.48% ± 0.28%), while the WRF-CFSR shows lower frequencies. In the 195 bin, the WRF-ECMWF shows lower frequencies, while the WRF-CFSR overshoots. The third peak is where model data follow more closely the aircraft frequencies with only a slight over-or underestimation.
In general, both the WRF-ECMWF and the WRF-CFSR are quite able to follow the frequency trend of the observed data. The histogram correlation co-efficients for 10 binned wind direction are 0.84 for the WRF-ECMWF and 0.81 for the WRF-CFSR.

| Data partitioning: Sections and seasons
Statistics for the data partitioning described in Section 2.4 (for both wind speed and direction) are reported in the WRF tends to overestimate wind speed regardless of the used forcing and shows a marked drop in wind direction correlation in geographical section 2, which has the higher terrain elevation. The WRF-ECMWF and WRF-CFSR are quite comparable in terms of the RMSE, with the WRF-CFSR showing smaller RMSEs both when partitioned over different spatial sections and in different seasons (with the exception of autumn). Instead, the WRF-ECMWF shows smaller biases (with the exception of spring).
For wind direction, a seasonal effect is present: the frequency correlation decreases in summer and even more in autumn for both forcings, suggesting a mesoscale systematic effect affecting circulations.

| Temporal partitioning: Diurnal courses
The capability of the different forcings to reproduce wind speed variability over time is assessed by exploring the strength of the linear correlation (expressed as the square of the Pearson correlation co-efficient, r 2 ) between model data and observations' hourly diurnal courses (Figure 3) divided by season and flight section. Overall, the WRF-CFSR is better correlated with aircraft data, with the notable exception of section 2 (Figure 3b, e, h and k), where the WRF-ECMWF forcing has a higher r 2 in all seasons. Section 2 is also the area where the lowest r 2 were registered in autumn, when values for all the models drop very close to zero.

| Spatial Partitioning: Acrosssections analysis
A further spatial analysis was made by computing the cumulative flight covered distance from a fixed origin, and averaging wind speed along such distance on both geographical sections and seasons (Figures 4-7). Both the WRF internal DEM and ASTER DEM profiles were shown to assess the elevation smoothing in the coarser WRF internal DEM. Furthermore, the r 2 and RMSE were calculated per each section and season.
All data sets show a decrease in performance in summer (with r 2 always < 0.38), and section 3 reported the lowest r 2 for all data sets. In summer, the WRF with both T A B L E 3 Summary of statistics for data partitions.

WRF-ECMWF Section Season
Wind speed forcings tends to overestimate the measurements along the whole track, hinting at a spatial-independent seasonal effect. The different seasonal performances in section 1 (i.e. coastal section) can also be explained with the existence of a breeze regime, which is investigated in Figure 8a-d, where the mean RMSEs for wind speed are  Figure 8 shows a non-univocal trend for the error of wind magnitude estimation when moving away from the coastline, which changes between forcings and seasons. Finally, the significant discrepancies highlighted between the WRF model and measurements in section 3 (Figures 4-7), where there is a higher terrain elevation, are investigated in Figure 9. It compares the 20 km radius variance in the DEM with wind speed RMSEs for the different forcings of the WRF, highlighting, as per the coastline distance (Figure 8), a non-univocal trend between the RMSEs and the variation in elevation. Variance of the DEM and wind speed RMSEs is been binned into eight equally spaced bins. The highest RMSEs are found in correspondence with high DEM variances, indicating that a terrain with significant changes in orography affects the accuracy of wind speed computation.

| DISCUSSION
Among the two different reanalyses used in the study to drive WRF simulations, the ECMWF ERA-Interim has a comparable performance with the CFSR despite a coarser spatial resolution. A better performance of the ERA-Interim for wind speed estimation in comparison with ground observations was instead found by Lindsay et al. (2014). Such a reanalysis showed better performance at intermediate wind speeds (between 4 and 12 mÁs −1 ), and the presence of a small counter clockwise rotation (Carvalho et al., 2014). Bao and Zhang (2013) also compared the ERA-Interim and CFSR against independent soundings on the Tibetan Plateau and found a comparable RMSE between the two reanalyses and the soundings, corroborating the present findings. Yver et al. (2013) compared the ECMWF ERA-Interim and the NCEP-North America Model (NAM) data set, which is not a reanalysis but a forecast at 12 km resolution. Therefore, they compared a data set with a lower spatial resolution but higher accuracy (ECMWF ERA-Interim) with a more resolute but less accurate forecast (NCEP-NAM), and found that the ECMWF reanalysis better simulated wind at various ground stations. The results presented herein are in accordance with the cited literature: the WRF-ECMWF and WRF-CFSR both reported overall relatively low RMSEs for wind speed (Figure 2a) and good frequency correlations for wind direction (Table 3).
A further factor that could affect the errors between the model and the measurements could be the selection  Yver et al. (2013) tested both different initial and boundary conditions and also different PBL schemes, concluding that the difference between the two initial and boundary conditions (ERA-Interim and NCEP-NAM) was larger than the differences between the various PBL schemes and, therefore, that the choice of the forcing had a greater impact on the simulated wind than the choice of the PBL scheme. Therefore, it is possible to assume that while parameterization schemes play a role in the model outputs, their effect is overwhelmed by the selection of the reanalysis. Figure 3 shows that WRF reproduces the spatialized diurnal courses for wind speed with good accuracy with a few exceptions. These findings are particularly useful since, when the reanalysis was compared "as is" with flux tower data (Decker et al., 2012), no good correlations in wind speed between the towers and the data set were found on a diurnal time scale. Apparently, by employing a dynamical downscaling and a comparison with spatialized aircraft data, new information may be drawn by reanalysis data. Still a marked drop in wind speed r 2 occurs in all sections in summer and particularly in section 3, and in section 2 in autumn (when considering the daily courses Figure 3). This behaviour in summer likely reveals the presence of local thermal effects. Especially during warm periods, in fact, the complex orography may affect wind speed daily courses because of the formation of anabatic/katabatic effects (where the air mass close to the elevated ground is warmer/colder than the free air above (Papanastasiou et al., 2010).
Discrepancies in wind magnitude estimation do exist also in section 1 in parts of the track where there is no relevant terrain elevation (Figures 1b and 4-7). These discrepancies may be ascribed to breeze regimes: the misrepresentation of wind speed magnitude during a sea breeze with the WRF was reported, among others, by Hernández-Ceballos et al. (2013) while simulating breeze conditions in the Guadalquivir Valley. The aircraft data set presented here was used by Gioli et al. (2014) to validate a modelling chain using the WRF (Non-Hydrostatic Mesoscale Model v.2.1 initialized with the NCEP-GFS) and CALMET (Scire et al., 2000), and a similar influence of the breeze regime was also reported. In that work, the largest differences were observed in coastal areas in summer where breeze regimes develop consistently. A potential effect of breeze regimes on the wind speed simulated by the WRF is shown in Figure 8, where the wind speed RMSE was calculated along a transect from the coast to inland. In both winter and summer, both the WRF-ECMWF and the WRF-CFSR RMSEs wind patterns tend to increase and then decrease proceeding inland, likely representing the breeze penetration and confirming the presence of a winter sea breeze over the Tyrrhenian coasts (Ferretti et al., 2003). A similar trend is identified by the WRF-CFSR in August, while the WRF-ECMWF shows a continuous decrease of the RMSE with distance from the coast. The spring pattern instead, where model data tend to underestimate measurements in section 1 (Figure 5), shows a general increase of the RMSE with an increase in the distance from the coast.
While in section 2 in winter both forcings exhibit a moderate r 2 between elevation and wind speed (WRF-ECMWF r 2 = 0.33, WRF-CFSR r 2 = 0.36), various situations show a decoupling between the measured and modelled wind speeds, such as some mountainous areas where the r 2 for both forcings was < 0.1. These effects may be driven by the WRF's DEM orographic smoothing impacting wind speed magnitude (Carvalho et al., 2014): modelled data cannot quite "follow" the changes in velocity over the track because of the interactions between orography and air movement (e.g. changes in Froude number). When the interaction with orography results in an abatement of wind intensity, the model tends to overestimate such abatement (Figures 4, 6 and  7); conversely, when the measurements report a peak in intensity, the model tends to underestimate such increases ( Figure 5). Figure 9 shows the relationship between the RMSE of the modelled versus the observed wind speed and five classes of DEM variance (calculated in a 20 km buffer around the selected DEM point): above a certain threshold (around 3,000 m), the RMSE tends to increase with variance, hinting to the aforementioned orographic effects. A similar effect is visible also for wind direction where both the WRF forcings show the highest drop in frequency correlation in section 2, which has the most significant orography. Employing finer subgridscale parameterization schemes could, in fact, improve the biases and errors because of the complex terrain (Yang and Duan, 2016).
Besides the effect of terrain elevation, different land uses may also affect the bias between modelled and measured data. The WRF default land use is based on a static data set derived from the MODIS, with fixed values for parameters such as roughness height, leaf area index and so on. Land-use categories are then aggregated to the domain resolution from their original 30 arcsec horizontal resolution. Given the limited number of land-use classes (see Section 2.1), their spatial aggregation and the fact that land use changes dynamically in reality (European Environmental Agency Report No. 10/2017 2 ), this may affect the model's performance. Li et al. (2020), for example, forced the WRF with a new land-use data set based on the latest Global Land Cover Data set (GLC2015) at 300 m resolution and observed an improvement in the simulation of heat fluxes, wind and temperature at the surface. Similar results were also obtained by Nahian et al. (2020) when improving the land-use and topographical data set, highlighting the importance of considering land use as a physical driver of the biosphere atmosphere interaction rather than a static variable.

| CONCLUSIONS
In this study, the weather research and forecast (WRF) model was initialized with different forcings (European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim and National Centers for Environmental Predictions (NCEP) Climate Forecast System Reanalysis (CFSR)) on a domain that included both coastal areas and inland areas, and various degrees of terrain elevation, and simulated wind fields were compared with high-resolution aircraft data along a fixed route spanning multiple seasons. Both forcings showed similar trends and were generally well capable of reproducing the spatialized observations. Wind speed showed overall good agreement with relatively small root mean square errors (RMSEs). Wind direction also showed a good frequency correlation, but while the trend was well captured, it showed the greatest fluctuations around the actual values. While there were certain situations (i.e. sections or seasons or daily courses) where one forcing showed marginally better statistics, no evidence was found that the WRF-ECMWF was overall always better than the WRF-CFSR.
While analysing spatialized data in different intensive observation periods (IOPs) across the various seasons, it was found that the model's performance was linked to the combination of both temporal and spatial effects, such as the presence of orography and the development of sea breeze regimes, that are convoluted in a complex manner. This poses a clear caveat when validating models only against point-based observations, since both their position and the chosen period may affect the model's performance. Aircraft data provide an additional