Methodology for obtaining wind gusts using Doppler lidar

A new methodology is proposed for scaling Doppler lidar observations of wind gusts to make them comparable with those observed at a meteorological mast. Doppler lidars can then be used to measure wind gusts in regions and heights where traditional meteorological mast measurements are not available. This novel method also provides estimates for wind gusts at arbitrary gust durations, including those shorter than the temporal resolution of the Doppler lidar measurements. The input parameters for the scaling method are the measured wind‐gust speed as well as the mean and standard deviation of the horizontal wind speed. The method was tested using WindCube V2 Doppler lidar measurements taken next to a 100 m high meteorological mast. It is shown that the method can provide realistic Doppler lidar estimates of the gust factor, i.e. the ratio of the wind‐gust speed to the mean wind speed. The method reduced the bias in the Doppler lidar gust factors from 0.07 to 0.03 and can be improved further to reduce the bias by using a realistic estimate of turbulence. Wind gust measurements are often prone to outliers in the time series, because they represent the maximum of a (moving‐averaged) horizontal wind speed. To assure the data quality in this study, we applied a filtering technique based on spike detection to remove possible outliers in the Doppler lidar data. We found that the spike detection‐removal method clearly improved the wind‐gust measurements, both with and without the scaling method. Spike detection also outperformed the traditional Doppler lidar quality assurance method based on carrier‐to‐noise ratio, by removing additional unrealistic outliers present in the time series.


Introduction
Wind gusts are typically used as the main weather parameter in the assessment of wind-induced damage (e.g. Pasztor et al., 2015; Jung et al., 2016). Therefore, accurate wind-gust forecasts will enhance preparedness planning, with, for example, rescue services and power companies able to allocate resources when strong, damaging gusts are expected. Numerical weather prediction models do not resolve wind gusts, so gust forecasts are based on parametrizations. These parametrizations have typically been developed for a reference height of 10 m, which is the standard measurement height for surface winds (e.g. Brasseur, 2001; Woetmann Nielsen and Petersen, 2001; Wichers Schreur and Geertsema, 2008). However, the measured gustiness at the reference height is not always representative of the surrounding area because of the spatial variation in the aerodynamic roughness of the surrounding environment, which can have a significant impact on wind gusts (Wieringa, 1973; Suomi et al., 2013, 2015). Hence, a fair comparison of model gust forecasts and observations is challenging, because the roughness in the model grid cell may differ from the conditions at the measurement site (e.g. Wieringa, 1986; Vihma and Savijärvi, 1991; Bou-Zeid et al., 2007). For this reason, Wieringa (1986, 1996) and Verkaik (2000) proposed methods for deriving representative wind measurements at nonideal weather stations based on extrapolation of the wind profile up to altitudes where the wind is representative of larger horizontal scales; Wieringa (1986) suggested 50-100 m above the surface. Forecast validation with measurements at altitudes above the roughness sublayer, or even above the surface layer, is therefore sought, as this would enable direct verification. Doppler lidars can potentially provide wind-gust measurements from these heights, enabling direct validation of wind-gust forecasts.
Wind gust measurements have traditionally been available only from weather stations and meteorological masts where the wind can be measured at high temporal resolution (>1 Hz). Hence, continuous wind-gust time series are usually only available at the standard 10 m measurement height and at altitudes reachable by meteorological masts (up to about 300 m). Recently, Suomi et al. (2016) developed a methodology to derive gusts based on research aircraft data to measure gusts from various heights, but these datasets are typically limited to short-term measurement campaigns and are therefore not suitable for operational model evaluation. Tall weather masts are rather sparse, so another possibility for obtaining continuous wind-gust records above the standard weather station height is Doppler lidar. Certain Doppler lidar instrument versions are capable of measuring to altitudes above tall masts, but the challenge is in measuring the high-frequency changes in wind speed. The main aim of this study is to develop a methodology for scaling wind gusts derived from the Doppler lidar wind-speed distribution, so that they are comparable with the standard wind-gust measurements obtained from meteorological masts and weather stations. Then, the Doppler lidar technique can be applied to regions and heights where traditional meteorological measurements do not reach.
Doppler lidar technology for measuring wind has matured rapidly in recent years, with the first small (<100 kg) commercial wind lidars becoming available less than a decade ago (Emeis et al., 2007). Profiles of the mean wind speed can be measured effectively and accurately at high vertical resolution within the boundary layer and up to a couple of kilometres in altitude, depending on the weather situation.
To measure the 3D wind vector requires information from at least three different lines of sight pointing towards different directions (e.g. Lane et al., 2013). The instrument sensitivity depends on the amount of aerosol present and the velocity measurement uncertainty is related to the strength of the backscattered signal (Pearson et al., 2009). It typically takes a second or more to measure each line of sight with sufficient sensitivity and therefore the temporal resolution of the wind measurement is often of the order of tens of seconds, which is not sufficient for gusts (e.g. Suomi et al., 2015). However, Doppler lidars can also provide high-resolution turbulent measurements, in both the vertical direction (O'Connor et al., 2010) and, potentially, the horizontal direction (Vakkari et al., 2015).
In this study, we will first investigate how the wind-speed maxima measured by a Doppler lidar compare with those measured at a meteorological mast and then introduce a new method to scale the Doppler lidar measurements to provide estimates for short-duration (1-19 s) wind-speed maxima. Finally, we will show that the Doppler lidar can also provide good wind-gust observations above the mast measurement heights. We use a commercial short-range pulsed Doppler lidar, which is commonly used in wind-energy applications and has already been shown to be applicable for measuring extreme winds (Sathe et al., 2011). The advantage of this lidar type is the relatively high temporal resolution of the measurements, which is very important for gusts. In section 2, we introduce the data, which include coincident meteorological mast and Doppler lidar measurements from the western coast of Denmark, and describe the data processing steps, including the methods to assure and assess the quality of lidar data. In section 3, the gust measurements from the lidar are compared with those from the meteorological mast and in section 4 we introduce a new method to scale the lidar measurements so that they match the wind gusts that would be measured by in situ measurement systems. The scaling method can be applied to derive gusts with different durations, including gusts for durations that are shorter than the temporal resolution of the lidar measurements. The resulting scaled Doppler lidar wind gusts are compared with the mast observations in section 5, together with a discussion on the applicability of the profile of the scaled lidar wind-gust measurements above the mast measurement levels. In section 6, we conclude with a summary of the main results and provide suggestions for further development of the method.

Measurements and data processing
Doppler lidar and meteorological mast measurements were collected at the Danish National Test Station for Large Wind Turbines, located at Høvsøre, near the western coast of Denmark (Figure 1). A thorough description of the site and its instrumentation is provided by Peña et al. (2015). The long-term wind-gust conditions at this site have been investigated by Suomi et al. (2015). Here, the study period covers 2 days, 10 and 11 October 2015, during which there were easterly winds of 4-10 m s⁻¹ at the 100 m level and the surface layer stability underwent a clear diurnal cycle: unstable conditions during the day and stable conditions at night. The first day selected exhibits ideal conditions for the Doppler lidar, with good sensitivity up to 250 m or more. The second day was more challenging, with precipitation present and reduced sensitivity, but this provided an opportunity to investigate the impact of data with larger uncertainties, test the post-processing applied to the Doppler lidar data and check the data-quality assessment. We now briefly present the lidar and sonic anemometers used in this study and describe the data post-processing applied to both sets of instrumentation.

Doppler lidar measurements
The Doppler lidar instrument used in this study is a WindCube V2 Doppler lidar manufactured by Leosphere. It was operated next to the meteorological mast at Høvsøre (Figure 1). The lidar measures radial wind velocities along four inclined and one vertical line of sight. The four inclined measuring beams are at φ = 28° from zenith, with azimuth angles at 90° relative to each other. The radial wind-velocity measurements were taken along each line of sight at ten levels (40, 60, 76, 80, 100, 116, 160, 200, 250 and 290 m). One scan sequence is defined as a set of sequential measurements covering all five lines of sight, which takes about 3.8 s. We will consider this as the resolution of the lidar wind measurements, i.e. the radial wind velocity vector ($\mathbf{V}_r$) is updated once each scan sequence is completed. The radial velocities ($\mathbf{V}_r$) were transformed to geographic wind components ($\mathbf{v} = [v_1, v_2, v_3]^{\mathrm{T}}$, where the superscript 'T' denotes the transpose) using the methodology from Päschke et al. (2015). For this instrument, the rotation matrix $\mathbf{A}$ from geographic coordinates to radial velocities ($\mathbf{V}_r = \mathbf{A}\mathbf{v}$) includes five lines of sight, where the first four rows represent the four inclined lines of sight with a zenith angle φ = 28° and azimuth angles at 90° intervals. The fifth row represents the vertical beam, for which φ = 0°. According to Päschke et al. (2015), the geographic wind components can be derived using the matrix operation
$$\mathbf{v} = \mathbf{A}^{+}\mathbf{V}_r,$$
where $\mathbf{A}^{+}$ denotes the Moore-Penrose pseudoinverse of $\mathbf{A}$, which is needed because the system is overdetermined, i.e. there are more rows (5) in the matrix $\mathbf{A}$ (Eq. (1)) than required (3) to solve the wind velocity components $[v_1, v_2, v_3]$. The resulting horizontal wind components in geographic coordinates are $v_1$ and $v_2$ for the west-to-east and south-to-north components, respectively.
The horizontal wind speed is then $u_L = \sqrt{v_1^2 + v_2^2}$. After the coordinate transformation, the dataset was divided into 10 min periods; for each period, the mean horizontal wind speed ($U_L$), wind direction ($\theta_L$), standard deviation of the wind speed ($\sigma_{u_L}$), wind-gust speed ($U_{\max,L}$) and gust factor ($G$) were calculated. The subscript 'L' refers to lidar measurements. The gust factor is the ratio of the wind-gust speed to the mean wind speed and is the most commonly used measure of gustiness.
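The retrieval and the horizontal wind speed can be illustrated with a short Python sketch. It is not the processing code used in this study: the beam order (north, east, south, west, vertical), the sign convention (radial velocity positive away from the lidar) and all function names are assumptions made for the example. For such a symmetric five-beam geometry, the product of the transpose of A with A is diagonal, so the pseudoinverse solution reduces to the closed-form expressions used below.

```python
import math

PHI = math.radians(28.0)  # zenith angle of the inclined beams (from the text)


def radial_velocities(v1, v2, v3, phi=PHI):
    """Forward model V_r = A v for the ASSUMED beam order N, E, S, W, vertical.

    v1: west-to-east, v2: south-to-north, v3: upward wind component.
    """
    s, c = math.sin(phi), math.cos(phi)
    return (v2 * s + v3 * c,    # beam towards north (azimuth 0)
            v1 * s + v3 * c,    # beam towards east (azimuth 90)
            -v2 * s + v3 * c,   # beam towards south (azimuth 180)
            -v1 * s + v3 * c,   # beam towards west (azimuth 270)
            v3)                 # vertical beam


def retrieve_wind(vr_n, vr_e, vr_s, vr_w, vr_z, phi=PHI):
    """Least-squares solution v = A^+ V_r for the geometry assumed above.

    Because A^T A is diagonal here, A^+ = (A^T A)^-1 A^T has a closed form.
    """
    s, c = math.sin(phi), math.cos(phi)
    v1 = (vr_e - vr_w) / (2.0 * s)
    v2 = (vr_n - vr_s) / (2.0 * s)
    v3 = (c * (vr_n + vr_e + vr_s + vr_w) + vr_z) / (4.0 * c * c + 1.0)
    return v1, v2, v3


def horizontal_speed(v1, v2):
    """Horizontal wind speed u_L = sqrt(v1^2 + v2^2)."""
    return math.hypot(v1, v2)
```

A round trip through `radial_velocities` and `retrieve_wind` recovers the input wind vector, which is a convenient consistency check when adapting the sketch to another beam order or sign convention.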
Since $U_{\max,L}$ was calculated as the maximum of the horizontal wind speed $u_L$, the wind gust corresponds roughly to a wind gust with a duration ($t_g$) equal to the temporal resolution of the measurement ($\Delta t = 3.8$ s for this Doppler lidar instrument). To compare lidar gusts with sonic anemometer gusts in section 4, we also calculated gusts with varying durations by applying a moving average to the measured horizontal wind-speed time series before taking the maximum. The moving average was taken over $n$ measurements, which corresponds to a gust duration of $t_g = n\,\Delta t$. In this study, we allowed $n$ to vary from 2-8, which means that $t_g$ varied from 7.6-30.4 s.
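As a minimal sketch of this calculation (the function names and the sample values in the usage note are hypothetical, not taken from the study), the gust speed for a gust duration of n measurement intervals is the maximum of an n-point moving average, and the gust factor is that maximum divided by the period mean:

```python
def moving_average(values, n):
    """Running mean over n consecutive samples (only full windows are kept)."""
    if n < 1 or n > len(values):
        raise ValueError("window length n must satisfy 1 <= n <= len(values)")
    window_sum = sum(values[:n])
    out = [window_sum / n]
    for i in range(n, len(values)):
        window_sum += values[i] - values[i - n]
        out.append(window_sum / n)
    return out


def gust_statistics(u, n=1, dt=3.8):
    """Mean speed, gust speed, gust factor and gust duration for one period.

    u  : horizontal wind speeds of one averaging period (m/s)
    n  : number of samples in the moving average (n = 1 means raw maximum)
    dt : temporal resolution of the measurements (s)
    """
    mean_speed = sum(u) / len(u)
    gust_speed = max(moving_average(u, n))
    return {"U": mean_speed, "U_max": gust_speed,
            "G": gust_speed / mean_speed, "t_g": n * dt}
```

For example, `gust_statistics([4.0, 6.0, 8.0, 6.0, 4.0, 2.0], n=2)` averages neighbouring pairs before taking the maximum, so the resulting gust is lower than the raw maximum of the series.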

Doppler lidar data quality
Doppler lidar data quality can be quantified in terms of the lidar carrier-to-noise ratio (CNR), a measure of the relative strength of the heterodyne backscattered Doppler signal over the inherent unavoidable noise level of the detection chain. The factors affecting CNR are related to the characteristics of the lidar and the atmosphere. For a pulsed wind lidar, such as the one used in this study, the CNR is proportional to the aerosol cross-sectional area and, at the longest measurement distances, inversely proportional to the square of the measurement range. Therefore, the wind-data availability decreases with measurement range when using a constant CNR value to filter the measurements. The four most significant atmospheric factors influencing the wind lidar performance are aerosol backscatter, atmospheric refractive turbulence, relative humidity and precipitation. When the air is clean (low concentrations of aerosols), the retrieved wind data from power spectra measurements are associated with generally lower CNR values.
The uncertainty in the radial Doppler velocities along each line of sight is obtained from the associated CNR value for each measurement volume, following the methodology from O'Connor et al. (2010). The resulting error estimate $\sigma_e$ as a function of CNR is presented in Figure 2. Typically, a threshold of −22 or −23 dB is used as a limit for the accepted uncertainty in the lidar measurements (e.g. Gryning et al., 2016), which corresponds to an uncertainty of about 0.15 m s⁻¹. The uncertainty is calculated for each radial velocity component separately and propagated into the geographic wind components using the formulations from Päschke et al. (2015). The error in the radial winds is represented by a diagonal [5 × 5] matrix $\mathbf{C}_{V_r V_r}$ with the squared errors $[\sigma_{e0}^2, \sigma_{e90}^2, \sigma_{e180}^2, \sigma_{e270}^2, \sigma_{eZ}^2]$ on its diagonal. The errors in the geographic coordinates, $\mathbf{C}_{vv}$, are then provided by
$$\mathbf{C}_{vv} = \mathbf{A}^{+}\,\mathbf{C}_{V_r V_r}\,(\mathbf{A}^{+})^{\mathrm{T}},$$
from which the errors in $v_1$, $v_2$ and $v_3$ are obtained from the diagonal components of $\mathbf{C}_{vv}$. For the horizontal wind components, they are $\sigma_{e,v_1} = \sqrt{(\mathbf{C}_{vv})_{11}}$ and $\sigma_{e,v_2} = \sqrt{(\mathbf{C}_{vv})_{22}}$, and the error of the horizontal wind is
$$\sigma_{e,u_L} = \sqrt{\sigma_{e,v_1}^2 + \sigma_{e,v_2}^2}.$$
Thus, the error in the wind-gust speed, $\sigma_{e,U_{\max,L}}$, is the error in the horizontal wind speed at the time of the maximum gust. The error propagation into the mean wind speed ($U_L$) is calculated in terms of the root-mean-squared error (RMSE):
$$\sigma_{e,U_L} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\sigma_{e,u_L,i}^2},$$
where $N$ is the number of observations in a sample.
Manninen et al. (2016) have shown, however, that an instrument's internal calculation of CNR is not always correct, due to issues in calculating a reliable 'background' value, so that post-processing of the instrument data may be required. For the instrument used in this study, the background value used to determine CNR is calculated on a profile-by-profile basis; therefore there are occasional profiles with unrealistic CNR values. This has little impact when calculating mean wind profiles from a large set of samples, but is crucial when deriving the wind gust from a time series of wind values, since the gust-factor value is highly dependent on the uncertainty of a single wind measurement in the time series (Eq. (5)). When the signal strength is very weak (low CNR), velocity estimates derived from the Doppler shift are dominated by random noise and thus subject to estimation errors (the wind estimate can have any value within 0-60 m s⁻¹ for this instrument); the resulting spikes in the signal lead to unrealistically high gust factors.
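The propagation of the radial-velocity uncertainties into the horizontal wind speed, as described in this subsection, can be sketched in a few lines of Python. The sketch assumes that the beams labelled 0, 90, 180 and 270 point north, east, south and west, in which case only the two opposing beams contribute to each horizontal component; the function names are hypothetical.

```python
import math


def horizontal_wind_error(sig_0, sig_90, sig_180, sig_270, phi_deg=28.0):
    """Error of the horizontal wind speed from the four inclined-beam errors.

    ASSUMPTION: beams 0/90/180/270 point north/east/south/west, so the rows
    of the pseudoinverse for v1 and v2 are +-1/(2 sin(phi)) on the opposing
    beams and zero elsewhere. The diagonal of A^+ C (A^+)^T then reduces to
    the two variance sums below.
    """
    s = math.sin(math.radians(phi_deg))
    var_v1 = (sig_90 ** 2 + sig_270 ** 2) / (4.0 * s * s)   # west-to-east
    var_v2 = (sig_0 ** 2 + sig_180 ** 2) / (4.0 * s * s)    # south-to-north
    return math.sqrt(var_v1 + var_v2)


def mean_wind_error(sample_errors):
    """RMSE of the single-measurement errors within one averaging period."""
    n = len(sample_errors)
    return math.sqrt(sum(e * e for e in sample_errors) / n)
```

With equal radial errors on all four beams, the horizontal wind error is the radial error divided by sin(φ), so a radial uncertainty of 0.15 m s⁻¹ at φ = 28° corresponds to roughly 0.3 m s⁻¹ in the horizontal wind speed under these assumptions.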
To mitigate this impact, we apply an approach called despiking to remove unrealistically high $u_L$ values. It is usually applied as standard to turbulence measurements from sonic anemometers, in order to detect malfunctioning of the instrument (Højstrup, 1993; Vickers and Mahrt, 1997; Floors, 2013). Here, despiking was implemented following the recommendation by Højstrup (1993). It is based on two-point statistics, using the previous data point ($i-1$) to predict the next data point in the time series as
$$u_{\mathrm{fcst},i} = \mathrm{cov}(\tau)\,u_{L,i-1} + [1 - \mathrm{cov}(\tau)]\,\bar{u}_{L,i}, \tag{10}$$
where $\mathrm{cov}(\tau)$ is the autocovariance (normalized by the variance) with a time lag $\tau$ equal to the resolution of the measurements ($\tau = \Delta t$) and $\bar{u}_{L,i}$ is the observed mean wind speed. Højstrup (1993) calculated the mean and the two-point correlation ($\mathrm{cov}(\tau)$) using a memory size concept, which allows automated operational detection of spikes without the requirement of a large memory size for the data-processing system. Here, we are limited neither by the recording nor by the data-processing systems, because we are applying the spike detection to data that have already been collected. Therefore we calculate $\mathrm{cov}(\tau)$ and $\bar{u}_{L,i}$ using a fixed number of data points, which is the last $N = 100$ values of the time series. Based on artificial turbulence data, Højstrup (1993) tested different memory sizes from 10-10 000 data points. With a 10 Hz sampling frequency, these represent time-scales from 1-1000 s. Our fixed $N = 100$ corresponds to 380 s (the lidar measuring interval being $\Delta t = 3.8$ s) and thereby fits into the range of values discussed by Højstrup (1993). When the prediction for the next value in a time series is obtained, a possible spike is detected using the criterion
$$|u_{\mathrm{fcst},i} - u_{L,i}| > C_{\mathrm{spike}}\,\sigma_i, \tag{11}$$
where $|\cdot|$ refers to the absolute value, $\sigma_i$ is the standard deviation of the last $N$ observations and $C_{\mathrm{spike}}$ is the threshold for the spike detection. Following Højstrup (1993), we assume a Gaussian distribution of the difference $u_{\mathrm{fcst},i} - u_{L,i}$ and, based on the results, we applied a threshold of $C_{\mathrm{spike}} = 3.5$ to detect spikes.
This threshold corresponds to a probability of about 5 × 10⁻⁴. The same threshold was also used by Vickers and Mahrt (1997) and Floors (2013) and it also fits within the range of discrimination levels provided by Højstrup (1993), which was 3.3-4.9, associated with detection levels from 10⁻³ to 10⁻⁶, respectively. The spike detection was performed as a moving window over the whole data set. All detected spikes were replaced by linear interpolation using the neighbouring non-spike values. Similarly to Vickers and Mahrt (1997) and Floors (2013), we repeated the spike detection-removal procedure until no more spikes were found; after each iteration the threshold $C_{\mathrm{spike}}$ was increased by 0.1, which accounts for the reduced $\sigma_i$ after spike removal. Visual inspection showed that this method removes spikes efficiently when only a few spikes exist, but not once unrealistically high wind-speed values begin to dominate the time series. In this study, a maximum of 29 spikes were detected within a 10 min period of data, which consists of about 156 values. This corresponds to about 19% of the data. The effect of the filtering on the results is discussed in section 5.2.
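The despiking procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the statistics are computed from the N values preceding each point, the two-point statistic is taken as the lag-1 autocorrelation, detection only starts once a hypothetical minimum of 10 points is available, and all function names are made up for the example. A small helper also converts a threshold into the corresponding two-sided Gaussian tail probability.

```python
import math


def gaussian_two_sided_tail(c):
    """Probability that a Gaussian variable deviates by more than c sigma."""
    return math.erfc(c / math.sqrt(2.0))


def window_stats(window):
    """Mean, standard deviation and lag-1 autocorrelation of a window."""
    n = len(window)
    mean = sum(window) / n
    dev = [x - mean for x in window]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return mean, 0.0, 0.0
    lag1 = sum(dev[k] * dev[k + 1] for k in range(n - 1)) / n
    return mean, math.sqrt(var), lag1 / var


def find_spikes(u, n_mem, c_spike, min_points=10):
    """Indices where the forecast error exceeds c_spike standard deviations."""
    spikes = []
    for i in range(min_points, len(u)):
        mean, std, r = window_stats(u[max(0, i - n_mem):i])
        forecast = r * u[i - 1] + (1.0 - r) * mean
        if abs(forecast - u[i]) > c_spike * std:
            spikes.append(i)
    return spikes


def interpolate(u, bad):
    """Replace flagged points by linear interpolation of valid neighbours."""
    good = [i for i in range(len(u)) if i not in bad]
    out = list(u)
    for i in bad:
        left = max((k for k in good if k < i), default=None)
        right = min((k for k in good if k > i), default=None)
        if left is not None and right is not None:
            w = (i - left) / (right - left)
            out[i] = (1.0 - w) * u[left] + w * u[right]
        elif left is not None:
            out[i] = u[left]
        elif right is not None:
            out[i] = u[right]
    return out


def despike(u, n_mem=100, c_spike=3.5, c_step=0.1, max_iter=50):
    """Iterate detection and removal, raising the threshold after each pass."""
    cleaned, flagged, threshold = list(u), set(), c_spike
    for _ in range(max_iter):
        new = set(find_spikes(cleaned, n_mem, threshold)) - flagged
        if not new:
            break
        flagged |= new
        cleaned = interpolate(cleaned, flagged)
        threshold += c_step
    return cleaned, sorted(flagged)
```

Calling `despike(series)` returns the cleaned series together with the indices that were flagged, so the fraction of replaced data in each 10 min period can be monitored.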

Meteorological mast measurements
Sonic anemometer measurements were available from six levels of the meteorological mast: at 10, 20, 40, 60, 80 and 100 m heights. This provides coincident measurements from the mast and the lidar at four heights: 40, 60, 80 and 100 m. In this study, we consider sonic anemometers as the reference instruments for wind-gust measurements, because they provide wind-speed measurements with a high temporal resolution, here at 20 Hz. Moreover, as will be shown in section 4, sonic anemometers provide wind-gust measurements that fit well with the expected theoretical behaviour of short wind gusts as a function of the gust duration.
To ensure data quality of the sonic anemometer measurements, unphysical values and spikes were detected and replaced by linear interpolation using neighbouring points. Spikes were removed using the same approach as for the lidar data (Eqs (10) and (11)). Data were then divided into 10 min periods and only those periods with more than 99.9% of acceptable data were included in further analysis. Sonic temperature was corrected for crosswind effects and the wind coordinate system was rotated into streamwise coordinates using the double rotation method (e.g. Rebmann et al., 2012).
The mean horizontal wind speed, its standard deviation, wind direction and wind-gust speed were calculated for each 10 min period, similarly to the lidar measurements. The wind-gust speed $U_{\max,S}$ was calculated with $t_g = 3.8$ s (corresponding to an average of 76 values) and this will represent the sonic anemometer wind gust throughout the article, unless mentioned otherwise. In section 4, we derive a theoretical method for estimating gusts of different durations from Doppler lidar measurements. To compare the resulting gusts with those from the meteorological mast, we will also calculate gusts from the sonic anemometer data with durations varying in the range $t_g = 1$-30 s.
Data from the sonic anemometer at the 20 m level were used to derive stability conditions based on the Obukhov length ($L$). The Obukhov length was calculated as
$$L = -\frac{u_*^3\,\overline{\theta}}{\kappa\, g\, \overline{w'\theta'}},$$
where $\kappa = 0.4$ is the von Kármán constant, $g$ is the gravitational acceleration and $\overline{w'\theta'}$ is the kinematic heat flux, with $\theta$ the potential temperature and $w$ the vertical velocity. The prime denotes the deviation of a variable from its mean and the overbar denotes the sample mean. $u_*$ is the friction velocity calculated as $u_* = (\overline{u'w'}^2 + \overline{v'w'}^2)^{1/4}$, where $u'$ and $v'$ are the fluctuating parts of the wind components along and perpendicular to the mean wind in the horizontal direction, respectively, and $w'$ is the fluctuating part in the vertical.

Comparison of wind gusts from lidar and meteorological mast
Figure 3 shows the time-height cross-sections of (a) wind-gust speed, (c) mean wind speed and (e) gust factor as measured by lidar. For comparison, the corresponding gust factors from the meteorological mast are presented in Figure 3(f) and the height of the mast is indicated by a white dashed line in panels (a)-(e). The diurnal cycle is clearly seen, with high gust factors during the day and lower gust factors during the night. The gust-factor patterns are very similar for both measuring systems below 100 m (Figures 3(e) and (f)). For example, the peak $G$ values in the transition period on the evening of 10 October near sunset are found in both lidar and mast measurements. This means that, in good signal conditions, the Doppler lidar measures reliable wind gusts and potentially provides good information on the gustiness above the mast heights. It is also clear that the wind gust is more uncertain than the mean wind speed (compare the relative errors shown in Figures 3(b) and (d)); unrealistically high values appear at lower altitudes in the wind-gust maxima than in the mean wind speed, even though spikes have been removed. Figure 4 shows the distributions of the gust factor from (a) lidar and (b) sonic anemometers as a function of the relative error based on the CNR from the lidar. Panel (c) shows the difference between (a) and (b), from which we see that the lidar systematically overestimates the gust factors (bias = 0.08) compared with the sonic anemometer gust factors calculated with a gust duration of 3.8 s. In Figure 4, we have applied the filter based on spike detection described in section 2.2 to the lidar measurements. This filter removes outliers efficiently and is therefore recommended for use when measuring wind gusts by a lidar.
So far, we have shown that Doppler lidar measurements yield wind-gust patterns comparable to those measured by sonic anemometers and, in addition, the lidar can reach above the meteorological mast. However, there exists a positive bias in the lidar gust factor, even after filtering of the outliers (Figure 4(c)). If the aim is to measure gusts comparable to those from a meteorological mast, this bias must be understood and accounted for.

A new scaling methodology for lidar gusts
Standard operational measurements of wind gusts are calculated from high temporal resolution anemometer measurements, with a standard wind-gust duration defined in terms of a 3 s moving average (WMO, 2010). In this section, we present a theoretical approach for scaling measured lidar gusts obtained at a lower temporal resolution and show that they correspond with high-resolution measurements from sonic anemometers across a range of gust durations. The previous section highlighted that wind-gust maxima, and hence gust factors, from lidar measurements are overall higher than those from sonic anemometers. There are a number of reasons for this bias. The main reason is that the wind lidar combines measurements from four lines of sight that are separated spatially, whereas the sonic anemometer measurement is a point measurement. Therefore the lidar measurements are effectively providing an average of the spatially distributed wind field. Moreover, we introduced a moving average for sonic anemometer data (averaging over 76 observations, 3.8 s), while the lidar gust represents only one time-averaged value. This means that, compared with sonic anemometer results, lidar gusts are more sensitive to a single unrealistically high value in the data. Here, we develop a method to estimate reliably, from lidar measurements, the wind gusts that would be measured by sonic anemometers, in order to extend observations of wind-speed maxima above the mast measurement height.
The wind gust ($U_{\max}$) can be expressed in terms of the mean wind speed ($U$) and a positive fluctuation from it, which is assumed to be proportional to the standard deviation of the horizontal wind speed ($\sigma$). The coefficient of proportionality is called the peak factor $g_{t_g}$, where the subscript $t_g$ refers to the gust duration determined by the sampling frequency and/or the moving average window applied to the high-frequency turbulence data in the calculation of the maximum gust. Since these are different for each instrument, we have two equations for $U_{\max}$:
$$U_{\max,S} = U_S + g_{t_g,S}\,\sigma_S, \tag{12}$$
$$U_{\max,L} = U_L + g_{t_g,L}\,\sigma_L, \tag{13}$$
where the subscript 'S' refers to sonic anemometers and the subscript 'L' to the Doppler lidar. The scaling will enable estimation of the wind-gust speed as it would be measured by an anemometer with a high temporal resolution, i.e. the sonic anemometer $U_{\max,S}$ (Eq. (12)), in terms of the parameters available from the lidar in Eq. (13). Therefore, we will start the derivation of this scaling by evaluating the different components of Eqs (12) and (13). Doppler lidar measures the mean wind speed with a high and known accuracy; Floors (2013) and Peña et al. (2013) found good agreement between wind lidar and cup-anemometer measurements at 100 m for a CNR > −22 dB, with agreement deteriorating as the CNR threshold is lowered (as expected from Figure 2). The relationship between the long-term wind speed and the CNR threshold value is further discussed in Gryning et al. (2016). In other words, we can assume $U_S \approx U_L \approx U$. This assumption gives
$$U_{\max,S} = U + g_{t_g,S}\,\sigma_S, \qquad U_{\max,L} = U + g_{t_g,L}\,\sigma_L. \tag{14}$$
Next, we will compare the peak factors from lidar and sonic anemometers, i.e. $g_{t_g,L}$ and $g_{t_g,S}$, respectively. Figure 5 shows the median peak factor as a function of the gust duration for both lidar and sonic anemometers.
The observed peak factors are calculated by rearranging Eqs (12) and (13), i.e. $g_{t_g} = (U_{\max} - U)/\sigma$ for each instrument. Figure 5(a) shows that lidar and sonic anemometer peak factors match at about $t_g = 15$ s, but there is an overestimation by lidar at shorter gust durations and a small underestimation at longer gust durations. The overestimation is caused by the difference in how each instrument samples the atmospheric turbulence.
The sonic anemometer provides pointwise measurements with a high temporal resolution and thus covers all temporal scales contributing to short gusts, whereas the lidar combines information on short-duration averages of radial wind speed from spatially separated measuring volumes. Thus, the shortest lidar gusts are higher than the respective gusts from the sonic anemometer. Moreover, the higher individual values in the high-frequency part of the lidar signal are reflected in the lidar Doppler velocity standard deviation $\sigma_L$, causing it to be higher overall than $\sigma_S$. This in turn leads to lower $g_{t_g,L}$ than $g_{t_g,S}$ at low gust durations. If we scale $g_{t_g,L}$ by $\sigma_L/\sigma_S$ as in Figure 5(b), the peak factors from both data sources agree at gust duration $t_g \approx 19$ s and longer.
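The observed peak factors and the rescaling used for Figure 5(b) amount to simple arithmetic on the 10 min statistics. The following Python sketch illustrates this; the function names and the sample values in the usage note are hypothetical.

```python
import statistics


def peak_factor(u_max, u_mean, sigma):
    """Peak factor g = (U_max - U) / sigma for one averaging period."""
    return (u_max - u_mean) / sigma


def median_peak_factor(samples):
    """Median peak factor over (U_max, U, sigma) tuples, one per period."""
    return statistics.median(peak_factor(*s) for s in samples)


def rescale_peak_factor(g_lidar, sigma_lidar, sigma_sonic):
    """Express a lidar peak factor relative to the sonic standard deviation."""
    return g_lidar * sigma_lidar / sigma_sonic
```

For instance, three periods with (U_max, U, σ) of (12.0, 10.0, 1.0), (13.0, 10.0, 2.0) and (11.5, 10.0, 0.5) have peak factors of 2.0, 1.5 and 3.0, so `median_peak_factor` returns 2.0.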
To overcome the mismatch between the median $g_{t_g}$ curves from sonic anemometer and lidar measurements for gust durations shorter than 19 s (Figure 5(b)), we use information about the known theoretical behaviour of the peak factor as a function of the gust duration and thereby force the lidar peak factor curve to follow the sonic anemometer curve for short gust durations. This is illustrated by the red curve in Figure 5. The mathematical description of the scaling of the lidar peak factor is given by
$$g_{t_g,S} = \begin{cases} \dfrac{g_{t_g,\mathrm{theory}}}{g_{t_{g,\mathrm{ref}},\mathrm{theory}}}\; g_{t_{g,\mathrm{ref}},L} & \text{for } t_g < t_{g,\mathrm{ref}}, \\[2ex] g_{t_g,L} & \text{for } t_g \ge t_{g,\mathrm{ref}}, \end{cases} \tag{15}$$
where $g_{t_g,\mathrm{theory}}$ is the theoretical expression for the peak factor and $g_{t_{g,\mathrm{ref}},L}$ and $g_{t_{g,\mathrm{ref}},\mathrm{theory}}$ are the observed and theoretical peak factor, respectively, corresponding to the gust duration $t_{g,\mathrm{ref}}$, which is the shortest gust duration for which the observed median peak factor curves from the lidar and the sonic anemometer match. In this case, it is $t_{g,\mathrm{ref}} \approx 19$ s. Theoretically, if the time series are stationary and Gaussian, this point should only be a function of the lidar instrument set-up, because the peak factor is the deviation of the gust from the mean normalized by the standard deviation. The normalization makes the time series independent of the local turbulence conditions at the measurement site. However, in real turbulence data the time series is not always stationary or Gaussian, but the method can still be applied using median peak factors as seen in Figure 5. We also found that, in this real turbulence data, $t_{g,\mathrm{ref}}$ varies with the Doppler lidar measurement uncertainty (CNR) and also the measurement height. However, based on this dataset only, it is not possible to evaluate the reasons for the dependence of $t_{g,\mathrm{ref}}$ on measurement height, because it may be caused by the growth with height of the integral length-scale of turbulence (the time/distance after which the autocorrelation function of the wind speed decreases below $e^{-1}$) or by the lidar measurement set-up (e.g. the growing horizontal distance between the lidar measuring volumes with height, or the changes in CNR with height).
Figure 5. (a) Median peak factor as a function of gust duration as observed by sonic anemometers (black), lidar (blue) and the theoretical peak factor (red; Eq. (15)) derived from the parallel measurements from lidar and sonic anemometers between 40 and 100 m during 10 October 2015. (b) The same as (a), but the lidar peak factor and the theoretical one are scaled by the ratio $\sigma_L/\sigma_S$. In both panels, the standard error of the mean is given by the shadowed region underlying the points in each median curve.
The theoretical peak factor $g_{t_g,\mathrm{theory}}$ can be derived from statistical considerations (Rice, 1944, 1945; Beljaars, 1987; Kristensen et al., 1991; Wichers Schreur and Geertsema, 2008; Suomi et al., 2015). The theoretical peak factor equation (Eq. (16)) depends on $T$, the sample length, and $P$, the desired probability of a gust in the ensemble of samples. For the median peak factor, $P = 0.5$. It also depends on $\tau$, the turbulent time-scale (Eq. (17)), which determines the effect of the gust duration on the peak factor and is expressed in terms of $S(f)$, the one-sided power spectrum of the horizontal wind speed, for which we used the formulation by Kaimal et al. (1972) with a constant $U = 10$ m s⁻¹ and $z = 10$ m. The spectrum is filtered by a function $|H(f)|^2$, determined by
$$|H(f)|^2 = \left[\frac{\sin(\pi f t_g)}{\pi f t_g}\right]^2, \tag{18}$$
which represents the moving average filter determining the desired gust length ($t_g$). Equations (16)-(18) provide an estimate for the peak factor of the filtered turbulence time series, but usually we are interested in the peak factor relative to the true turbulence. Therefore, Eq. (16) must be multiplied by the ratio of the standard deviations of the filtered and true turbulent wind speeds, expressed in terms of the turbulence spectrum:
$$\left[\frac{\int_0^\infty S(f)\,|H(f)|^2\,\mathrm{d}f}{\int_0^\infty S(f)\,\mathrm{d}f}\right]^{1/2}. \tag{19}$$
Figure 6. Difference in the gust factor distributions as a function of the relative error ($\sigma_{e,U_{\max,L}}/U_L$) as in Figure 4(c). Here, Doppler lidar $G$ is derived using the scaling method with the observed standard deviation from (a) lidar measurements (assuming $\sigma_S = \sigma_L$) and (b) sonic anemometer measurements.
Now we have derived the equations to estimate the peak factor from lidar measurements for any gust duration using a statistical scaling approach. The advantage of using Eq. (15) to scale the lidar peak factors is that it uses information about the observed lidar wind-speed maxima of each sample ($g_{t_{g,\mathrm{ref}},L}$) and the scaling coefficient $g_{t_g,\mathrm{theory}}/g_{t_{g,\mathrm{ref}},\mathrm{theory}}$ scales that to correspond to the value observed by a sonic anemometer with some defined gust duration $t_g$.
Since $g_{t_{g,\mathrm{ref}},L} = (U_{\max,L} - U_L)/\sigma_L$ varies from sample to sample, it retains the natural scatter of the peak-factor values as in the original lidar data set. Now that we have derived the expression for the peak factor $g_{t_g,S} = f(g_{t_g,L})$ (Eq. (15)), the wind-gust equation (Eq. (20)) follows. One more component remains to be estimated: $\sigma_S$. Turbulence estimation from lidar measurements has received a lot of attention in the literature and a summary is provided by Sathe and Mann (2013). Here, we require a pragmatic and robust method for scaling the Doppler lidar wind gusts independent of meteorological mast measurements. Therefore, we will test the method using the standard deviation of velocity obtained directly from lidar measurements (assuming $\sigma_S = \sigma_L$). The resulting gusts will then naturally deviate from those obtained from the meteorological mast. To evaluate the effect of this assumption, we also applied the scaling method using the best possible estimate for $\sigma_S$, i.e. that from the meteorological mast. This, of course, can only be applied at the mast measurement heights, i.e. here up to 100 m. The evaluation of the assumption is presented at the beginning of the following section, followed by a comparison of mean gust-factor profiles up to 290 m derived independently from Doppler lidar measurements using the scaling method (Eq. (20) with the assumption $\sigma_S = \sigma_L$) and up to 100 m based on meteorological mast measurements.
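The scaling step can be sketched as follows. The wind-gust equation itself is not reproduced in this text, so the sketch assumes it has the form $U_{\max} = U_L + g_{t_g,S}\,\sigma_S$, with $g_{t_g,S}$ obtained by multiplying the lidar peak factor by the theoretical scaling coefficient. The function name, argument names and the numerical values are hypothetical and chosen only for illustration.

```python
def scaled_gust(u_max_lidar, u_mean_lidar, sigma_lidar,
                scaling_coefficient, sigma_ref=None):
    """Scale a lidar gust to a target gust duration (assumed form of the method).

    sigma_ref is the turbulence estimate used for rescaling; if omitted,
    the lidar standard deviation is used (the sigma_S = sigma_L assumption).
    """
    if sigma_ref is None:
        sigma_ref = sigma_lidar
    # Lidar peak factor at the reference gust duration.
    g_ref_lidar = (u_max_lidar - u_mean_lidar) / sigma_lidar
    # Peak factor scaled to the target gust duration.
    g_scaled = scaling_coefficient * g_ref_lidar
    u_max_scaled = u_mean_lidar + g_scaled * sigma_ref
    gust_factor = u_max_scaled / u_mean_lidar
    return u_max_scaled, gust_factor


# Illustrative numbers only (not measurements from this study).
u_max_a, G_a = scaled_gust(13.0, 10.0, 1.5, scaling_coefficient=1.2)
u_max_b, G_b = scaled_gust(13.0, 10.0, 1.5, scaling_coefficient=1.2, sigma_ref=1.0)
print(u_max_a, G_a, u_max_b, G_b)
```

With $\sigma_S = \sigma_L$ the expression reduces to $U_L$ plus the scaling coefficient times $(U_{\max,L} - U_L)$, so the lidar standard deviation cancels; an independent turbulence estimate changes the result only through the ratio $\sigma_S/\sigma_L$.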

Validation of the scaling method
We now test the scaling method derived in section 4 to measure wind gusts. In Figure 6(a), we applied Eq. (20) with $\sigma_S = \sigma_L$, i.e. with turbulence taken directly from the lidar measurements. Comparison with Figure 4(c) shows that both the mean error and RMSE have clearly decreased, but there is still an overestimation by the lidar. In Figure 6(b), using $\sigma_S$ observed by sonic anemometers, the positive bias in the gust measurements is reduced (leaving a very small negative bias) and the RMSE reduces to 0.04. Hence, this novel scaling method for estimating wind gusts from lidar measurements performs well, and the comparison demonstrates that it clearly benefits from a reliable estimate of turbulence (in terms of velocity variance). Figure 7 shows the performance of the scaling method during the two-day period at heights covered by both mast and Doppler lidar. There is a clear overestimation of the gust factor by the original lidar measurements and the overestimation is largest where $G$ is highest, i.e. during turbulent daytime conditions. During early morning and in the evening of 10 October, the gust factors from Doppler lidar and mast measurements compare well even without scaling, whereas the scaling method improves the results most during daytime on 10 October and in the precipitating conditions on 11 October.
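For reference, the two verification scores used here can be computed as in the following minimal sketch. It assumes the sign convention implied by the text (lidar minus sonic, so that a positive bias means the lidar overestimates); the function name and the sample values are hypothetical and are not data from this study.

```python
import numpy as np


def bias_and_rmse(g_lidar, g_sonic):
    """Mean error (bias) and root-mean-square error of lidar gust factors."""
    diff = np.asarray(g_lidar, dtype=float) - np.asarray(g_sonic, dtype=float)
    bias = float(np.mean(diff))
    rmse = float(np.sqrt(np.mean(diff**2)))
    return bias, rmse


# Illustrative gust factors only.
bias, rmse = bias_and_rmse([1.30, 1.45, 1.20], [1.25, 1.40, 1.22])
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```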
So far, our evaluation of the scaling method has been based on the heights where there are coincident Doppler lidar and meteorological mast measurements. The gust factor profiles have been extended above the mast height in Figure 8. For comparison, $G$ values from the mast are shown as dashed lines in (a) and (c). The results are shown separately for 10 and 11 October, to distinguish between non-rainy (10 October) and rainy (11 October) conditions. On 10 October, the estimated lidar wind gust using the assumption $\sigma_S = \sigma_L$ provides gust factors that fit the sonic anemometer gust factors exactly in stable conditions, but in unstable and near-neutral conditions the estimated gust factors are slightly overestimated (probably due to the impact of higher turbulence). Above the meteorological mast heights, the question is whether the lidar gust-factor measurements are reliable. Since we do not have reference sonic anemometer measurements above 100 m, we have to use other information to assess the quality of the measurements. In Figures 8(b) and (d), we highlight an error level of 4% in terms of the relative error of the wind-gust speed, $\sigma_{e,U_{\max,L}}/U_L$, as an indicator of data quality. This choice for the acceptable error level is discussed in section 5.2. Here, with a threshold of 4%, Figure 8(b) indicates good-quality measurements at least up to 200 m and potentially even higher.
On 11 October (Figures 8(c) and (d)), the shape of the profiles clearly differs from those on 10 October (Figures 8(a) and (b)). Compared with the sonic anemometer measurements, the scaled Doppler lidar $G$ is slightly high at all mast levels. Doppler lidar $G$ is almost constant up to 160 m, above which it increases strongly. The relative errors are below 4% only near the 100 m level; below and above that level the errors are larger.
Precipitating conditions pose an additional challenge for obtaining reliable Doppler lidar wind retrievals. Aerosol and cloud droplets are ideal targets for Doppler lidar wind retrievals, as they have negligible terminal fall velocities (< 5 cm s$^{-1}$) and are effective tracers of the air motion, whereas precipitating particles have an appreciable terminal fall velocity. For widespread precipitation that is all falling at similar velocities, there is little impact on the wind retrieval; however, in patchy or evaporating precipitation there could be variations of 5 m s$^{-1}$ or more in the vertical component of the radial Doppler velocities measured by each beam within a single scan (i.e. one beam encounters precipitation, another beam in the opposite direction only encounters aerosol), which then propagates through to the wind retrieval. This may be an additional reason for the reduced performance of the scaling method using $\sigma_L$ on 11 October, together with the reduction in sensitivity increasing the uncertainty. Even though the Doppler lidar raw radial measurements are more prone to errors in precipitating conditions, the scaling can still provide reasonable wind-gust estimates after spikes are removed from the wind-speed time series (section 3 and Figure 6; section 5.2). Although there are larger uncertainties in the Doppler lidar wind measurements (Figures 3 and 7), the gust factor is probably representative up to 160 m on 11 October (Figure 8(c)).
Figure 7. Time-height cross-sections of the gust factor from (a) sonic anemometer, (b) Doppler lidar with no scaling, (c) Doppler lidar with scaling using $\sigma_L$, (d) Doppler lidar with scaling using $\sigma_S$, (e) difference between sonic anemometer and Doppler lidar with no scaling, (f) difference between sonic anemometer and Doppler lidar with scaling using $\sigma_L$, (g) difference between sonic anemometer and Doppler lidar with scaling using $\sigma_S$. The lowest panel shows the same stability index for each 10 min sequence as in Figure 3.

Sensitivity tests of the scaling method
In section 2.2, it was shown that wind gusts from a Doppler lidar are sensitive to outliers in the data and that spike removal is effective in improving the quality of the wind-gust measurements. The effectiveness of the spike removal is illustrated in Figure 9, where the $G_L$ distribution is calculated from the raw lidar measurements without spike removal. Comparison with Figure 4, where the spikes have been removed, shows a clear impact on the results, especially for relative errors higher than 4%. The bias of the raw lidar data is 0.23 and the RMSE 0.6 (Figure 9(b)); after spike removal these are reduced to 0.07 and 0.11, respectively (Figure 4(c)). When the spikes are removed, the data quality improves such that, after scaling, it becomes acceptable to include gust factors with relative errors above 4% as well (Figure 6), and hence potentially to provide reliable gust-factor profiles from lidar measurements up to 290 m in non-precipitating conditions on 10 October and up to 160 m in precipitating conditions on 11 October (Figure 8).
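A spike-removal step of this kind can be sketched as below. The detection criterion used in this study is not reproduced here; the sketch substitutes a simple, commonly used robust test (deviation from the sample median exceeding a multiple of the scaled median absolute deviation) and then replaces flagged values by linear interpolation between neighbouring non-spike values. The function name and the threshold `n_mad` are assumptions for illustration.

```python
import numpy as np


def remove_spikes(u, n_mad=5.0):
    """Flag outliers with a median/MAD test (assumed criterion) and interpolate."""
    u = np.asarray(u, dtype=float)
    med = np.median(u)
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    mad = 1.4826 * np.median(np.abs(u - med))
    spikes = np.abs(u - med) > n_mad * mad
    idx = np.arange(u.size)
    cleaned = u.copy()
    if spikes.any() and not spikes.all():
        # Linear interpolation using neighbouring non-spike values.
        cleaned[spikes] = np.interp(idx[spikes], idx[~spikes], u[~spikes])
    return cleaned, spikes


# Illustrative wind-speed series (m/s) with one unrealistic value.
cleaned, spikes = remove_spikes([8.0, 8.5, 9.0, 25.0, 9.5, 9.0, 8.5])
print(cleaned, spikes)
```

A global median is used here for brevity; a running window would be the natural choice for non-stationary series, and a series with zero MAD would need special handling.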
In addition to spike removal, we also tested filtering based on CNR during the maximum gust. Figure 10 shows the CNR during the maximum gust as a function of the lidar gust factor. The mean of the five radial CNR values is shown in black and the minimum in red. Based on sonic anemometer measurements, all gust factors during this period were smaller than 2 and therefore all $G_L$ values exceeding this threshold are erroneous. From Figure 10, we see that there are unrealistically high gust factors at mean CNR values below about −21 to −22 dB and at minimum CNR below −24 dB. Either of these thresholds could be used to filter out unreliable data. However, when the lidar is used to measure gusts operationally, it is easiest to assess data quality without any averaging operations, i.e. without taking the mean of the radial CNR values, and therefore we tried using a threshold based on the minimum CNR. In other words, all $v$ measurements for which any of the five radial CNR values was < −24 dB were flagged. Flagged data were then filtered out of the time series before the gust calculations; the filtering replaced bad values by linear interpolation using neighbouring non-flagged points. We tested the effectiveness of this approach to filter out unreliable data, but found that not all unrealistically high wind-speed ($u_L$) values were removed, i.e. spikes were still present at CNR values above the threshold. We associate this with occasional issues in the automated calculation of the CNR within the instrument, i.e. an incorrect determination of the instrument noise level generates a CNR profile that is biased high or low. In cases where the CNR was biased high, a constant CNR threshold would not then filter out all potentially unreliable values.
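The minimum-CNR filter described above can be sketched as follows: a time step is flagged when any of the five radial CNR values falls below −24 dB, and flagged values are replaced by linear interpolation between neighbouring non-flagged points. The function name, the array layout (one row of five CNR values per time step) and the sample numbers are assumptions for illustration.

```python
import numpy as np


def cnr_filter(u, cnr_beams, threshold_db=-24.0):
    """Flag samples whose minimum radial CNR is below the threshold and interpolate."""
    u = np.asarray(u, dtype=float)
    cnr_beams = np.asarray(cnr_beams, dtype=float)  # shape (n_times, n_beams)
    flagged = cnr_beams.min(axis=1) < threshold_db
    idx = np.arange(u.size)
    filtered = u.copy()
    if flagged.any() and not flagged.all():
        filtered[flagged] = np.interp(idx[flagged], idx[~flagged], u[~flagged])
    return filtered, flagged


# Illustrative values only: the third time step has one beam at -26 dB.
u = [10.0, 11.0, 30.0, 12.0]
cnr = [[-18, -19, -20, -18, -17],
       [-19, -20, -21, -19, -18],
       [-20, -26, -21, -20, -19],
       [-18, -19, -20, -19, -18]]
filtered, flagged = cnr_filter(u, cnr)
print(filtered, flagged)
```

As noted in the text, a fixed threshold of this kind cannot catch spikes that occur while the reported CNR is biased high, which is why it was found less effective than spike detection.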

Summary and discussion
We have derived a methodology for scaling Doppler lidar wind-gust estimates so that they are comparable with those observed by sonic anemometers on a meteorological mast. Thereby, profiles of wind gusts can potentially be measured by Doppler lidars at many more locations without the need for the costly and challenging deployment of a tall meteorological mast. This novel method not only scales the lidar gusts but also provides estimates for wind gusts with variable gust durations, including shorter durations (of the order of a second) that are beyond the limits of the lidar measurement frequency. The input parameters for the scaling method are the wind-gust speed as well as the mean and standard deviation of the horizontal wind speed from the Doppler lidar. The wind-gust speed is calculated as the maximum of the moving-averaged horizontal wind speed. For the WindCube V2 Doppler lidar used in this study, an average over five samples (corresponding to gust duration $t_g = 19$ s) was found to be adequate, but this depends on the lidar type and the scanning technique and must be tuned separately for each lidar set-up. As the scaling method is based on peak factors, which represent the maximum turbulent deviations from the mean in the wind-speed time series normalized by its standard deviation, the method does not depend on the measurement site, provided that the wind-speed time series is stationary and Gaussian. By contrast, the measured (and scaled) wind-gust speeds and gust factors are site-specific, i.e. they depend on the local turbulence conditions determined by the surface roughness and the static stability of the atmosphere (e.g. Suomi et al., 2013, 2016). Using Doppler lidar data only, the novel scaling method provides reasonable gust factor estimates, with a small positive bias (0.03) and RMSE of about 0.06, and the bias can be reduced further with better estimates of the velocity variance.
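The three input quantities and the derived gust and peak factors can be computed from a single sample as in this minimal sketch. The five-sample window follows the text; the function name, the use of the population standard deviation and the sample values are assumptions for illustration.

```python
import numpy as np


def gust_statistics(u, window=5):
    """Gust speed as the maximum of the moving-averaged wind speed, plus G and g."""
    u = np.asarray(u, dtype=float)
    kernel = np.ones(window) / window
    # 'valid' keeps only windows that lie fully inside the sample.
    smoothed = np.convolve(u, kernel, mode="valid")
    u_max = float(smoothed.max())
    u_mean = float(u.mean())
    sigma = float(u.std())  # population standard deviation (assumed convention)
    return {
        "gust": u_max,
        "mean": u_mean,
        "sigma": sigma,
        "gust_factor": u_max / u_mean,
        "peak_factor": (u_max - u_mean) / sigma,
    }


# Illustrative horizontal wind speeds (m/s), one value per lidar scan sequence.
stats = gust_statistics([8, 9, 10, 11, 12, 11, 10, 9, 8, 9])
print(stats)
```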
The added performance of the scaling method was most noticeable in turbulent daytime conditions, but the method also improved the estimation of gustiness in precipitating conditions. The data quality is crucial for successful wind-gust measurements, both with and without the scaling method. Here we applied a spike-detection method similar to that typically used in sonic anemometer data processing and found that it removes outliers from the data effectively. The spikes were replaced by linear interpolation using neighbouring non-spike values. This removal of spikes improves the gust-factor estimation most in cases when only a few outliers exist. When unrealistically high wind-speed values (poor data quality) start to dominate the time series, the performance of the spike detection decreases. We also tested a spike-detection method based on instantaneous CNR values, but it did not remove all unrealistically high wind-speed values. Therefore, our conclusion is that, when using Doppler lidar to measure gusts, better data quality is achieved using a filtering technique based on spike detection and removal than filtering based on instantaneous CNR. CNR is nevertheless a good tool for overall data-quality assessment, such as when estimating the relative error of the measurement (O'Connor et al., 2010).
Figure 9. As Figure 4; two-dimensional histograms of the gust factor for (a) raw Doppler lidar data and (b) the difference between Doppler lidar and sonic anemometer data as a function of the relative error of the wind-gust speed ($\sigma_{e,U_{\max,L}}/U_L$). In (b), the bias and RMSE are provided.
The scaling methodology presented here was developed for one particular lidar type and scanning sequence. To develop this methodology further, the next step is to test the applicability of the method to other lidar types and other scanning sequences, such as conical scans with many more beams. One open question is what Doppler lidar measurement frequency is sufficient to obtain reliable wind-gust estimates. Also to be examined is the effect of the horizontal variability of the wind on the scaling method. In this study, we concentrated on removing the effect of horizontal variability on the time-scales of the lidar scan sequence (3.8 s) and the length-scales of the volume between the lidar measuring beams (up to about 300 m). Smoothing the turbulent measurements with a 19 s moving average reduced the temporal and spatial variability in the wind field, so that the lidar measurements matched the sonic anemometer measurements. Moreover, in the theoretical method we used only one formulation for the turbulence spectrum, that by Kaimal et al. (1972) with a fixed mean wind speed and height. This choice provided good results in this study, but the sensitivity of the results to different spectral formulations will be investigated in the future.
Figure 10. CNR during the wind gust as a function of the gust factor from lidar. CNR is provided as a mean of five radial components (black dots) and as the minimum of five radial components (red). A threshold of CNR = −24 dB is shown as a blue horizontal dashed line.
Here, the aim was to scale Doppler lidar wind-gust measurements to match the sonic anemometer measurements from a tall meteorological mast. We have shown that this particular lidar instrument provides good wind-gust estimates up to heights of about 160-250 m, typically well above the roughness sublayer and often also above the surface layer. This is exceedingly useful when assessing wind-gust parametrizations in numerical weather prediction models, since we are no longer limited by the mismatch between the roughness at the model grid point and the conditions at the observation site, as is the case for weather stations where the wind measurements are usually made at 10 m reference height. With model evaluation based on observed profiles of gusts, there are now possibilities of developing gust forecast methods further.
In wind-energy applications, it is not only the pointwise measurement of the wind-gust speed that is important for estimating the extreme instantaneous loads on wind turbines. As seen in this study, lidar wind-gust measurements without the scaling method are affected by wind-speed variability on the scale of the volume between the lidar measuring beams. This information could be useful for wind turbine operations and is therefore an aspect that should also be investigated in the future.