Implications of employing detailed urban canopy parameters for mesoscale climate modelling: a comparison between WUDAPT and GIS databases over Vienna, Austria

One of the major obstacles to using numerical weather prediction models for guidance on mitigating urbanization's impact on local and regional climate is the lack of detailed and model-ready morphological data at urban scale. The World Urban Database and Access Portal Tool (WUDAPT) is a recent project developed to extract climate-relevant information on urban areas, in the form of local climate zones (LCZs), from remote sensing imagery. This description of the urban landscape has been tested and used for parameterization of different urban canopy models (UCMs) for mesoscale studies. As detailed information is usually confined to city centres, crowdsourced and remote sensing data offer the possibility to move beyond the old barriers of urban climate investigations by studying the full range of variation from the urban core to the periphery and its related impacts on local climate. Thus, in this study we compared the relative impact of using the WUDAPT methodology versus a simplified definition of the urban morphology extracted from detailed GIS information to initialize a regional weather model, and evaluated the output against official and crowdsourced weather station networks. A case study over Vienna, Austria was conducted using the Weather Research and Forecasting (WRF) model, coupled with the building effect parameterization and building energy models (BEP–BEM), in five distinct seasonal periods. Results demonstrated that using detailed GIS data to derive morphological descriptions of LCZs for mesoscale studies provided only a marginal overall improvement over using the default WUDAPT parameters based on the ranges proposed by Stewart and Oke (2012). The findings also highlighted the importance of developing techniques that are better at capturing the morphological heterogeneity across the entire urban landscape and thus improve our understanding of UCM performance over urban areas.


Introduction
Urbanization's impacts on the local climate and environment have been studied for nearly two centuries (Howard, 1818; Bornstein, 1968; Oke, 1982; Arnfield, 2003; Souch and Grimmond, 2006). These impacts include phenomena such as the urban heat island, inhabitants' heat stress, air pollution, increased storm water runoff, modified building energy use and intensified thunderstorms over urban areas. As 70% of the world's population is expected to live in urban areas by 2050 (United Nations, 2014), it is critical to better understand and represent these issues. To do so, urban canopy models (UCM) were developed with the perspective of giving quantitative decision support for the mitigation of, and adaptation to, climate change through urban planning and policymaking (Masson, 2000; Kusaka et al., 2001; Martilli, 2002; Järvi et al., 2011; Mauree et al., 2017). However, a major impediment to the use of UCMs is the lack of data on urban morphologies and related local weather data, which is crucial for the implementation and testing of such models (Oke, 2006; Grimmond et al., 2010; Grimmond et al., 2011). This is especially true in developing countries where the most rapid and extensive urban development is taking place.
To bridge this knowledge and data gap on urban structures, the World Urban Database and Access Portal Tool (WUDAPT) project (Ching et al., 2018; Mills et al., 2015), currently at a stage of development defined as Level 0, started mapping urban areas using the local climate zone (LCZ) classification scheme. LCZs are defined as areas with relatively homogeneous surface cover, materials, morphology and human activity. This classification system was designed to be used globally, independent of cultures or local idiosyncrasies. It was created for heat island assessment studies and, as a result, the class properties are all climatically relevant and related to the surface energy balance. The classification comprises 17 typologies, 10 of which are built environments and 7 of which are natural land cover types (Stewart and Oke, 2012). Those classes describe the climate-relevant aspects of the urban environment with ranges of values, which are implementable in UCMs (Table 1). Although mean values of those ranges already give a good approximation of the climate impact of heterogeneous urban landscapes (Alexander et al., 2016; Brousse et al., 2016; Wouters et al., 2016), the use of more detailed data sets to improve the definition of those typologies might provide more accurate results. Therefore, this study, which focuses on a central European case in the city of Vienna, makes use of the WUDAPT framework and compares it with a detailed GIS data set in terms of urban morphology mapping and model parameterization capacities. It then examines how a multilayer UCM performs differently when using the urban canopy parameters (UCPs) obtained from the default LCZs versus detailed information from a GIS database as input for the Weather Research and Forecasting (WRF) model. The model results are compared against measurements from a network of weather stations throughout the region of interest (ROI).
Because few official weather stations are available within the city of Vienna, and even fewer are available in the suburban areas outside the municipal boundary, crowdsourced weather station networks were also investigated as a potential source for comparing model performance per LCZ.
The multilayer UCM embedded within the WRF model is the building effect parameterization-building energy model (BEP-BEM) (Martilli, 2002). This model represents the effect of urban-scale processes at the mesoscale level by reducing the complex and variable urban landscape to simplified building canyons. The BEP building canyons are described using UCPs, which detail their climate-relevant properties with morphological, physical and thermal parameters (Tables 1 and 2). Those parameters also allow the BEM model to evaluate the impact of urban climate on building energy consumption and the feedback of the resulting anthropogenic heat output.

Study area
This study was centred on the city of Vienna, Austria. In terms of urban morphology and geographic location, Vienna is a rather typical central European city. It has a dense central core largely defined by its medieval fortifications, which is surrounded by a ring of districts composed predominantly of four- to six-storey Gründerzeit apartment blocks that were built in the mid-19th century. Beyond this central development is an assortment of incorporated historic villages, modern apartment blocks, single-family housing, sprawling big box retail and agricultural land cover. Its municipal boundaries enclose an area of 414.87 km² and approximately 1.8 million residents (Lukacsy and Fendt, 2015). The ROI, extending [...] to 16.74°E, contains the municipal boundary of Vienna as well as much of the surrounding suburban and rural areas (Figure 1). It consists of a 3541 km² area that contains the continuous urban network from Schwechat to Penzing (east-west) and from Kottingbrunn to Klosterneuburg (south-north), croplands in the northern and eastern part of Vienna, which are predominantly small grain crops, and the densely forested Wienerwald to the west. The region is subject to a continental climate and to strong wind conditions due to its position at the crossroads between the Alps, the Carpathians and the Alföld.
Vienna was selected due to the availability of both high-resolution geospatial data and weather data records. Additionally, while a good number of Asian and North American cities have been used in WRF modelling studies, relatively few European cities have been examined.

Study period and date selection
In this study, we examined the sensitivity of WRF to the two urban land cover inputs across different weather conditions. Rather than run multiple simulations of season-long or year-long durations, we selected five representative periods each with a duration of 48 h. Although this period could be interpreted as short, the study of different climate conditions allowed for a better test of the model's sensitivity to changes in the morphological parameters independent of climate variability.
The selection of the representative dates was performed using the k-means cluster analysis method. This method sorts a number of observations into k clusters, where k is determined by the user, such that each observation is in the cluster that has the nearest mean. In the first step, hourly weather data provided by Austria's Zentralanstalt für Meteorologie und Geodynamik (ZAMG) for the Schwechat Airport was processed into daily mean values. While this station does not represent urban conditions, the goal for the clustering was to classify general regional weather typologies. It is a well calibrated station sited in an open area that allows for good mixing and less impact of extreme micro-climate effects that could distort the clustering.
After examining the correlation between the measured climatic variables, three relatively uncorrelated variables (R < 0.5) were selected to use as the key indicators for clustering: temperature, diurnal temperature range and wind speed. Using k-means cluster analysis there is no objective or automated method for determining the optimal number of clusters. Therefore, it was necessary to find a balance between two competing criteria: the desire to cover the broadest possible range of unique weather typologies and the need to reduce the overall number of runs. To achieve this balance, a number of clusters from two to ten were tested and the distinctiveness of the resulting groups compared. It was found that beyond five clusters any easily distinguishable differences in the resulting categories were lost. In other words, two or more clusters would share similar mean values of their key indicators. Therefore, we chose to use five weather typologies in this study.
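The clustering step described above can be sketched in a few lines. This is a minimal illustration of standard k-means (Lloyd's algorithm), not the authors' implementation: the function name `kmeans`, the random initialization, the fixed `seed` and the sample values are all assumptions, and each input row is assumed to hold the three daily indicators (mean temperature, diurnal temperature range, wind speed). In practice the indicators would probably need to be scaled to comparable units first, which the text does not discuss.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until nothing changes.
    Initial centroids are k randomly chosen points (an assumption)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of the members; keep the old centroid if empty.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical daily values: [mean temperature, diurnal range, wind speed].
days = [
    [-2.0, 3.0, 4.0], [-1.0, 2.5, 4.5], [-1.5, 3.5, 3.5],
    [22.0, 12.0, 2.0], [23.0, 11.0, 2.5], [21.5, 12.5, 1.5],
]
labels, centroids = kmeans(days, k=2)
```

Repeating the call for k from two to ten and inspecting the resulting centroids mirrors the manual comparison of cluster distinctiveness described in the text.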
After the clustering, it was necessary to select periods with distinct climatic sequences from each cluster over which to run the WRF model. Simply selecting the periods whose three key indicator values were closest to the mean values of their cluster resulted in a set of five dates that, despite the clustering approach, were not very distinct from one another. Therefore we selected 48-h periods in each cluster that had climatic values which were the furthest from the mean values of the other four clusters. This ensured that each selected period represented the qualities that set that cluster apart from the others (Table 3).
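One way to read the selection rule is as a maximization of distance to the other clusters' mean values. The sketch below assumes that the summed Euclidean distance is the criterion; the text does not state the exact distance measure, so this choice, the function name `most_distinct_member` and the sample values are hypothetical.

```python
import math


def most_distinct_member(members, other_centroids):
    """Return the index of the candidate period whose summed distance to
    the mean values of the other clusters is largest (assumed criterion)."""
    scores = [
        sum(math.dist(member, centroid) for centroid in other_centroids)
        for member in members
    ]
    return max(range(len(members)), key=scores.__getitem__)


# Hypothetical indicator vectors for two candidate periods in one cluster,
# compared against the mean of a single other cluster.
candidates = [[0.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]
others = [[5.0, 0.0, 0.0]]
chosen = most_distinct_member(candidates, others)
```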
The two coldest periods, January and February, had similar temperature profiles, with a comparable diurnal range and no clear diurnal cycle. The largest distinction between the two periods was the wind speed. February had much stronger and more consistent wind than any of the other study periods. Additionally, while both cold periods were overcast and had occasional snow showers, the February period was clear overnight while the January period remained overcast.
The study periods in March, April and July all exhibited a clear diurnal cycle and got progressively warmer, as expected given the seasonal shift. The March period was distinguished by its very low wind speeds and clear skies. This lack of mixing combined with night-time cooling contributed to a large diurnal variation of surface temperatures.

Observational data sources
In this study, we incorporated weather data from three separate sources: the Austrian ZAMG, the city of Vienna's Department for Environmental Protection (MA22) and Wunderground's Personal Weather Station Network (PWSN) (Weather Underground, 2016). The stations from the two former sources have the advantage of being installed and maintained by professional organizations, but they are limited in quantity and spatial coverage. The stations from the latter are installed and maintained by amateurs, but the network is large and provides excellent spatial coverage. This coverage is a necessary asset for comparing how UCMs respond to different urban morphologies and heterogeneities of the landscape.
Although the official stations provide inherently more trustworthy data, we had access to only 11 of them within our area of interest that provided near-surface climatic data. In addition to sparse geographic coverage, only six of those stations were located in urban areas (Table 4). Using only these stations, it would not be possible to assess how the different LCZ descriptions affected WRF predictions across Vienna's varied urban land cover typologies. Therefore, the use of crowdsourced data from the PWSN seemed to be relevant for this study (Muller et al., 2015; Chapman et al., 2017). The Weather Underground maintains a database of weather data voluntarily provided by stations in their PWSN. At the beginning of the study period, January 2015, there were 310 active stations in our area of interest. This number has only continued to grow in the interim, making the use of crowdsourced data for urban climate studies an increasingly attractive option. Although this number increased during the study period, only those initially active stations with the most complete records were included in the final analysis.
Although anyone with a networked weather station and an internet connection can sign up to be part of the PWSN, the Weather Underground provides guidelines for installation, as do the manufacturers of most weather stations. However, there is no guarantee that these recommendations are followed. Indeed, it was found that a number of stations in a similar crowdsourced network in Berlin seem to have been set up and networked indoors and then never installed outside (Meier et al., 2017).
Additionally, there is no standard for the type of hardware used. Of the stations that provide information about their hardware type, the majority (78.8%) are Netatmo stations. The manufacturer claims a temperature sensor accuracy of ±0.3 °C, a humidity sensor accuracy of ±3% and a barometer accuracy of ±1 mbar (Netatmo Weather|Weather Station, 2016). These stations are neither ventilated nor properly shielded, so they must be set up out of direct sunlight or they will overheat. Rain and wind gauge accessories are optional and not included with many stations.

Quality control and filtering
With full awareness of the potential for increased measurement error introduced by using amateur weather stations, a series of quality control filters were applied to the PWSN data to remove stations that were reporting erroneous data before analysis. These filter criteria were adapted from Meier et al. (2017). This process reduced the number of stations from over 300 to 150 viable stations (Figure 2).
As a first pass, all data that exceeded possible physical limits for climatic values measured at the surface of the earth were removed. These extreme values typically represent default error values when a station is malfunctioning.
Figure 2. Weather station network map.
Next, a two-tailed median absolute deviation (double MAD) test was used to identify outliers. MAD was used to prevent the outliers themselves from influencing the selection process, and a two-tailed approach was used because some of the variables have skewed distributions. These outlying data were removed, and any station that had more than 25% of its data classified as outliers was removed completely.
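A double MAD test computes a separate median absolute deviation for the values below and above the median, so that a skewed distribution gets a different tolerance on each side. The sketch below is an illustration under stated assumptions: the cut-off of 3 and the 1.4826 consistency constant are common conventions rather than values given in the text, and the function names and sample readings are hypothetical.

```python
from statistics import median

# Scales the MAD to be comparable with a standard deviation for normally
# distributed data (a common convention, assumed here).
MAD_SCALE = 1.4826


def double_mad_outliers(values, threshold=3.0):
    """Flag values whose scaled distance from the median exceeds the
    threshold, using a separate MAD for each side of the median."""
    centre = median(values)
    left_mad = median([centre - v for v in values if v <= centre])
    right_mad = median([v - centre for v in values if v >= centre])
    if left_mad == 0 or right_mad == 0:
        raise ValueError("MAD is zero on one side; the test is undefined")
    flags = []
    for v in values:
        mad = left_mad if v < centre else right_mad
        flags.append(abs(v - centre) / (MAD_SCALE * mad) > threshold)
    return flags


def station_is_rejected(flags, max_outlier_fraction=0.25):
    """Drop the whole station if more than 25% of its data are outliers."""
    return sum(flags) / len(flags) > max_outlier_fraction


# Hypothetical temperature readings with one implausible spike.
readings = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 60.0]
flags = double_mad_outliers(readings)
rejected = station_is_rejected(flags)
```

Raising an error when a one-sided MAD is zero is a design choice for this sketch; a production filter would need an explicit policy for near-constant records.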
The last step was to compare the remaining PWSN stations to the official sources. Each station was compared to either an average of the closest three official stations within 4 km or, where fewer than three stations lay within 4 km, the average of all 11 official stations. This step included two filters, each designed to target a particular type of error. The first filter was aimed at stations that are operating indoors or in other artificially heated environments. For each month, the average daily minimum temperature was compared to official sources; if a station exceeded the value from the warmest official station by more than 4 K in any month, it was removed. The second filter targeted stations that were overheating due to improper radiation shielding and ventilation. The daily maximum temperature reported from each personal weather station was compared to the highest value recorded at any official weather station. Stations were removed if they exceeded that maximum by more than 3 K on more than 10% of days.
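The two reference-based filters reduce to simple threshold checks once the official reference values are available. The following sketch takes those reference series as inputs; the function names, argument layout and sample values are assumptions, while the 4 K, 3 K and 10% thresholds come from the text.

```python
def fails_indoor_filter(station_monthly_min, warmest_official_monthly_min,
                        margin_k=4.0):
    """True if, in any month, the station's average daily minimum exceeds
    the warmest official value by more than margin_k."""
    return any(
        station - official > margin_k
        for station, official in zip(station_monthly_min,
                                     warmest_official_monthly_min)
    )


def fails_overheating_filter(station_daily_max, highest_official_daily_max,
                             margin_k=3.0, max_fraction=0.10):
    """True if the station's daily maximum exceeds the highest official
    daily maximum by more than margin_k on more than 10% of days."""
    exceedances = sum(
        1
        for station, official in zip(station_daily_max,
                                     highest_official_daily_max)
        if station - official > margin_k
    )
    return exceedances / len(station_daily_max) > max_fraction


# Hypothetical values: a station 5 K warmer than the reference in one month,
# and a station that overheats on 2 of 10 days.
indoor = fails_indoor_filter([5.0, 6.0], [0.0, 3.0])
overheating = fails_overheating_filter(
    [30.0, 35.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 34.0],
    [26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0],
)
```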
Following this filtering process, we compared those PWSN stations that were in close proximity (<1.5 km) to the official sources (Figure 3). We found that the crowdsourced data followed the major trends of the official data.
With only a few exceptions, the verified data fell in the middle of the range of PWSN data described by ±2 standard deviations of the mean. This range was typically on the order of ±2 K, although July and March showed much larger variations in the PWSN record. For some official stations, the crowdsourced network tended to overestimate the temperature for different reasons. In the case of the Gaudenzdorf station, this difference is mainly explained by the location of the station, which is sited adjacent to a park and the river, compared to the location of the three PWSN stations, which are at the very edge of the distance threshold, closer to the city centre and situated in more built-up neighbourhoods. Thus, they are subject to different temperature and energy balance regimes (Figure 4). This hypothesis is supported by the event recorded in July where the range of PWSN values follows the official record during the day, but does not cool as much overnight. The ability of the additional stations from the PWSN to capture other micro-climates within the grid cell increases the representativeness of the observational data set and emphasizes the limit of using only the official station network. Also, the distribution of stations by LCZ over-represents the denser urban typologies. Not surprisingly, a higher population increases the likelihood of finding a weather station, yet it does introduce a sampling bias that could skew overall results (Table 5).

WUDAPT
WUDAPT is a state-of-the-art web portal developed to produce maps of cities described in terms of LCZs and to offer them freely online for urban climate modelling purposes. The methodology adopted by Bechtel and Daneke (2012) and Bechtel et al. (2015) consists of a supervised classification of Landsat imagery using a Random Forest classification algorithm (Breiman, 2001). First, the user draws polygons of relatively uniform land cover in Google Earth, which are representative of a particular LCZ. Those polygons are called training areas (TAs). The algorithm considers the distinct pattern of reflected radiation in each spectral band for the selected TAs to classify an ROI pixel by pixel. The pixels are resampled to a 100 m grid resolution. In this framework, the TAs represent LCZ types over at least 1 km², which are processed in the SAGA GIS program to map the defined ROI, consisting of the inner WRF domain following Martilli et al. (2017). After examination of the mapping against the underlying landscape by the user and an iterative process of manually refining the TAs, a final map is produced. A set of statistical indicators is used to determine if that map is sufficiently 'robust' for implementation in models. These statistical indicators consist of two of the most common accuracy indicators used in remote sensing: the overall accuracy (OA) and the Kappa coefficient (κ). They are obtained following a bootstrapping procedure that consists of randomly extracting a part of the TA data set for comparison with the produced map by a confusion matrix. This was repeated 25 times for Vienna. More information on statistical indicators of remote sensing classification accuracy can be found in Congalton (1991). For LCZ classification processes and verifications, Bechtel and Daneke (2012), Bechtel et al. (2015) and Bechtel et al. (2017) or Brousse et al. (2016, appendix A) provide thorough descriptions of the supervised classification's steps. In this study, Landsat 8 scenes from three seasons (winter, spring and summer) were used to compute an LCZ map over Vienna. The urban landscape, which covers 20.87% of the ROI surface, was categorized as follows: 6.65% as compact mid-rise (LCZ 2), 12.29% as open mid-rise (LCZ 5), 36.35% as open low-rise (LCZ 6), 7.51% as large low-rise (LCZ 8) and 37.21% as sparsely built (LCZ 9).
Figure 4. Gaudenzdorf station location and relationship to its three reference PWSN stations.
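Both accuracy indicators mentioned above follow directly from a confusion matrix. The sketch below uses the standard definitions of overall accuracy and Cohen's Kappa; the function names and the two-class example matrix are hypothetical, and the bootstrapping loop that would average these values over repeated TA subsamples is not shown.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal (reference class == mapped class)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa: agreement beyond what row and column totals alone
    would produce by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    column_totals = [sum(column) for column in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows are reference classes from the
# held-out training areas, columns are classes assigned by the map.
confusion = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(confusion)
kappa = kappa_coefficient(confusion)
```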

Vienna GIS
The city of Vienna is strongly committed to the Open Government Data initiative (Open Government in Vienna, 2016). As part of this initiative, it has made a massive database of geospatial data available for public use. This includes a high-resolution vector map of all the buildings and land cover in the city. The detailed categories and associated metadata used in this database allow for a very precise calculation of the required UCPs. Using these calculated UCPs, it is then possible to classify an area according to the LCZ classification scheme. Although the geospatial information is highly accurate, the primary drawback of this method is that the detailed information is only available within the municipal boundary of Vienna.
An algorithm was created using the open-source Quantum GIS platform (Quantum GIS Development Team, 2016) to use the municipal data to create an alternative LCZ mapping. The algorithm iterated across Vienna on a 100 m grid and at each grid cell used the database to calculate the local UCPs and classify it. For each cell, the vector polygons representing buildings, green space and roads were downloaded from a web feature service (WFS). From these the following morphological parameters were easily derived: average building height weighted by area, a histogram of the height distribution, building surface fraction, impervious surface fraction, pervious surface fraction, road width and roof width. The height to width ratio was calculated using a DEM of the area and a four-pass traverse modified from the approach of Burian et al. (2003). The classification of the cells was performed with a Naive Bayes classifier (Hand and Yu, 2001) that used the typical ranges for LCZs provided in Stewart and Oke (2012) (Table 1). As a final step, the calculated UCPs for each grid cell were averaged across their respective LCZ class and updated values of UCPs were given for each urban class. Although reducing the gridded data to averaged values reduced the fidelity of the information, the comparison provided a 1:1 insight on the accuracy of the WUDAPT remote sensing classification scheme and was more readily ingested by the WRF model.
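Two of the per-cell parameters listed above can be illustrated with a short calculation. The sketch assumes that each building is already clipped to the 100 m cell and reduced to a footprint area and a height; the function names and values are hypothetical, and neither the height to width traverse nor the Naive Bayes classification step is reproduced here.

```python
CELL_AREA_M2 = 100.0 * 100.0  # one cell of the 100 m analysis grid


def area_weighted_mean_height(buildings):
    """Mean building height in a cell, weighted by footprint area.
    Each building is a (footprint_area_m2, height_m) pair."""
    total_area = sum(area for area, _ in buildings)
    return sum(area * height for area, height in buildings) / total_area


def building_surface_fraction(buildings, cell_area_m2=CELL_AREA_M2):
    """Share of the cell covered by building footprints."""
    return sum(area for area, _ in buildings) / cell_area_m2


# Hypothetical cell with two buildings clipped to the cell boundary.
cell_buildings = [(1000.0, 20.0), (3000.0, 10.0)]
mean_height = area_weighted_mean_height(cell_buildings)
fraction = building_surface_fraction(cell_buildings)
```

Averaging these per-cell values over all cells of one LCZ class would give the class-level UCPs described in the final step.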
It should also be noted that at the scale of a 100 m grid, certain features of interest are larger than the scale of analysis. This is particularly true in LCZ 8, where large buildings can occupy an entire grid cell. Here, many of the calculated parameters were outside the ranges provided by Stewart and Oke and others could not even be calculated (e.g. height to width ratio). In these cases, the Naive Bayes classifier struggled to accurately classify the cells. However, this influenced only a small number of cells (0.9%). Still, considering that the classifications are eventually upscaled to the larger WRF grid before the model run, it is recommended that future GIS classification be conducted on a larger grid or, better yet, directly on the final WRF grid.

Classification results and comparison
The WUDAPT classification map showed high reliability through the verification procedure. The OA was 0.83 while the Kappa coefficient was 0.76. The OA_urb, which is the accuracy for urban classes only, was 0.81. These indicators show a good degree of classification accuracy through the whole map and within the urban area.
The Vienna GIS classification algorithm provided highly accurate gridded data about the urban morphology in Vienna, but only within the municipal boundaries. Within the city boundaries of Vienna, the two mapping procedures presented tangible differences. As an example, major disparities were observed in the distribution of open low-rise (LCZ 6) and sparsely built (LCZ 9) areas at the periphery of the city, and differences in spatial coverage close to 50% were noticeable between compact mid-rise (LCZ 2) and open mid-rise (LCZ 5) areas in the central districts (Figures 5 and 6). Compared cell by cell, the two mappings had an agreement of only 67%. The majority of the disagreement was between adjacent and similarly structured classes (i.e. open low-rise and sparsely built, or compact mid-rise and open mid-rise). Most of the distinctions were related to the degree of spacing between buildings.
After the classification of the gridded data, the distribution of calculated UCPs within each LCZ was compared with the ranges used in WUDAPT (Figure 7). This comparison demonstrated that the median values of average height and urban fraction were consistent with the WUDAPT medians. However, H/W ratio varied significantly. Furthermore, nearly all GIS LCZ classifications included values from a much wider range than that described by Stewart and Oke (2012).

Urban canopy parameterization using different LCZ descriptions
Despite the differences in the mappings described previously, initial calibration of the model and runs made with different mappings for another study over Vienna demonstrated that the overall performance of the model was more related to changes in the morphological description of the LCZs than to slight changes in the spatial extent of their areas. It was thus assumed that after implementation in the WRF model at 500 m resolution, these spatial disparities would result in only a slight change in morphology within the city boundary, a change that can also be represented in the parameterization of the model by changing the UCP values. Moreover, because the inner domain requires the description of the whole surrounding region and the GIS data is only available within the city boundary, it would have been necessary either to merge the Vienna GIS mapping with the WUDAPT one for running the model, changing only the classification within the city of Vienna, or to make assumptions about the morphology outside the city boundary.
Figure 6. LCZ distribution by mapping methodology.
Indeed, the urban morphology characterization (in terms of urban fraction, building heights, and building and road widths) was different when using the WUDAPT to WRF (W2W) protocol versus a gridded mean value calculation of the detailed GIS data set (Table 6 and Figure 8). W2W describes the LCZ landscape by using the mean values of the general ranges proposed by Stewart and Oke (2012), whereas the values derived from the GIS data were calculated from precise and city-specific data.
In general, building heights' variability and canyon widths showed the greatest deviation between the two methods. Although urban fractions were very similar between the two methodologies, notable differences were observed for the open low-rise zone (LCZ 6), which is the most common urban land type in the ROI.
Therefore, it was decided to focus solely on the differences in the LCZs' morphological definition and the WUDAPT mapping was used for the spatial extent in both cases. Two cases were modelled in WRF: the 'WUDAPT' case, based on the W2W protocol and the 'Vienna GIS' case, derived from the GIS data set.
The thermal and physical parameters were derived from Brousse et al. (2016), which were established in Salamanca et al. (2012) and Stewart et al. (2014) (Table 2).

Model setup
The WRF model, with the BEP-BEM (Martilli, 2002) parameterization and the Bougeault-Lacarrère 1.5-order turbulence scheme (Bougeault and Lacarrère, 1989), was chosen for this study for its multilayer urban canopy representation. Thus, differences in the urban morphology are resolved in more detail by this urban model. The simulations were performed using NCEP Final Analysis data at 1° resolution every 6 h as initial and boundary conditions for each 2-day simulation, with a spin-up time of 6 h. The inner domain (Figure 9), with the same coverage as the ROI, was composed of 112 × 118 grid cells at a 500 m resolution, of which 11% was composed of urban categories. The rural land use classification was expressed following the default MODIS classification available in WRF. Each simulation was then compared to measurements from the two weather station networks described previously using RMSE and mean bias indicators for hourly temperature at 2 m over the 2-day periods.
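The two evaluation indicators can be written out explicitly. In the sketch below the bias is taken as model minus observation, so a positive value means the model is too warm; that sign convention, the function names and the sample series are assumptions made for illustration.

```python
import math


def rmse(modelled, observed):
    """Root mean square error between paired hourly values."""
    squared = [(m - o) ** 2 for m, o in zip(modelled, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(modelled, observed):
    """Mean of (model - observation); positive means the model is too warm."""
    differences = [m - o for m, o in zip(modelled, observed)]
    return sum(differences) / len(differences)


# Hypothetical 2 m temperatures (K or °C) at three hourly time steps.
model_t2m = [1.0, 2.0, 3.0]
station_t2m = [1.0, 1.0, 5.0]
error = rmse(model_t2m, station_t2m)
bias = mean_bias(model_t2m, station_t2m)
```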

Results
The modelled temperatures at 2 m height were compared against our network of official stations and crowdsourced PWSN (Figure 10). This network comprises 150 stations in total. Overall, the performance of the model showed little discernible difference between the two UCP definitions when compared across all stations and all seasons. The WRF runs using WUDAPT input data consistently exhibited a slightly higher RMSE and mean bias (Table 7). This difference is small (RMSE 0.04 K; MB 0.06 K); however, it is statistically significant (p = 0.001), so it is unlikely to be due to chance. Deployment of land use and urban morphology data derived from the W2W protocol produced results only marginally worse than using a highly detailed GIS database. There was no obvious trend observed with regard to seasonal study periods. In some cases, the model grossly overestimated temperatures and, in others, it underestimated them. This allowed for no easy or consistent correction factor or transformation. Both Vienna GIS and WUDAPT cases exhibited nearly identical seasonal and diurnal patterns. Overall, February was the best represented case in terms of RMSE, despite a large deviation during the second night. However, the daily temperature ranges and cyclical variation in July and April were well captured. In general, the model tended to overestimate the temperatures, except in April and during the day in July. Temperatures were highly overestimated by night in March, while daytime temperatures were well predicted by the model.
To better understand the patterns that explain those seasonal differences and also to see if there might be a distinction in the spatial distribution of error between Vienna GIS and WUDAPT, the cumulative absolute error was derived for each LCZ (Figure 11). The results were analysed by night, when errors tended to be highest, and aggregated from all five periods. Consistently across the LCZs, only 60% of the hourly values had an absolute error less than or equal to 4 K with the error rising as high as 10 K at some points. There was no clear difference between the LCZs in terms of their error distributions and there were only small differences between the Vienna GIS and WUDAPT runs.
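The cumulative absolute error curve for one LCZ is the share of hourly values whose absolute error does not exceed each threshold. The sketch below assumes paired modelled and observed night-time values have already been grouped by LCZ; the function name and the sample data are hypothetical.

```python
def cumulative_error_share(modelled, observed, thresholds_k):
    """For each threshold, the share of hourly values whose absolute
    model error is less than or equal to that threshold."""
    absolute_errors = [abs(m - o) for m, o in zip(modelled, observed)]
    count = len(absolute_errors)
    return [
        sum(1 for error in absolute_errors if error <= threshold) / count
        for threshold in thresholds_k
    ]


# Hypothetical night-time values for one LCZ.
model_values = [1.0, 5.0, 3.0, 12.0]
station_values = [0.0, 0.0, 0.0, 0.0]
shares = cumulative_error_share(model_values, station_values, [4.0, 10.0])
```

Plotting the shares against a finer list of thresholds for each LCZ and each run would give curves comparable in form to those described for Figure 11.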
LCZ 6 (open low-rise), which had the highest disagreement in terms of morphological description between the mappings, showed the most notable difference.
For a more complete view of the spatial variation between model parameterizations, difference maps of the mean overnight air temperature were created for each season (Figure 12). Despite the lack of variation in cumulative error graphs by LCZ, systematic temperature differences between the two urban definitions, of up to 1 K, were detected when examining the difference maps. In nearly every case, the centre of the city was colder with the WUDAPT parameterization (LCZ 2 and 5) and the suburban areas (LCZ 6 and 9) were warmer. Only February demonstrated similar temperatures between the two runs.
This pattern could mainly be explained by the radiative trapping effect. Indeed, increasing the density of urban fraction and increasing the H/W ratio in both central areas (LCZ 2 and 5) in the Vienna GIS case, and rural areas (LCZ 6 and 9) in the WUDAPT case, actually allowed more radiative trapping and less cooling potential, hence increasing the air temperature. This effect in combination with the low wind speeds might also explain the exceptionally poor performance during the night in March.
Interestingly, LCZ 8 also presented higher temperatures in the Vienna GIS case even though its building heights and urban fraction were similar to those of WUDAPT. However, as in the central areas (LCZ 2 and 5), both the road and building widths in the Vienna GIS description of LCZ 8 were much narrower than in the WUDAPT case. This confirms that one of the main drivers of urban air temperature in the BEP-BEM model is the radiative trapping caused by an increased H/W ratio. In fact, the main change from one LCZ 8 parameterization to the other is the amount of wall coverage per grid cell, which doubles when using the Vienna GIS parameters. Increased cooling potential by wind, induced by the larger road width, might also increase this difference.
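The link between narrower streets and buildings and larger wall coverage follows from simple canyon geometry. The sketch below assumes an idealized, infinitely long street canyon with uniform building height H, building width B and street width W; it is a geometric illustration only, not the BEP-BEM code, and the dimensions are hypothetical rather than the LCZ 8 values used in the study.

```python
def canyon_geometry(height, building_width, street_width):
    """Geometric ratios of an idealized, infinitely long street canyon."""
    period = building_width + street_width  # one building plus one street
    return {
        "aspect_ratio": height / street_width,         # H/W
        "building_fraction": building_width / period,  # plan area fraction
        "wall_area_ratio": 2.0 * height / period,      # two walls per period
    }


# Hypothetical dimensions in metres: same height and building fraction,
# but buildings and streets half as wide in the second case.
wide = canyon_geometry(height=10.0, building_width=40.0, street_width=40.0)
narrow = canyon_geometry(height=10.0, building_width=20.0, street_width=20.0)
```

Under these assumptions, halving both widths leaves the building fraction unchanged while doubling both the H/W ratio and the wall area per unit ground area.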
Concerning the seasonality, the differences between the definition of the LCZs' UCPs seem to be intensified by stable atmospheric conditions. This can be most clearly seen in the February case, which had the highest wind speeds, the lowest diurnal range of temperature, and the least difference between the two model runs.
January had clearer patterns of difference in the south than in the north, where most of the cloudy conditions appeared to be. Cloud cover reduces the downwelling shortwave radiation received during the day and thus the impact of differences in H/W ratio on radiation trapping. Even though the Bougeault-Lacarrère local turbulence scheme has yielded better results than the Mellor-Yamada-Janjic scheme (Janjić, 1990, 1994, 2002) in different studies using BEP-BEM (Salamanca et al., 2012; Brousse et al., 2016), the model might have had difficulty accurately representing convection and cloud formation. Moreover, to our knowledge, urban case studies have rarely been conducted during winter periods and most UCMs are calibrated for summer cases (Stewart, 2011; Heaviside et al., 2017), which could lead to a miscalculation of the radiation trapping over urban areas during winter nights.

Potential for measurement error
In order to better isolate the source or sources of the modelling error, a correlation analysis was conducted between overall RMSE per station from the Vienna GIS run and some inherent station properties that could potentially contribute to the deviation of observed and modelled results: distance of station from grid centre, elevation of the station with respect to sea level and the deviation of the station elevation from the model elevation. Of those three, only the station elevation showed any significant correlation with the error, with a weak positive correlation of 0.28. The PWSN provided no metadata regarding the height of the stations with respect to the ground. This introduces some uncertainty when comparing the observations to model results, which are interpolated at a height of 2 m. However, the filtering process plus the higher density of measurements allow for a normalized statistical interpretation of those data sets for model evaluation. Furthermore, model evaluation suffers from this reality not only vertically but also horizontally. Indeed, even if the temperature is measured at the same height, it is unlikely to be measured at the centre point of the grid cell, or in exactly the same morphological surroundings and conditions (e.g. Gaudenzdorf station). Therefore, the uncertainty introduced by the installation height of the PWSN stations is unlikely to be a main source of error, as houses in Vienna are usually no higher than 20 m.

Figure 10. Modelled (dots) and observed (line) mean temperature by seasonal period and LCZ for the WUDAPT run with range of variation (±2 SD) for each (blue and red, respectively).
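A correlation analysis of this kind can be sketched as below. The sketch assumes a Pearson coefficient, which the text does not specify, and uses hypothetical station elevations and per-station RMSE values; none of the numbers come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-station values, for illustration only.
station_elevation_m = [160.0, 180.0, 200.0, 220.0, 240.0]
station_rmse_k = [2.0, 2.4, 2.1, 2.6, 2.5]

r_elevation = pearson_r(station_elevation_m, station_rmse_k)
```

The same function could be applied to the other two station properties, namely the distance from the grid centre and the deviation between station and model elevation.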

Urban complexity and improvement of results
This sensitivity test demonstrated that a more elaborate description of the urban areas by LCZ classification did not greatly improve the prediction of core climatic variables, such as air temperature. As Jänicke et al. (2016) demonstrated over Berlin, Germany, increasing complexity in UCMs might actually be a source of error more than a source of precision. However, it has been shown in previous studies (Alexander et al., 2016; Brousse et al., 2016, 2017) and in this study that the models are sensitive to the disparities between LCZs. In this case study, the WUDAPT urban description extracted from the mid-values of Stewart and Oke (2012) performed only marginally worse overall than the urban description obtained from detailed GIS information of the city structure when comparing the results to multiple measurements across the city. However, notable differences are observable between the two runs (up to ±1 K) in certain areas. Therefore, it appears that the BEP-BEM model is indeed able to recognize morphological differences from one neighbourhood type (i.e. LCZ) to another and between parameterizations (i.e. WUDAPT against Vienna GIS).

The BEP-BEM model represents the urban geometry of each LCZ as an idealized and simplified street canyon (Martilli, 2002) with only a single fixed value for each UCP. Apart from differences in geographic context and boundary conditions, the model, parameterized by reading a look-up table, cannot properly represent any spatial variation of morphology between grid cells of the same LCZ at 500 m horizontal resolution. This might explain why the performance of the two runs is similar overall, even when comparing them by LCZ: the description of an urban class is fixed across the whole domain, overshadowing the impact of the complex morphological reality it is meant to represent (Figure 7). This difference in variability was also observed in the comparison of the modelled and measured temperatures, where the variation of modelled values within an LCZ was almost always significantly smaller than the range of observed values (Figure 9).
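The consequence of a look-up table with one fixed value per UCP can be shown with a minimal sketch. The table contents, parameter names and grid below are hypothetical; they are not the WUDAPT or Vienna GIS values and do not reproduce the WRF table format.

```python
# Hypothetical look-up table: one fixed value per UCP for each LCZ.
UCP_TABLE = {
    2: {"building_height_m": 15.0, "urban_fraction": 0.9},
    6: {"building_height_m": 6.0, "urban_fraction": 0.6},
}


def assign_ucps(lcz_grid, table):
    """Give every grid cell the fixed parameter set of its LCZ class."""
    return [[table[lcz] for lcz in row] for row in lcz_grid]


# Hypothetical LCZ map on a small grid.
lcz_grid = [
    [2, 2, 6],
    [2, 6, 6],
]
ucp_grid = assign_ucps(lcz_grid, UCP_TABLE)

# Collect the building heights seen by all cells classified as LCZ 2.
heights_lcz2 = {
    cell["building_height_m"]
    for lcz_row, ucp_row in zip(lcz_grid, ucp_grid)
    for lcz, cell in zip(lcz_row, ucp_row)
    if lcz == 2
}
```

Every cell of a given class receives identical parameters, so any within-class variability in the modelled temperature has to come from location and boundary conditions rather than from morphology.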
Although the model is indeed depicting differences of temperatures between the two parameterizations, it seems that these differences are obscured when compared to the highly variable observations. One of the major hypotheses would be that the model is incapable of representing small-scale urban climate variability because it grids the information at coarser resolutions and uses broad classifications. This is a well-known issue that is driving research questions on the ability of computational fluid dynamics (CFD) approaches, such as Reynolds-averaged Navier-Stokes (RANS) or large eddy simulations (LES), to represent smaller-scale urban impacts on climate (Nazarian and Kleissl, 2015; Sanchez et al., 2016; Li et al., 2016; Wang et al., 2017). The implementation of such detailed interactions in mesoscale models is also being investigated (Santiago and Martilli, 2010), yet computational capacities are still a barrier to broader adoption of these strategies.
Yet, multiple sources of error and uncertainty might also have an impact: (1) The location of the city of interest has to be taken into account prior to the analysis. The complex orography surrounding Vienna might explain some of the errors that are observed, as regional climate models such as WRF also idealize the climate reality. (2) As Jänicke et al. (2016) demonstrated over Berlin, Germany, increasing complexity in UCMs might actually be a source of error more than a source of precision. In this case, the BEP-BEM model is known to be one of the most complex UCMs. (3) The simplification of the urban description prior to the model runs introduces a margin of error and hinders the ability to evaluate the true potential of using the detailed database. (4) Thus, detailed information might improve model performance; however, these data sets, when available, are often constrained by political boundaries. Hence assumptions would have to be made for representing the whole urban area, introducing further unknowns. (5) Finally, the evaluation of model performance depending on urban morphological definition requires data sets with excellent geographic coverage. The use of crowdsourced data might be a solution in this regard; however, as specified before, a range of uncertainty has to be acknowledged. Yet, the UCMs' simplified representation of the morphology at coarse scale does not allow for evaluation based only on official station networks. As the local morphological and related climate reality might impact the reference measurements too (e.g. Gaudenzdorf station), the interpretation of those would be biased.

Conclusion
The principal finding of this work is that new strategies coupling open-source remote-sensing and crowdsourcing techniques are necessary for improving our studies over ever-changing and expanding urban areas. In the current state of the art, this study demonstrates that there are no perfect parameterizations of a whole urban area at such scales. Indeed, users will be confronted with multiple challenges in terms of choosing a domain size and resolution, a model complexity, a land surface classification and particularly a parameterization of the urban landscape.
In this sense, the WUDAPT project offers a great opportunity for quantifying the impact of those decisions by defining a common framework on which to develop and consider further improvements.
The WUDAPT classification at its data Level 0 offers a description of the urban morphology that is substantially sufficient for urban climate modelling when compared with detailed LCZ definitions extracted from GIS databases at a 500 m scale. This information gives great insight into the capacity of our climatic models to efficiently represent the local climate of urban areas in different seasonal cases. As the WUDAPT project is currently focusing on the standardization of Level 0 data (Verdonck et al., 2017; Zheng et al., 2017) and on the development of a Level 1, which would give more detailed land use information to the model users, a question is raised concerning the portion of the forecasting error that is attributable to data sets that are currently used within the meteorological and climatological communities.
From the authors' perspective, the use of Level 0 data is currently satisfactory for regional climate modelling. Yet, the use of Level 1 and 2 data, as described by the WUDAPT team (Ching et al., 2015; Mills et al., 2015; Bechtel et al., 2016), would need both a parallel improvement and understanding of the models themselves, to be used in a coherent way for mitigation purposes, and an extension of our in situ measurement capacities with a coherent verification framework to test our models' sensitivities in a comprehensive manner. Indeed, the variability of error between LCZs, which would not have been demonstrated without the use of crowdsourced data, stressed that using the more calibrated data for parameterization of the models in their current form did not always show the best performance.
Moreover, climate models and UCMs have often been calibrated over summer periods, and examples such as the March or January cases showed the need to better represent the impact of seasonality on urban climate. While the importance of urban parameterization has been demonstrated to be a key element in comparison with the use of different UCMs, it seems that detailed data sets cannot be used alone to initialize regional models as they are often only available at the city scale. Indeed, such data sets often need to be recalculated and classified before being applied to suburban areas in mesoscale studies, leading to errors in the urban morphology's description. Thus, the use of WUDAPT Level 0 data would already give a sense of the urban climate at a common mesoscale level of study (500-1000 m). However, to provide reliable guidance to stakeholders in terms of mitigation and adaptation strategies, a better understanding of urban climate drivers needs to be arrived at in two parallel ways: (1) by obtaining better spatial definition of the whole urban area and its surroundings, as mean values from LCZs are not able to depict all the urban heterogeneity and (2) by improving our downscaling capacities to allow urban climate models to represent mesoscale climate at higher resolutions, so that detailed information could be properly ingested in the calculations. This improvement is only foreseeable by bridging past frontiers, using innovative techniques, and comparing UCMs in places where they have not been studied before: suburban areas, mid-sized northwestern cities and principally cities of the global south. Yet this effort has to follow thorough developments of methodologies by the international community, such as the International Urban Energy Balance Model Comparison (Grimmond et al., 2010), to allow standardized comparisons of model performances throughout the world.