Short‐term forecasting of air pollution index in Belgrade, Serbia

Forecasting air pollution in big cities is of great importance, and there are various types of air pollution indices (APIs). In the present study, the level of air pollution in winter 2011 at four point‐locations in Belgrade, Serbia, was measured using wind speed data and non‐standard and standard air temperature data, measured at 10 and 2 m, respectively. Using multiple linear regression (MLR) analysis, equations to forecast the API were obtained. This forecast was verified using data from the winter of 2012/2013. The results obtained are well aligned with the monitored API and verified by the root mean square error (RMSE). It is shown that standard meteorological measurements representative of the city can accurately predict the API at individual point‐locations as well as using temperature and wind speed measured at each respective location. Three locations, which measured SO2, NO2, PM10, O3 and CO, showed poor air quality for > 78% of days observed. For the fourth location, the final estimate of prevailing air quality could not be calculated due to the absence of PM10 measurements. Forecasting the API on a short‐term scale can be of great help for long‐term air‐quality improvements.


| INTRODUCTION
More than half of the human population now lives in urban areas. According to the 2018 Revision of the World Urbanization Prospects (UN, 2018), 4.2 billion people live in urban areas, and 68% of the world's population is expected to live in cities by 2050. Consequently, the number of vehicles in cities will increase, as will the concentration of air pollutants. This means that the unfavourable effects of air pollution will impact an increasingly large portion of the human population, causing multiple health problems, such as cardiovascular disease, lung cancer and acute respiratory infections (Kampa and Castanas, 2008). Anderson (2009) correlated an episode of heavy air pollution in London in 1952 with excess deaths. Within three weeks of this episode, there were more than 4,000 deaths. Because influenza was not prevalent during that period, these deaths were attributed to air pollution. Gurjar et al. (2010) evaluated the health risks in megacities due to air pollution and concluded that South Asian megacities have high health risks due to high concentrations of total suspended particulates, which are more damaging to health than gaseous pollutants. Kampa and Castanas (2008, p. 362) stated that "short- and long-term exposures have also been linked with premature mortality and reduced life expectancy". The World Health Organization (WHO) concluded that air pollution causes disease and more than 2 million premature deaths each year (WHO, 2006). Hence, many cities monitor air quality at several locations, particularly near busy traffic routes.
Numerous studies have examined the relationship between urban air pollution and meteorological variables (Yadav et al., 2016; Squizzato et al., 2017; Zhang et al., 2017a) and several air-quality indices are in use (Stieb et al., 2012). Their common objective is to facilitate the interpretation of air quality and to help the general public adjust to changes in air quality. Plaia and Ruggieri (2011) reviewed several air pollution indices (APIs). Cogliani (2001) developed an API for the purpose of forecasting air pollution for the next day in three cities in Italy. Deniz Genc et al. (2010) used and calculated the APIs for 10 stations in Ankara, Turkey. Kumar and Goyal (2011) (Berkowicz, 2000) to measure the air pollution within the urban city canyon. However, their study was limited to a 10 week period in summer, when pollution is generally less intense than in the cold part of the year. Vujović and Todorović (2017) assessed emissions of different pollutants released by air traffic along with weather conditions over a period of eight years. They concluded that "the number of days with poor air quality increased during the cold part of year, especially during anticyclones when temperature inversions form near the surface" and wind speeds are low (p. 93); consequently, emitted pollutants are not adequately dispersed. During the cold part of the year, high pressure, light winds, temperature inversions up to 700 or 1,000 m in height, fog and drizzle are common conditions in Belgrade. These are the most unfavourable weather conditions from the standpoint of local air quality. This weather situation can last for days, bringing warnings of excessive air pollution and providing the motivation for this research.
Both emissions of pollutants and meteorological factors determine air quality at point-locations in a city. The goal of this research was to find an appropriate way to forecast the level of air pollution in Belgrade on a short-term scale. With such a forecast, city authorities would have enough time to act to improve air quality at locations of concern in the city, and consequently to protect the health of citizens.

| Data source
Belgrade is the capital of Serbia. According to the Census Atlas (2014), Belgrade has 1,659,440 inhabitants (513 inhabitants·km⁻²). In the last 30 years, it has experienced rapid urbanization. As the number of inhabitants increased, the number of motor vehicles more than doubled, from 292,449 in 1985 to 595,053 in 2014 (source: Institute for Informatics and Statistics of the City of Belgrade). The major sources of air pollution in Belgrade are electricity production, heating and industry; for oxides of nitrogen, the major source is traffic (EIS, 2012, 2013). The Environmental Protection Agency of Serbia, a division of the Ministry of Environmental Protection of the Republic of Serbia, provided the air pollution data used in the present study. Data were measured at four automatic stations situated in residential areas in Belgrade: Novi Beograd, Stari Grad, Mostar and Vračar (Figure 1). These four stations belong to the local network of stations for air-quality monitoring in Belgrade. Stations Mostar and Vračar are located near crowded motorways. Stari Grad is in the old city centre, which, like Vračar, has a high number of individual heating system units, while Novi Beograd is in the new part of the city, with high buildings, flat terrain and heavy traffic. Table 1 lists the coordinates (latitude and longitude) and elevations of the stations.
All stations measured hourly concentrations of sulphur dioxide (SO2), nitrogen dioxide (NO2), coarse particulate matter (PM10), ozone (O3) and carbon monoxide (CO), with the exception of Vračar station, which did not measure PM10 concentrations. An expert team from the Environmental Protection Agency conducted a logical control of the data. All the stations also provided meteorological measurements: hourly values of air temperature (a non-standard measurement) and wind speed, both at 10 m height.
In addition, standard meteorological measurements were available: hourly values of air temperature at 2 m height and wind speed at 10 m height. The source of these meteorological data was Belgrade Meteorological Observatory (BMO) at Karađorđev Park, which belongs to the network of synoptic stations of the Republic Hydrometeorological Service of Serbia and is, by its status, a location representative of the city of Belgrade. The location of BMO is also shown in Figure 1.
A GRIMM EDM 180 aerosol spectrometer was used to measure PM10, while Teledyne analysers measured the other pollutants. The Teledyne API model 100E gas analyser measured SO2 concentrations, which were determined by measuring the fluorescence of SO2 at a wavelength of 330 nm. The Teledyne API model 200A gas analyser measured the concentration of nitrogen oxides using the chemiluminescence detection of NO2. The model T300M CO analysers measured the concentration of CO using non-dispersive infrared spectroscopy. The API model 400A ozone analyser compared the light absorption of the sample air with the light absorption of the sample scrubbed to remove the O3. The concentration of O3 was then calculated using the Beer-Lambert law.
The period selected for all stations was winter because of its high air pollution. The two different halves of winter were studied: January-March and October-December 2011. Due to inoperative equipment at Mostar station, the first winter was limited to March. For verification of the obtained results, data from the winter 2012/2013 were used.

| Calculating the air pollution index (API)
On a daily basis, the API of a single station is derived from hourly measured concentrations of air pollutants. Air Protection Law Regulation (2010) prescribes daily threshold and tolerance levels for SO2, NO2, PM10 and CO concentrations. Law Order (2006) prescribes the daily threshold level of O3; its daily tolerance level is considered to be equal to the threshold. The threshold is the highest concentration allowed for the air pollutant in order to avoid harmful effects on human health and the environment. The tolerance level is the threshold raised by the tolerance margin, that is, the percentage of excess allowed above the threshold under prescribed conditions. According to pollutant thresholds and tolerances, air quality is defined as good (clean or slightly polluted air), acceptable (moderately polluted) or poor (over-polluted air) (Table 2). Air quality is considered poor if even one pollutant's concentration exceeds its tolerance level at a given station. The concentrations of five air pollutants were measured at Novi Beograd, Stari Grad and Mostar stations. Following Cogliani (2001), if the concentration of a single pollutant was low, the score was 1; if the concentration was acceptable, the score was 6; and if the concentration was high, the score was 31. The numerical scores of 1, 6 and 31 were selected so that the summed scores of different categories cannot overlap. If the concentration was missing, the score was 0. According to the criteria presented in Table 2, a capital letter indicates the air quality at a given station, with "G" for good air quality, "A" for acceptable and "P" for poor. With all five pollutants measured, a total score of 5 indicates good air quality, 10-30 acceptable air quality and 35-155 poor air quality.
Subsequently, new numerical values were attributed to each station to represent the station's daily API. When the air quality score was good, the station index was attributed the value of 0. For an acceptable air quality score, the station index was attributed the value of 1. When the air quality score was poor, the station index was attributed the value of 2.
The method was the same for Vračar station, at which four air pollutant concentrations were measured. PM10 data were not collected at this site, so the total score differs slightly from that of the other stations: a score of 4 represents good air quality, 9-24 acceptable air quality and 34-124 poor air quality.
Each API calculated with this method was dependent on the number of air pollutants measured. If the number of air pollutants differed from station to station, the stations' APIs were not comparable. This means that Vračar's API could not be compared with the indices of the other three stations due to the absence of PM 10 measurements at Vračar.
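The scoring scheme described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the numeric threshold and tolerance values in the usage note are placeholders (the regulatory values of Table 2 are not reproduced here), and the rule that a concentration at or below the threshold counts as "low" is an assumption based on the definitions in the text.

```python
# Per-pollutant scores as described in the text (after Cogliani, 2001).
SCORES = {"low": 1, "acceptable": 6, "high": 31, "missing": 0}


def classify(concentration, threshold, tolerance):
    """Classify one daily concentration against its threshold and tolerance level.

    Assumption: values at or below the threshold are "low", values above the
    threshold but not above the tolerance level are "acceptable", and values
    above the tolerance level are "high". None marks a missing measurement.
    """
    if concentration is None:
        return "missing"
    if concentration <= threshold:
        return "low"
    if concentration <= tolerance:
        return "acceptable"
    return "high"


def daily_api(categories):
    """Return (total score, station index) for one station and one day.

    Station index: 0 = good, 1 = acceptable, 2 = poor. Because the scores are
    1, 6 and 31, any "high" pollutant pushes the total to 31 or more, and any
    "acceptable" pollutant (with no "high" one) pushes it to 6 or more, for up
    to five pollutants. If every pollutant is missing, no index is assigned.
    """
    total = sum(SCORES[c] for c in categories)
    if total == 0:
        return total, None
    if total >= 31:
        return total, 2
    if total >= 6:
        return total, 1
    return total, 0
```

For example, `daily_api(["high", "low", "low", "low", "low"])` gives a total of 35 and a station index of 2 (poor), matching the lower end of the 35-155 range quoted for five pollutants, and `classify(120, threshold=100, tolerance=150)` (placeholder numbers) returns `"acceptable"`.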

| Selection of the correlation variables
The selection of correlation variables was made according to the method, analysis and results provided by Cogliani (2001) and Deniz Genc et al. (2010), as well as data from the local network for automatic air-quality monitoring and BMO. Three independent variables were selected: diurnal temperature range (ΔT), daily average wind speed (V_av) and the previous day's API (I_{d-1}). ΔT is the difference between the daily maximum and minimum temperatures. Although the relation between ΔT and human health and mortality is well documented (Ge et al., 2013; Zhang et al., 2017b), the present understanding of the relation between ΔT and air pollution remains limited. The API derived from the previous day's concentrations is justifiably used as the basis for estimating the current day's air quality (Cogliani, 2001; Deniz Genc et al., 2010; Kumar and Goyal, 2011), and it was shown to be a reasonably good proxy for motor vehicle traffic intensity, which produces the air pollution (Cogliani, 2001). A station's daily API is the dependent variable, calculated from measured or forecasted independent variables using multiple linear regression (MLR) analysis. Relative humidity and total rainfall did not show statistically significant correlations with the daily API, with the exception of relative humidity for Vračar (significant at the 0.01 level).

| Multiple linear regression (MLR)
The MLR is a common variation of linear regression (Wilks, 2006) in which there is a single predictand and more than one predictor (in the present case, ΔT, V_av and I_{d-1}). Each predictor has its own coefficient. The general form of the prediction equation with regression coefficients illustrates the relation between the calculated station's daily API (I_c) and the other variables:

$$I_c = a\,\Delta T + b\,V_{av} + c\,I_{d-1} + d$$

where I_{d-1} is the daily API obtained from the previous day's concentrations of pollutants; ΔT is the diurnal temperature range; V_av is the daily average wind speed; and a, b, c and d are the regression parameters.

TABLE 2 Daily threshold and tolerance levels of the measured air pollutants

The forecast was assessed using the root mean square error (RMSE) between the observed and calculated APIs; the difference between them is the residual. The lower the RMSE, the better the API forecast. Thus, the RMSE is a good measure of how accurately the model predicts the variable.
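As an illustration of the regression form I_c = a ΔT + b V_av + c I_{d-1} + d (the form stated in the Table 6 caption) and of the RMSE check, the sketch below fits the four parameters by ordinary least squares using only the Python standard library. It is a minimal sketch under stated assumptions, not the software used in the study: the function names are hypothetical, the fit solves the normal equations by Gauss-Jordan elimination, and the input values are synthetic numbers invented for the example, not measurements from Belgrade.

```python
import math


def fit_mlr(delta_t, v_av, i_prev, i_obs):
    """Least-squares estimates (a, b, c, d) for I_c = a*dT + b*V_av + c*I_prev + d."""
    rows = [[t, v, p, 1.0] for t, v, p in zip(delta_t, v_av, i_prev)]
    n = 4
    # Augmented normal-equation system [X'X | X'y].
    aug = [
        [sum(r[j] * r[k] for r in rows) for k in range(n)]
        + [sum(r[j] * y for r, y in zip(rows, i_obs))]
        for j in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return tuple(aug[j][n] / aug[j][j] for j in range(n))


def predict(coeffs, delta_t, v_av, i_prev):
    """Calculated daily API for each day from fitted coefficients."""
    a, b, c, d = coeffs
    return [a * t + b * v + c * p + d for t, v, p in zip(delta_t, v_av, i_prev)]


def rmse(observed, calculated):
    """Root mean square of the residuals (observed minus calculated API)."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Synthetic example data (invented for illustration only).
delta_t = [2.0, 5.0, 8.0, 3.0, 10.0, 6.0]   # diurnal temperature range
v_av = [1.0, 3.0, 2.0, 4.0, 1.5, 2.5]       # daily average wind speed
i_prev = [0, 1, 2, 1, 2, 0]                 # previous day's station index
i_obs = [1, 1, 2, 1, 2, 1]                  # observed station index

coeffs = fit_mlr(delta_t, v_av, i_prev, i_obs)
print("a, b, c, d =", coeffs)
print("RMSE =", rmse(i_obs, predict(coeffs, delta_t, v_av, i_prev)))
```

Verification against an independent sample, as done with the winter 2012/2013 data, would amount to calling `predict` with coefficients fitted on one period and predictors from the other, then computing `rmse` against that period's observed API.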

| Pearson correlation
To find the degree of association between two variables, the Pearson product-moment coefficient of linear correlation, or Pearson correlation, was used. It ranges between -1 (a perfect negative linear relationship between variables) and 1 (a perfect positive linear relationship). It is often overlooked that the correlation coefficient does not provide any explanation of the relationship between variables, at least in a physical or causative sense (Wilks, 2006). In the present study, a correlation coefficient > 0.5 is considered to indicate a good correlation.
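For reference, the Pearson coefficient can be computed directly from its definition, as in the following sketch. The function name is hypothetical, and the code makes no claim about the statistical package actually used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)
```

Applied to a station's daily API and, say, the daily average wind speed, a negative value would reflect the dispersion effect of wind; the coefficient alone still says nothing about causation.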

| Seasonal variations in pollutant concentrations
The analysis began by observing the behaviour of air pollutants throughout the year. Average monthly concentrations obtained from hourly concentrations measured at four automatic air pollution monitoring stations in Belgrade are displayed in Figure 2, with descriptive statistics given in Table 3. Observed values for SO2, PM10 and CO vary from summer lows to winter highs. The opposite is the case for O3, which varies from winter lows to summer highs. NO2 shows no such distinct seasonal pattern. These seasonal variations are closely related to meteorological variables as well as to the number of motor vehicles and combustion heating systems in operation. The winter heating season in Belgrade lasts from mid-October to mid-April; heating systems are considered inactive between mid-April and mid-October. Therefore, winters are characterized by greater emissions of air pollutants and specific meteorological conditions such as frequent temperature inversions, lower mixing height and anticyclonic atmospheric circulation. Air pollutant concentrations easily reach peak values under conditions of increased emissions and inadequate surface-level dispersion.
SO2 is a combustion product of fossil fuel. Its winter-to-summer average monthly concentration ratios (W/S ratio, the maximum concentration divided by the minimum concentration) ranged from 3.5 to 5.1. All W/S ratios are given in Table 3. Higher winter concentrations are due to an increase in the number of active heating units. Lower summer concentrations are due to the inactivity of heating units and more favourable meteorological conditions with less frequent temperature inversions. The largest W/S ratio of 5.1 was obtained at Vračar station, located in a part of the city where the number of individual households with heating system units is large compared with parts of the city heated by city heating plants. The smallest W/S ratio of 3.5 was obtained at Mostar station (Figure 2a).
NO2 is mostly attributed to motor vehicle exhausts. The W/S ratios of average monthly concentrations ranged from 1.6 to 2.1. As can be seen in Figure 2b, there are no clearly differentiated winter highs or summer lows. Still, the higher concentrations measured during winter are related to specific meteorological conditions. The highest NO2 concentration was at Mostar station due to its location near the motorway.
PM10 is a primary air pollutant considered to have higher concentrations during winter due to home heating and industrial activity. Average monthly W/S ratios ranged from 3.8 (Mostar) to 6.3 (Stari Grad). Vračar station lacks PM10 measurements. The measurements at the other three stations show well-differentiated winter highs and summer lows (Figure 2c).
Surface O3 is not a primary air pollutant; it is considered a secondary photochemical air pollutant. Therefore, as expected, winter lows and summer highs are well differentiated due to changes in atmospheric circulation and solar radiation (Figure 2d). Summer-to-winter average monthly concentration ratios ranged between 3.8 (Novi Beograd and Vračar) and 7.3 (Mostar). This indicates higher values of surface-level O3 during summer despite meteorological conditions that are more favourable from the standpoint of air quality, such as thermal convection.

TABLE 6 Regression parameters for the equation I_c = a ΔT + b V_av + c I_{d-1} + d for all stations
Note: I_c_Vračar: API calculated using concentrations measured at automatic stations and meteorological data from Belgrade Meteorological Observatory (BMO) at Vračar; I_c_automat: API calculated using both concentrations and meteorological data measured at automatic stations.
CO is a fossil fuel combustion and motor vehicle exhaust product. Therefore, average monthly concentrations show well-differentiated winter highs and summer lows (Figure 2e). Its W/S ratios ranged between 4.9 (Mostar) and 6.3 (Stari Grad).
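The seasonal ratio used in this section (the maximum average monthly concentration divided by the minimum) is simple enough to state as code. The sketch below is illustrative: the function name is hypothetical and the twelve monthly means are made-up numbers, not values from Table 3.

```python
def seasonal_ratio(monthly_means):
    """Maximum average monthly concentration divided by the minimum.

    For SO2, NO2, PM10 and CO this is the winter-to-summer (W/S) ratio; for O3,
    whose maximum falls in summer, the same quotient is the summer-to-winter ratio.
    Months without data can be passed as None and are ignored.
    """
    values = [v for v in monthly_means if v is not None]
    if not values:
        raise ValueError("no monthly means available")
    if min(values) <= 0:
        raise ValueError("concentrations must be positive")
    return max(values) / min(values)


# Made-up monthly means, January to December, for illustration only.
example_means = [51, 40, 30, 20, 12, 10, 10, 11, 15, 25, 38, 45]
print(seasonal_ratio(example_means))
```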

| Correlation analysis
The Pearson correlation coefficients are given in Table 4. Values were obtained from pollutant concentrations and meteorological data measured at automatic stations and analysed using the MLR. Values of statistical significance led to the conclusion that the correlation between dependent and independent variables could not be ascribed to a series of random events.
The API was positively correlated with ΔT for all stations (with the exception of Stari Grad for October-December 2011). A higher ΔT in winter reflects cloudless weather conditions, usually with weak winds and the occurrence of fog. These are typical conditions for increased air pollution. The API was also positively correlated with the previous day's API for all stations. The previous day's API proved to be the predictor most strongly correlated with the API, except at Vračar and, in January-March, at Mostar. Only for Vračar, in both winter half-seasons, was the API best correlated with ΔT. Negative correlations were established between the stations' daily API and daily average wind speed, which was expected considering the wind's dispersion of air pollutants. The exception was Stari Grad for October-December 2011, when the correlation was positive. The API is most strongly correlated with the wind speed only for Mostar (Table 4a). This is a result of Mostar's location: its topography is channelled in a south-east direction, which is the direction of a strong local wind named koshava (košava in Serbian), usually observed in the coldest part of the year over a large part of Serbia (Romanić et al., 2016; Romanić, 2019). Correlations were statistically significant at the 95% confidence level in the MLR analysis. The only one that is not statistically significant at that level is the correlation between the API and V_av for Vračar, January-March 2011. Meteorological variables that did not show a high correlation with the API included relative humidity and rain (Cogliani, 2001).
The Pearson correlation coefficients calculated for pollutant concentrations measured at automatic stations and meteorological data measured at BMO are similar to or slightly better than the previous ones (data not shown).

| Observed air quality
The air-quality appraisal was determined at all four point-locations. Table 5 displays the air quality (%) found at these locations for specific periods. The air-quality appraisals were based on the API dependent only on observed pollutant concentrations. Table 5 reveals the prevalence of poor air quality in the two winter seasons at stations Novi Beograd, Stari Grad and Mostar. From the standpoint of air pollution, January-March was less favourable than October-December 2011: in January-March, air quality was poor on 84.4% of the days observed at Stari Grad, followed by Mostar (80.6%) and Novi Beograd (78.9%). A final estimate of prevailing air quality at point-location Vračar cannot be given due to the absence of PM10 measurements.
The air-quality assessment combined with the observed API results shows that the area represented by Stari Grad station is the most polluted among the three comparable stations. The reasons are its much larger number of individual heating units and its low elevation. Consequently, on cold winter days, temperature inversions are frequent, as is heavy fog (Veljović et al., 2015).
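The percentages of days in each air-quality class, of the kind reported in Table 5, follow directly from a series of daily station indices. The sketch below assumes the 0/1/2 coding described earlier (good, acceptable, poor); the function name and the example series are hypothetical.

```python
from collections import Counter


def quality_shares(daily_indices):
    """Percentage of observed days that were good (G), acceptable (A) or poor (P).

    Days whose index is not 0, 1 or 2 (for example None for missing data)
    are excluded from the count of observed days.
    """
    labels = {0: "G", 1: "A", 2: "P"}
    valid = [i for i in daily_indices if i in labels]
    if not valid:
        raise ValueError("no valid daily indices")
    counts = Counter(valid)
    return {labels[k]: 100.0 * counts.get(k, 0) / len(valid) for k in labels}


# Hypothetical ten-day series with one missing day.
print(quality_shares([2, 2, 1, 2, None, 0, 2, 2, 1, 2]))
```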

| Regression parameters
Linear regression equations were obtained for winter. They were calculated for two cases: one with meteorological data measured at the automatic stations; the other with meteorological data measured in the standard way, at BMO. The idea was to determine whether air-quality data at point-locations could be analysed using meteorological data from a single, representative station. Specific regression equations for all point-locations are given in Table 6. The residuals are normally distributed, according to the normal predicted probability plots. A collinearity check was performed for all linear regression equations. The condition index was far below 15 in all cases, so it was concluded that multicollinearity was absent.

| Air pollution forecast assessment
The RMSE was calculated for the API forecasted from the non-standard meteorological measurements of the automatic stations and for the API forecasted from the standard meteorological measurements at BMO. Comparing the two leads to the conclusion that data acquired by standard meteorological measurements can be used for short-term forecasting of the API at point-locations. Table 7 shows the RMSE of the API forecasted using standard meteorological measurements from BMO located at Vračar, I_c_Vračar, and of the API forecasted using non-standard meteorological measurements at the automatic stations, I_c_automat. The RMSE for I_c_Vračar is less than the RMSE for I_c_automat at all locations and time periods, except Novi Beograd in October-December 2011. These results show that the use of meteorological data acquired by standard meteorological measurements (i.e. BMO data) from Vračar to forecast the API at all point-locations is justifiable. Figures 3-6 display the three different APIs for the observed time periods. I is the station's daily API obtained from measured pollutant concentrations at the automatic station; I_c_Vračar is the station's daily API calculated from concentrations measured at the automatic station plus meteorological data measured at BMO; and I_c_automat is the station's daily API calculated from concentrations and meteorological data measured at the automatic station. Both calculated APIs follow the observed values very well. Figure 7 presents the correlation between I_c_Vračar and I for all stations. The best fit was for Vračar, October-December 2011, with a correlation coefficient of

TABLE 8 Correlation between I and I_c_automat and between I and I_c_ver
Correlation is significant at the 0.01 level (two-tailed).
a I_c_automat: air pollution index (API) calculated using both concentrations and meteorological data measured at automatic stations during the winter 2011; and I_c_ver: API forecasted from 2011 equations, but with 2012/2013 measured data from the automatic stations.
TABLE 9 Correlation between I and I_c_Vračar and between I and I_c_Vrver
Correlation is significant at the 0.01 level (two-tailed).
a I_c_Vračar: air pollution index (API) calculated using concentrations measured at automatic stations and meteorological data of Belgrade Meteorological Observatory (BMO) at Vračar during winter 2011; and I_c_Vrver: API calculated using concentrations measured at automatic stations and meteorological data measured at BMO from a 2012/2013 sample.