Social weather: A review of crowdsourcing‐assisted meteorological knowledge services through social cyberspace

Crowdsourcing has significantly motivated the development of meteorological services. Emerging in the early 2010s and gaining strong momentum after 2014, crowdsourcing-driven meteorological services have evolved from the simple collection and observation of data to the systematic acquisition, analysis and application of these data. In this review, focusing on papers and databases that combine crowdsourcing methods to promote or implement meteorological knowledge services, we analyse the relevant literature in three dimensions: data collection, information analysis and meteorological knowledge applications. First, we select the potential data sources for crowdsourcing and discuss the characteristics of the collected data in four dimensions: consciousness, objectiveness, mobility and multidisciplinarity. Second, based on the purpose of these studies and the extent to which they utilize data and knowledge, we categorize crowdsourcing-based meteorological analysis into three levels: relationship discovery, knowledge generalization and systemized service. Third, according to the application scenario, we discuss the applications that have already been put into use, and we outline current challenges and future research directions. These previous studies show that the use of crowdsourcing in social space can expand the coverage and enhance the performance of meteorological services. It is also evident that current research is contributing towards a systemic and intelligent knowledge service that builds a better bridge among the academic, industrial and individual communities.


| INTRODUCTION
Weather not only directly affects the physical activities of humans but can also influence human society in imperceptible ways (Adger et al., 2013). The mechanisms behind different types of weather are very complicated and almost beyond the control of human beings. Thus, observing, forecasting, alerting and responding to weather situations (particularly severe weather) play significant roles in meteorological services (Schafer, 2012; Veltri & Atanasova, 2017). To help measure and predict weather and climate change, vast amounts of manpower and funds have been invested in meteorological and related research areas, which has brought meteorological research into an era of 'big data' (Greengard, 2014). A large number of meteorological satellites have been launched and dense networks of stations have been established to collect meteorological data. Powerful computational resources, such as high-performance supercomputers, are used to run sophisticated meteorological models (Spiegelhalter et al., 2011). These meteorological data can then be processed and utilized by meteorological and emergency planning departments.
Recently, the emergence of meteorological and climatic research that utilizes social and public power has brought a new perspective towards more intelligent meteorological knowledge services. In other words, the original clients and information receivers (i.e. the public) have also become information providers and research promoters. The rich information that they perceive with their smart devices has gradually formed a space containing these social signals. With the rapid advances in social and mobile computing, the Internet of Things (IoT) and big data analysis, this space has been acknowledged as a new dimension of the real world in addition to existing physical spaces, and has been named the 'social space' (Zhang et al., 2018a). In this social space, people are regarded not only as message receivers but also as perceivers and producers of weather information. Every person is considered to be a 'social sensor', and together these sensors can perceive the real world and harmonize tasks and data sources: the greater the public focus is, the more data can be extracted (Zheng et al., 2016). For instance, people can express their opinions and discuss weather freely on social network platforms such as Twitter, Facebook and Sina Weibo. Data collectors can subsequently integrate multidomain information through these platforms, collect and analyse subjective responses, and finally generate knowledge by using data mining and knowledge engineering technologies while considering social significance (Wang, 2014). This allows timely acquisition of feedback and has helped to guide policy enactment, decision-making, emergency responses and public relations by government departments (Goncalves et al., 2014; Xiong et al., 2015).
However, this social space is filled with unstructured and complex data, and extracting information from it that current research can benefit from requires a unique set of methods for data collection, processing, analysis and application.
From the perspective of a practical investigation process in the social space, data and processing work can be evenly distributed among a large, relatively open and often rapidly evolving group of online users, whether or not they are aware of it. This method, known as crowdsourcing, greatly relaxes economic and human resource constraints and provides a convenient bridge for low-cost, fast, dense and diverse data collection and processing (Prpić et al., 2015; Taeihagh, 2017).
Specifically concerning the application of the social space and crowdsourcing to meteorological services, Muller et al. have reported on the progress of crowdsourcing in atmospheric science based on the situation as it was before 2014. However, compared with that initial period, relevant research and industrial applications have made great progress over the following years. As a typical feature, the current connotation of meteorological knowledge services promoted by crowdsourcing has developed from simple data collection to systematic information collection, fusion, analysis and application. Figure 1 shows the differences in focus and coverage between the conventional approach and this novel approach. In other words, the two approaches represent top-down and bottom-up ideas, respectively. Based on this, we believe that this novel way of collecting, analysing and applying social space-based information provides a new perspective for future meteorology research. Here, we use 'social weather' to roughly represent novel methods of collecting, processing, analysing and applying multisource, multimodal, massive meteorological data with the help of society and the public, that is, a crowdsourcing-assisted approach (Figure S1).

KEYWORDS
crowdsourcing, data-driven, knowledge services, meteorological services, social space

To briefly summarize, social space-based meteorological research has the following characteristics. First, in addition to the objective data that can be collected from the real world, social weather research focuses more on the public's subjective opinions, reactions, feelings and feedback, which can strongly influence emergency planning and decision-making (Grasso & Crisci, 2016; Sharpe & Bennett, 2018). Second, although social weather utilizes the advantages of data fusion, fusing these large heterogeneous datasets efficiently and economically between meteorological agencies and departments in other domains (e.g. transportation, agriculture and tourism) is difficult; this is the so-called 'data islands' problem (Dey et al., 2015; Zipper, 2018; Padilla et al., 2018). Third, crowdsourced data collection provides opportunities for more precise, individual-level data collection, thereby relieving the dependency on large numbers of sensors, which are costly to deploy over dense ground evenspace grids (Strangeways, 2018).
In this paper, we survey existing social space-based meteorological research powered by crowdsourcing. Specifically, we first present the current state of research and report its bibliometric characteristics. Then, we briefly introduce the data sources that use crowdsourcing to promote meteorological knowledge services in the social space and discuss in depth the commonalities and characteristics of these data sources. On this data basis, we further analyse the specific analytical methods used in current research and divide them into three levels according to the roles and purposes of the methods. We also discuss the application areas of meteorological services by investigating typical deployed online weather service applications. Finally, we discuss the current challenges of social weather and the issues that need to be addressed, and predict the future value of social weather research.

SELECTION AND ANALYSIS ON BIBLIOMETRICS
By combining meteorology-related theme words (weather, climate, atmospheric science, meteorology) with crowdsourcing-related words (crowdsourcing, social media) as queries, we searched for relevant journal and conference papers in the Web of Science and Engineering Village indexing systems, respectively. As a result, we selected a total of 97 related journal and conference articles published from 2011 to 2018 (Figure 2b; the list of these papers is provided in Supporting Information). For an intuitive overview of this social weather-related research, we present visualizations based on their basic bibliometric factors.

FIGURE 1 Differences between traditional and social space-based meteorological knowledge services. The social space-based meteorological knowledge service (orange) shows an outstanding characteristic of gathering public perception and feedback to official departments/agencies (bottom-up), while the traditional meteorological knowledge service (blue) shows a clear characteristic of getting conclusions from meteorological agencies and delivering notifications to the public (top-down)

First, as the simplest but most topical form, the keywords of each paper are collected and counted to determine the evolution of research focus in this area. The 14 most frequently appearing keywords were recorded and displayed in Figure  2a. In general, the most common keywords consist of domain-specific topical words (e.g. climate change, weather), methodological words (e.g. crowdsourcing, citizen science, big data), words describing data sources (e.g. social media, Twitter) and targeted areas (e.g. smart cities, crisis communication) that have been present throughout the whole process of this research. In addition, the term 'crowdsourcing', which is also used as a part of the title of this paper, shows its first peak of appearance approximately 2015 and has received widespread attention since 2016. Obviously, from the view of the use of keywords, it is suggested that the importance of data sources is equal to their methodology and purpose. In other words, a data-driven approach is widely used in social weather-related studies, which also reflects the basic features of big data analysis in this area.
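The keyword counting described above can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used for Figure 2a: the `papers` records are invented, and the normalisation rule (lower-casing and trimming) is an assumption, since the review does not state how spelling variants were merged.

```python
from collections import Counter

# Hypothetical records: (publication year, author keywords). The entries are
# made up for illustration and are not taken from the reviewed papers.
papers = [
    (2014, ["Climate Change", "Twitter", "social media"]),
    (2015, ["crowdsourcing", "weather", "Twitter"]),
    (2016, ["Crowdsourcing", "citizen science", "big data"]),
    (2016, ["crowdsourcing", "social media", "climate change"]),
]


def normalise(keyword):
    """Lower-case and trim a keyword so spelling variants are counted together."""
    return keyword.strip().lower()


def top_keywords(records, n):
    """Return the n most frequent keywords across all records."""
    counts = Counter(normalise(k) for _, kws in records for k in kws)
    return counts.most_common(n)


def yearly_counts(records, keyword):
    """Count, per year, how many papers list the given keyword."""
    target = normalise(keyword)
    per_year = Counter()
    for year, kws in records:
        if target in {normalise(k) for k in kws}:
            per_year[year] += 1
    return dict(sorted(per_year.items()))


print(top_keywords(papers, 3))
print(yearly_counts(papers, "crowdsourcing"))
```

Tracking a single keyword per year, as `yearly_counts` does, is one simple way to see when a term such as 'crowdsourcing' starts to gain attention.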
Then, the abstracts are utilized to find themes that are more detailed. However, to avoid the influence of common but meaningless words in a language, we use the latent Dirichlet allocation (LDA) model to extract topical words of all abstracts from all related papers. Figure 2c shows the word cloud of topical words extracted from abstracts, with the font size reflected by the frequency of these words. In addition to the words we observed in typical keywords, more detailed theme and objective words, such as sentiments, sensor, disaster, communication, system, etc., also occur in the word cloud with significant appearances in published papers. Furthermore, more detailed research objects (e.g. temperature, forecast, water, network, events, etc.) also appear with a relatively larger frequency, which indicates the information that the current meteorological knowledge service focuses on.
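To make the topic extraction step more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written with the standard library only. It is not the implementation used for Figure 2c: the stop-word list, the number of topics, the hyperparameters `alpha` and `beta`, and the three sample abstracts are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "from", "with"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                       # n(k)
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Unary plus drops words whose count fell back to zero.
    return [+counts for counts in topic_word]


# Invented abstracts, for illustration only.
abstracts = [
    "Tweets reveal public sentiment on climate change and weather events",
    "Smartphone sensors estimate temperature and pressure in the city",
    "Sentiment of tweets on climate change during a disaster",
]
docs = [tokenize(a) for a in abstracts]
for k, counts in enumerate(lda_gibbs(docs, n_topics=2)):
    print(k, [w for w, _ in counts.most_common(4)])
```

The per-topic word counts returned here are the kind of quantity that a word cloud would map to font size.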

| Crowdsourcing-based data sources and attributes
As predicted, data derived from social weather are playing an increasingly important role in meteorological research, especially in public discourse (Kirilenko & Stepchenkova, 2014; Auer et al., 2014; Krennert et al., 2018) and in intensive and cost-efficient data acquisition. Also known as 'crowdsourcing for atmospheric sciences', social weather data consist of temporal, spatial, textual, behavioural and other features. Recorded contents are specific to one or more aspects of people's activities and are specific to an individual or a community. The specific features, their structures and resolutions, and their objectivity determine the strengths, weaknesses and scope of further application. In other words, different fusions and combinations of data are selected to solve different tasks and problems. Although previous research suggests that there are many types of crowdsourcing information, in this review we take the view of data-driven information science and tag every data source along four attributes of the data itself: consciousness, objectiveness, mobility and multidisciplinarity.

| Potential data providers
In crowdsourcing-based data collection, the idea of 'humans as sensors' plays a significant role. The role of people in this process can either be as the operators of data collection or the generators of data. After years of development of crowdsourcing-based data collection in meteorology, there are many data sources continuously providing rich information. Here, we briefly introduce these sources, since previous research has already discussed and listed them:
• Smart embedded sensors: Sensors that are embedded into smart phones and other IoT devices have been used to estimate fine-grained meteorological parameters. Temperature, pressure and light intensity collected by huge crowds of smart phone owners tagged with high-precision GPS depict an exquisite map (Overeem et al., 2013; Sosko & Dalyot, 2017). This smart phone-embedded device-based collection method benefits from promising geographic resolution, deployment and mobility.
• Social networks: People's evolution of attitude towards long-term meteorological phenomena and short-term emergency communications can be extracted by mining their opinions from social network sites (SNS) (Liu & Zhao, 2017; Sisco et al., 2017; Veltri & Atanasova, 2017; Rossi et al., 2018). As free communication platforms, social networks accommodate different angles and attitudes expressed by people of different backgrounds and roles. Thus, this makes the data collection of cross-domain opinion and knowledge possible and convenient (Lu et al., 2018b).
• Sensors provided by weather-sensitive departments: Weather and climate influence many industrial sectors. Conversely, using these weather-sensitive industry data in meteorology also has an advantageous influence. This data source can be of great help to interdisciplinary research by virtue of the cross-disciplinary nature of its birth.
• Volunteer and citizen science: Unlike government or agency-led meteorological information gathering, enabling volunteers and citizen science enthusiasts to collect relevant data spontaneously in the scheme of a collaborative research approach is also considered a source of crowdsourcing (Lee et al., 2012; Tipaldo & Allamano, 2017; Irwin, 2018).

| Processed public datasets
Thanks to previous efforts, crowdsourced datasets now give academic as well as industrial communities more opportunities and easier access, and allow them to benefit from a collective effort to validate or verify the reliability of solutions against a single standard. It should also be acknowledged that the production of datasets sometimes requires enormous human and economic costs. Therefore, freely disclosing a dataset requires great courage and determination from the provider. For example, Meteorological Phenomena Identification Near the Ground (mPING) lets smartphone users in the United States share current weather conditions through an app. The collected structured data can be accessed through the official application programming interface (API). We also list several typical public datasets for social weather in Table 1.

| Attributes of social weather data
In general, social weather collects quantitative meteorological and climatic signals, geographic coordinates, texts, images and web usage-based behavioural information. Different problems require different data sources with different levels of precision, content and timeliness. Because enumerating the constantly emerging applications and platforms would be inefficient, we instead take the view of information science (that is, the view of the data itself) and use four aspects to label current and potential data sources.
• Consciousness: In this aspect, we mainly examine whether the data were deliberately observed and recorded by people during acquisition, or whether the weather data were generated during the user's daily life and other inadvertent behaviours. In other words, the question 'is the research-demanded data provided actively by a crowd or is it data collected passively by people?' is asked to judge the consciousness of the data. As an example, Kirilenko et al. compared the data processing work between for-profit businesses and volunteer amateur citizen scientists. No matter how the approach is, this kind of work requires users to actively observe, generate and upload information (Kirilenko et al., 2017). Another example is Morita et al., who developed a system that can collect and analyse disaster-related information posted on social media by residents in real time (social sensor system) and implemented it in society. The system can pick up information that is useful for grasping disaster situations even though the intention of these online posts' authors was not to provide information to government agencies (Morita et al., 2018). The advantages and disadvantages of these two types of data are obvious: conscious acquisition can collect in-demand and noiseless data, but the collection density and amount of data are still subject to the recruitment of volunteers and citizen scientists. In contrast, unintentional data acquisition enables large-scale and high-density information collection, but it is likely to be collected less urgently than required and to provide noisy data.
There is also research that focuses on fusing these two types of data collection to utilize the advantages of both. For instance, Niforatos et al. developed a crowdsourcing weather app that not only periodically samples smartphones' sensors for weather measurements but also allows users to enter their own estimates of both current and future weather conditions. The results from a 32-month public deployment show that this combination results in more accurate temperature estimates and features an average error rate of 2.7°C (Niforatos et al., 2017).
• Objectiveness: Another important characteristic of crowdsourcing-based data is whether the content results from subjective feeling and opinion or from objective observation. Indeed, most modern natural sciences, such as meteorology, rely on objective data, which makes the research quantifiable and measurable. However, subjective opinions and feedback reflect the purpose of meteorological science research: science and technology are developed to meet the needs of humanity and ultimately serve human development. Therefore, the question of how to study these subjective opinions objectively has opened a subfield termed 'opinion mining' in information science. For example, Herdt et al. extracted insight from personal experiences and the Twitter feed during the Pan American Games in 2015 to aid weather stations in evaluating thermal discomfort and the heightened risk of heat-related illnesses. These subjective opinions are ultimately more densely distributed in the target area than weather stations are and provide the most direct feelings of the end users, which weather stations cannot provide (Herdt et al., 2018). In addition, SNS platforms provide different data according to their roles and functions. For example, a weak-relationship social platform such as a microblog seeks the widest spread, while a strong-relationship social platform preserves the privacy of individuals.
Currently, related research on social media focuses more on follower-based platforms (e.g. Twitter and Weibo) than on friend-based ones (such as Facebook). The distribution of SNS platforms used as data sources in meteorology-related research papers can be found in Supporting Information (Figure 3c).
• Mobility: In crowdsourcing-based data collection, the underlying sensors can be mobile, which is a significant attribute. Mobility makes the collection more intelligent and customizable. Specifically, this attribute fits well with the need for meteorological services that focus on the concerns of people. Combined with geographic coordinates of different precision levels, mobile data collection effectively improves the efficiency with which sensor resources are used. For example, a typical application scenario is to evaluate meteorological conditions through current vehicular sensing signals and road network information in transportation systems. These approaches take advantage of regular and structured information gathering, especially alongside the daily service work of vehicles (such as Uber cars and taxis) (Dey et al., 2015; Tomas et al., 2016).
• Multidisciplinarity: Meteorological data collection is not limited to meteorological and atmospheric science; many related fields affected by meteorology can provide data support for social weather through crowdsourced collection in their own industries. In other words, multidisciplinarity enables problem solving assisted by knowledge from other domains. In the field of hydrology, for instance, Lowry & Fienen compared stream gauge measurements from volunteers, who collected data via a conscious approach, with official measurements (Lowry and Fienen, 2013).
Similarly, phenological change in terrestrial, freshwater and marine organisms reflects many climatic events, which suggests shared large-scale drivers (Thackeray et al., 2016; Willis et al., 2017; Strangeways, 2018). Moving from natural to artificial fields, in agriculture, crop planting records and remote sensing data have also been used to measure key areas for meteorological observations (Dell'Acqua et al., 2018).
To summarize, a typical characteristic of the reviewed papers is that all experiments and work are built on fusing or aggregating different types of data. Meteorology usually deals with real-world phenomena, which always require multiple dimensions to describe. In addition to meteorological data, data from other fields are important supplementary sources, especially when the actual problem is a cross-disciplinary task (e.g. tourism (Padilla et al., 2018), agriculture (Minet et al., 2017; Zipper, 2018), etc.). The advantages and limitations of different social weather data sources reveal a huge demand for advanced data fusion approaches and techniques. However, current research in this field is still at a preliminary stage, and the fusion of cross-domain, multisource heterogeneous data still leaves plenty of room for further breakthroughs. Table 2 lists the relevant data sources and the data specialties that they can generate.
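The four-attribute labelling (consciousness, objectiveness, mobility and multidisciplinarity) can be pictured as a small data structure. The sketch below is only an illustration: the class and field names are hypothetical, and the labels assigned to the three example sources are our own reading of the examples discussed in the text, not entries from Table 2.

```python
from dataclasses import dataclass
from enum import Enum


class Consciousness(Enum):
    ACTIVE = "deliberately observed and reported"
    PASSIVE = "by-product of everyday behaviour"


class Objectiveness(Enum):
    OBJECTIVE = "measured observation"
    SUBJECTIVE = "feeling or opinion"


@dataclass(frozen=True)
class DataSource:
    name: str
    consciousness: Consciousness
    objectiveness: Objectiveness
    mobile: bool            # can the underlying sensor move?
    disciplines: frozenset  # domains the data originate from

    @property
    def multidisciplinary(self):
        """True when the data draw on any domain other than meteorology."""
        return bool(self.disciplines - {"meteorology"})


# Illustrative labels only (assumptions, not taken from Table 2).
SOURCES = [
    DataSource("volunteer stream gauge readings", Consciousness.ACTIVE,
               Objectiveness.OBJECTIVE, False, frozenset({"hydrology"})),
    DataSource("disaster-related microblog posts", Consciousness.PASSIVE,
               Objectiveness.SUBJECTIVE, False,
               frozenset({"meteorology", "emergency response"})),
    DataSource("vehicle sensing signals", Consciousness.PASSIVE,
               Objectiveness.OBJECTIVE, True, frozenset({"transportation"})),
]


def select(sources, **criteria):
    """Return the sources whose attributes equal every given criterion."""
    return [s for s in sources
            if all(getattr(s, key) == value for key, value in criteria.items())]


print([s.name for s in select(SOURCES, mobile=True)])
print([s.name for s in select(SOURCES, consciousness=Consciousness.PASSIVE)])
```

Keeping the attributes as explicit fields makes it straightforward to pick the combination of sources that a given task needs, which is the selection problem that data fusion then has to solve.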

| Crowdsourcing-motivated analytics on social weather
Data analysis plays a crucial role in social meteorological research. Through data analysis and mining techniques, the public's deep-seated behaviours towards weather and climate (whether conscious or unconscious) can be extracted from data. A powerful analysis and processing method is vital, especially when the collected data are semistructured or even unstructured and contain noise. In addition, due to the huge volume and varied noise of the data, the performance and efficiency of analysis techniques are also important factors. In this review, from the perspective of the discovery, mastery and application of data and knowledge, we categorize these research efforts into three levels: correlation and relationship discovery in acquired data, generalization and abstraction of discovered knowledge, and current efforts towards a more systemic and intelligent knowledge service. However, it should be noted that these levels should not be read as a ranking of technical or research value.

| Correlation and relationship discovery in acquired data
The first level is to discover basic correlations and relationships between human and meteorology-related data. At this level, statistical methods are used to calculate correlations, optimize processes, map distributions and so on, so that elemental knowledge is acquired and generalized. The aim is to find an entry point into a specific problem and to verify that this entry point and the associated correlation are stable. To acquire such knowledge, methods to capture, clean and extract features from data are proposed. Feature extraction and correlation calculation are the most common topics in the related research.
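As a minimal sketch of the correlation calculation at this level, the Pearson coefficient between a weather variable and a behavioural signal can be computed as follows. The two daily series are synthetic values invented for illustration; they do not come from any of the reviewed studies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Synthetic example: daily maximum temperature (°C) and the number of
# weather-related posts collected on the same days (both made up).
temperature = [21.0, 24.5, 27.0, 30.5, 33.0, 29.0, 25.5]
post_count = [110, 130, 155, 210, 260, 190, 140]

print(round(pearson(temperature, post_count), 3))
```

A coefficient close to 1 or -1 only indicates a linear association in the sample; checking that it remains stable across periods and regions is the part that the text above describes as verifying the entry point.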
On the premise that human behaviour (including behaviour on the Internet) is affected by the weather and climate (Li et al., 2014), regular patterns between behaviour and atmospheric conditions can feasibly be explored (Dutta et al., 2017). For example, an investigation of depressive language in over 600 million social media posts found that mental well-being deteriorates during warmer periods, potentially leading to a change in suicide rates comparable to the estimated impact of economic recessions, suicide prevention programmes or gun restriction laws (Burke et al., 2018). This is not the only instance of using social media to investigate correlations between illness and meteorological information. A series of studies have reported potential relationships between weather and disease through crowdsourcing approaches, such as the varying degrees of correlation between local weather conditions and subjective text on Twitter discussing fibromyalgia (Haghighi et al., 2017) and depression (Yang et al., 2015; Modoni & Tosi, 2016). Regarding behavioural differences among groups, Holmberg and Hellsten found that more women hold a firm attitude towards the human impact of climate change, while men are more sceptical in their daily discussions on Twitter (Holmberg & Hellsten, 2015). Another scenario is that risk perception may affect users' trust regarding severe natural hazards: an increased estimated likelihood of severe weather and the expectation of danger and harm were associated with greater trust, which is helpful for advancing risk mitigation actions (Losee & Joslyn, 2018).

[Table rows] Short-term warning: Chatfield and Brajawidagda. Ultraviolet imaging: Wilkes et al. (2016). Tropical cyclone data labelling: Hennon et al. Surface pressure observation: Mass and Madaus (2014).
Rather than quantitative correlation, other studies linking user online behaviour to meteorology have also reported several interesting phenomena. For instance, Anderson and Huntington explored sarcasm and incivility usage on Twitter towards climate change during 4,094 extreme weather events. Their results showed that sarcasm and incivility in discussions about climate change are rare overall and are more related to profiles with right-leaning politics (Anderson & Huntington, 2017; Anderson & Becker, 2018). Facing severe weather, such as smog and storms, differences in reposting microblogs between users of oriental and occidental backgrounds have been discussed: less humour or increase in affective outpouring as the crisis developed was found on the Chinese Sina Weibo than on Twitter (Lin et al., 2016). Similarly, Stewart and Wilson tracked the entirety of Hurricane Sandy's lifecycle in social media with their proposed model based on a crisis communication theory (Stewart & Wilson, 2016). By contrast, information spread through interpersonal social networks such as Facebook also contributes to the awareness and popularization of meteorological information such as climate change (Ali, 2011; Sharpe & Bennett, 2018). Connor et al. analysed messages spreading through communication chains on Facebook, finding that statements centred on conventional climate change topics survived longer in communication chains than those with less conventional topics (Connor et al., 2016). In political use in the United States, climate change-related tweets also differ between people with different political beliefs: when the tweets are classified according to the location of states, red states prefer 'climate change' to 'global warming', and blue states show the opposite preference (Jang & Hart, 2015). A similar phenomenon could also be observed between left- and right-leaning news agencies (Saunders et al., 2018). Pearce et al. analysed the nature of the debate on Twitter about the 2013 Intergovernmental Panel on Climate Change and investigated three user communities to try to find significant links between climate-convinced and critical users (Pearce et al., 2014).
While the core of crowdsourcing is to capture, analyse and apply data from each end user as much as possible, the roles of professional and authoritative meteorological service providers in the process of information dissemination are important as well (Fownes et al., 2018). For example, Kim et al. found that professional news and weather agencies played a dominant role as information sources and information diffusers to the public (Kim et al., 2018). However, blog owners who deny or downplay global warming attempted to disregard the overwhelming scientific evidence, such as Arctic sea-ice loss and polar bear vulnerability, in order to cast doubt on other established ecological consequences of global warming, thereby aggravating the consensus gap (Harvey et al., 2018). A case study on the question-and-answer community Quora also suggests that emphasizing specific subjects rather than popular knowledge of climate change could draw more public attention to the issue (Jiang et al., 2018). Brandt et al. examined the periods before, during and after the historic rainfall and flooding in the Midlands region of the greater Columbia, South Carolina, area in October 2015 to distinguish the characteristics of tweets in each period of the life cycle of an emergency event on a social network (Brandt et al., 2018). Crowdsourced opinion can also be used to measure forecast accuracy. Wiwatwattana et al. (2015) reported, from a case study in Bangkok, Thailand, that social responses can serve as an alternative validation for weather forecasting, and that such responses also make bias and weakness in forecast services more prominent.

| Generalization of discovered knowledge
The second level establishes the generalization of acquired knowledge. The evidence-based knowledge and data found in the previous phase are abstracted at this stage. In other words, researchers find different types of regular patterns at the first level and, at this stage, consolidate them into laws for predicting new, unseen situations and instances. Common tasks include classification, regression, clustering and combinations of them. Powered by artificial intelligence tools (e.g. machine learning), such predictions and estimations can be implemented and driven by preprocessed data. The aim of this stage is to abstract knowledge, examine the soundness and accuracy of this abstraction, and further visualize it so that it is easier for the public to understand.
For example, generating weather reports from collected tweets is one kind of abstraction (Butgereit, 2014). Applied in practice, the same method can generate weather reports from new, unseen tweets, which is a type of generalization. As another typical example of generalization, sentiment analysis abstracts emotional factors from the various texts that people publish on the Internet every day. In the meteorological field, people's feelings about weather and climate are also an important issue worth studying (Lachlan et al., 2014). Cody et al. (2015) used their previously implemented tool, the 'Hedonometer' (Dodds et al., 2011), to measure the happiness of tweets containing the word 'climate', extracting lexical patterns that can then be used to judge automatically the sentiment of a new text as it appears on the social network. However, keywords in the body of a tweet are a weaker signal than hashtags (e.g. '#ClimateChange'). Hashtags significantly reduce and even eliminate subjectivity errors associated with humans and provide an inexpensive solution for event detection on Twitter, which makes them promising for detecting emergent weather events (Hamed et al., 2015). Beyond English, researchers have also proposed methods for computing weather-related sentiment that are adapted to the usage habits of other languages (e.g. Arabic (Al-Kabi et al., 2018)).
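As a rough illustration of the word-level scoring that a tool such as the Hedonometer relies on, the sketch below computes a frequency-weighted average happiness score for a set of tweets. It is not the implementation of Dodds et al. (2011) or Cody et al. (2015): the word scores in `TOY_SCORES`, the 1 to 9 scale, the neutral band and the function name `average_happiness` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical word scores on a 1 (sad) to 9 (happy) scale, for illustration only.
TOY_SCORES = {"love": 8.0, "sunny": 7.0, "the": 5.0, "storm": 3.0, "flood": 2.0}


def average_happiness(texts, scores, neutral_band=(4.0, 6.0)):
    """Return the frequency-weighted mean score of the scored words in texts.

    Words without a score, and words whose score lies strictly inside
    neutral_band, are ignored. Returns None if no word contributes.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and keep simple alphabetic tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))

    low, high = neutral_band
    total = 0.0
    weight = 0
    for word, freq in counts.items():
        score = scores.get(word)
        if score is None or low < score < high:
            continue  # unscored or emotionally neutral word
        total += score * freq
        weight += freq
    return total / weight if weight else None


if __name__ == "__main__":
    tweets = ["Love the sunny climate", "Climate storm and flood"]
    print(average_happiness(tweets, TOY_SCORES))
```

Dropping near-neutral words keeps frequent function words from pulling every score towards the middle of the scale; the width of that band is a tuning choice rather than a fixed rule.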
Predicting future samples from existing data and scenarios is also a typical task in the generalization of knowledge. Drawing on a global-scale sensor network that crowdsources 15+ billion transponder messages per day from aircraft, Trub et al. proposed a method to estimate upper-air meteorological parameters with a relatively low error (Trub et al., 2018). Time-series data tend to be affected more when authoritative data are scarce or unavailable for some periods, which makes crowdsourcing more significant for missing-data imputation. Restrepo-Estrada et al. used georeferenced tweets as an alternative to authoritative data sources to fill the gaps in such time series and built a rainfall-runoff estimation and flood forecasting model in Brazil (Restrepo-Estrada et al., 2018), and a similar task has been reported by Arthur et al., who applied a text filter to relevant tweets in the UK (Arthur et al., 2018).
In the task of abstracting data into higher-level, understandable information, it is generally believed that directly collected data only provide information at the record level (signals, posts, etc.), whereas information that people can readily understand in the meteorological field must be at the event level (Ali & Ogie, 2017). Zhu et al. (2019) proposed a severe weather event extraction method that compiles online news and Sina Weibo posts into weather events by considering their temporal, spatial and thematic features.
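One simple way to picture the step from the record level to the event level is to merge records that share a place and a theme and lie close together in time. The sketch below does only that. It is a deliberately simplified assumption and not the extraction method of Zhu et al. (2019): the `Record` and `Event` classes, the exact-match comparison of place and theme, and the six-hour `max_gap` are hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Record:
    """A single news item or post (record level)."""
    time: datetime
    place: str
    theme: str
    text: str


@dataclass
class Event:
    """A group of records about the same place and theme (event level)."""
    place: str
    theme: str
    records: list = field(default_factory=list)

    @property
    def last_time(self):
        return self.records[-1].time


def group_into_events(records, max_gap=timedelta(hours=6)):
    """Merge records with the same place and theme into events.

    A record joins the latest event for its (place, theme) pair unless
    more than max_gap has passed since that event's last record.
    """
    events = []
    latest = {}  # (place, theme) -> most recent event for that pair
    for rec in sorted(records, key=lambda r: r.time):
        key = (rec.place, rec.theme)
        current = latest.get(key)
        if current is None or rec.time - current.last_time > max_gap:
            current = Event(rec.place, rec.theme)
            events.append(current)
            latest[key] = current
        current.records.append(rec)
    return events
```

In practice the spatial and thematic comparisons would be fuzzier than exact string matches (for example, nearby districts or related keywords), which is where most of the real difficulty lies.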
Not only can external data be used to answer questions in meteorological research, but meteorological data can also be used to solve problems in other fields. In transportation research, for instance, modelling and predicting traffic situations require features drawn from several aspects. Lin et al. extracted weather events captured by hashtags on Twitter to predict the traffic speed (i.e. smoothness) on freeways with a linear regression model (Lin et al., 2015). A city-level traffic awareness and alerting model was also proposed by Lu et al. (2018b); this model extracted opinions about severe weather and traffic from Weibo and validated its predictions and alerts against news reports, thereby saving the large costs of deploying physical sensors and integrating data from different sources and departments. In hydrological and agricultural research, Ravazzani et al. (2017) proposed a prediction model that estimates soil moisture and crop water requirements by combining ground observations, data from space and information reported in cyberspace through a crowdsourcing approach.
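The idea of regressing traffic speed on hashtag-derived weather signals can be sketched with a single feature. The example below is an assumption-based illustration, not the model of Lin et al. (2015): the hashtag list, the use of an hourly tweet count as the only predictor, and the helper names `count_weather_tweets`, `fit_line` and `predict` are all hypothetical.

```python
def count_weather_tweets(tweets, hashtags):
    """Count tweets that contain at least one of the given hashtags."""
    wanted = {tag.lower() for tag in hashtags}
    total = 0
    for tweet in tweets:
        found = {
            token.rstrip(".,!?;:").lower()
            for token in tweet.split()
            if token.startswith("#")
        }
        if wanted & found:
            total += 1
    return total


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted speed for a given weather-tweet count."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical hourly data: weather-tweet counts and mean speeds (km/h).
    counts = [0, 2, 4, 6]
    speeds = [100, 90, 80, 70]
    intercept, slope = fit_line(counts, speeds)
    print(predict(intercept, slope, 3))
```

A fuller model would add further predictors such as time of day or measured precipitation; the single-feature form is used here only to keep the arithmetic visible.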

| Wielding mastered knowledge: towards a systemic and intelligent knowledge utilization
The final level builds knowledge services by aggregating the knowledge and information derived in the two preceding stages. This stage aims to provide systemic and robust services and frameworks that support high-level human activities such as decision-making advice, expert systems, emergency response and information dissemination. In addition to complex intrinsic models and algorithms, this level of analysis places very high demands on the visualization of analysis results. The output must be readable, information-rich and intelligible because the work involves cross-domain people, machines, data, methods and processes (Kox et al., 2018).
To serve the emergency response mechanism, Ripberger et al. (2014) proposed a measure of public attention to severe weather risk communication based on the growing stream of data that individuals publish on social media platforms, thus providing a better understanding of the relationship between risk communication, attention and public reactions to severe weather. Another example is the decision between a safe but costly option (e.g. spending on protection from a storm) and a risky option (e.g. spending less on the assumption that the storm will not strike). A controlled experiment conducted by Mu et al. (2018) suggests that although providing more evidence and information to users can make a judgement more convincing, the best decision is not always made using a massive amount of information.
On the issue of situational awareness and decision support, large heterogeneous data streams are both a challenge and a motivation, and powerful, adaptable situational awareness systems are in high demand. Akbar et al. proposed an architecture for fusing and analysing traffic, weather and social media streams, thereby predicting the probability of congestion in real time (Akbar et al., 2018). After adjustment based on feedback from traffic administrators, their case study in Madrid reached a promising prediction accuracy, and similar results are reported by Lu et al. for cases in Beijing (Lu et al., 2018b) and Qingdao (Lu et al., 2018a). From the perspective of individualized services, the knowledge service usually takes the form of personalized advice and recommendations. Chen et al. implemented a health guidance system for urban residents covering respiratory diseases, travel and sleep quality, based on fusing urban air quality data collected through meteorological sites, mobile crowdsourcing, IoT sensing and even users' body signals.

| Methodologies of analytics utilized in social weather research
To summarize, these analytical methods translate the data discussed in the previous subsection into knowledge. To analyse these methods in more detail, we divide the articles reviewed above by type of research over the past five years and present the proportions in Figure 3a,b. Figure 3a shows that research over the past five years has focused primarily on discovering relevance and relationships associated with atmospheric science, while knowledge extraction and decision-making response, which apply existing knowledge to new, unseen data, are expected to gain considerable attention in the near future. We also classify these papers according to their theoretical tools: statistical-based, machine-learning-based, graph-based and other methods. In this paper, the statistical-based method refers to calculating intrinsic relationships by statistical methods after the existing data have been processed, while machine-learning-based approaches, which originated from statistical learning, use existing data to perform a specific task effectively without explicit instructions, relying instead on patterns and inference from the data. The graph-based approach uses graph mining algorithms to derive conclusions by constructing network relationships on the studied data. In Figure 3b, papers based on statistical methods have been published at a steady rate over the past five years and account for about 41% of all papers on analytical methods. By contrast, articles based on machine-learning methods have grown sharply in the most recent year: those published in 2018 account for 76.92% of the machine-learning-based articles of the past five years (10 of 13 papers). This result also shows that research hotspots in analytical methods in this area are moving towards systematic and intelligent knowledge services.

| Crowdsourcing-driven applications
In this section, we focus on applications and platforms and analyse proposed systems that have already been deployed. We also discuss the characteristics and problems of these crowdsourcing-supported systems for meteorological knowledge services. We divide the applications into five categories based on the role that crowdsourcing plays in the research. User interfaces of typical online applications can be found in Figure S2.

| Crowdsourcing for dense high-resolution meteorological parameters
As discussed above, the crowdsourcing approach makes collection with ubiquitous sensors possible, thereby sharply reducing the cost of data acquisition and relaxing the constraints of observation equipment. Thus, the most direct benefit for meteorological research is the collection and analysis of sensed parameters with precise spatial and temporal tags. The systems that collect temperature readings from smartphone batteries, established by Overeem et al. (2013) and Chau (2018), are typical examples of dense high-resolution meteorological parameter collection. Public pictures with geo-tagged records are also exploited to monitor snow conditions (Giuliani et al., 2016) and ultraviolet radiation (Wilkes et al., 2016). Similarly, a series of platforms collect meteorological parameters from end users who actively enter and upload observations through smartphone apps; these achieve the same goal with more diverse but fewer data because they require user effort (Niforatos et al., 2015; Niforatos et al., 2016; NSSL et al., 2019). For emergency situations, the app MAppERS by Frigerio et al. (Frigerio et al., 2018) delivers not only observations but also emergency requests to authorities.

| Crowdsourcing for feedback and public concerns
Inaccurate forecasting and failure to address public concerns may cause a serious crisis of trust between meteorological departments and the public. Therefore, public evaluation and opinion are valuable for judging the service of meteorological agencies. In addition, the timely identification of hot events of public concern helps to enhance the public's goodwill towards, and satisfaction with, the meteorological service sector. As the most direct application on this basis, Zhu et al. proposed a meteorological public opinion monitoring platform that aggregates related news and online posts into events with heat values.

| Crowdsourcing for individualized service
With the rapid development of machine intelligence, the understanding of human preferences makes it possible to personalize services for individuals. By discovering rules based on commonalities and characteristics in big data, personalized services provide a great stimulus for smart cities, smart healthcare and intelligent education. In the meteorological field, the system called 'UH-BigDataSys' proposed by Chen et al. combines climate, air and personal-level health data to provide personalized guidance services for respiratory diseases, outdoor travel and sleep quality.

| Crowdsourcing for infrastructure
Infrastructure and ad hoc physical collectors also offer an extra opportunity for establishing meteorological knowledge services. Vishwarupe et al. (2015) proposed an embedded system that collects real-time weather data and is deployed in cooperation with telecom infrastructure. To relieve the bottleneck of transmitting large volumes of unstructured data through data centres, Du et al. (2018) proposed a novel strategy for creating an efficient, low-latency distributed message delivery system for operational connected-vehicle applications. In addition to data, crowdsourcing can also target computational resources. Chen et al. used a virtual cloud host to build a meteorological computing model, thereby realizing the crowdsourcing of computing resources (Chen et al., 2017).

| Crowdsourcing for interdisciplinary research
One of the great advantages of crowdsourcing is that, regardless of its purpose, the methods and means of implementing it are often similar. Therefore, crowdsourcing can effectively advance meteorology-related interdisciplinary research. For instance, Aihara et al. proposed a smartphone-based driving recorder. The recorder captures video of the road ahead while driving and provides a series of upload interfaces through which users report weather, traffic and other conditions, so that information in these fields can be collected and aggregated (Aihara et al., 2016).

| Platform and scenario of applications
Based on the analysis of the above-mentioned articles on the application of meteorological knowledge services, we further divide them according to their target devices and platforms, as shown in Figure 3d. Different devices often correspond to different application scenarios. For example, mobile device-based applications are directed at user terminals, providing information to end users while collecting data through the same devices. On the other hand, data management-based web systems usually process and visualize the aggregated data to serve end users such as emergency responders and decision-makers. As can be observed from Figure 3d, these two categories account for the majority of current application research (65%). Another line of research that has received considerable attention is the use of embedded devices to achieve low-energy, high-efficiency data transmission for meteorological knowledge services (21%).
Considering these current applications in the meteorological service domain, we can observe that since 2010, and especially in the past five years, they have rapidly evolved from emergence to maturity alongside external technologies such as mobile computing, social networking and machine intelligence. However, compared with current research in this crowdsourcing-assisted field, these applications still need to be upgraded further to reach the 'smart' level. In other words, the current service is still uniform for all users or depends on the user actively querying information. How a system can intelligently provide personalized and differentiated knowledge services and recommendations to end users still needs to be explored and implemented.

| CHALLENGES AND FUTURE OF SOCIAL WEATHER
Despite the increasing attention given to this area, research on social weather still has considerable room for improvement. The areas for progress are rooted in the aspects outlined in the previous sections, but the motivation for such improvement is to change or even replace the existing schemes.
In the area of data collection, two crucial questions remain to be answered. First, with next-generation mobile computing technologies such as 5G and the growing number of connected mobile and IoT devices, how can we distribute sensors more efficiently during crowdsourced data collection (e.g. through rewards), and how can we transmit the massive amounts of collected data efficiently in the near future to avoid data bottlenecks? Second, multisource data are difficult to collect, aggregate and match. For example, author attribution and name disambiguation across different platforms, a widely acknowledged difficult problem, require complicated processing models and even manual effort, especially when the users' name structures differ from each other (Tang et al., 2012; Zhang et al., 2018b). In addition, the uncertainty inherent in crowdsourcing-based collection, such as inconsistent observation standards, subjective bias, data errors and missing data, still must be resolved (Khan et al., 2015; Tipaldo & Allamano, 2017).
When processing and analysing the collected data, challenges also exist. First, feature engineering, which extracts features from large volumes of noisy and poorly structured data, is a classic but vital step in data preprocessing. However, the emergence of deep learning has incorporated feature engineering into the model itself (Lecun et al., 2015). By learning from huge volumes of data, a state-of-the-art model automatically acquires its own representation of data and knowledge, which makes more intelligent tasks in meteorology possible. In addition, pretrained language representation models based on prior knowledge (e.g. BERT (Devlin et al., 2018), Word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)) better organize and represent the semantic information of text by learning from large general-domain corpora. This greatly assists the analysis of subjective text content and allows weather-related text processing to move in a more complex and nuanced direction. So far, the application of deep learning in this domain is still at a very preliminary stage. Second, the joint analysis of multimodal data is also expected to advance. Social weather involves data with different kinds of structure, such as images, texts, entities and relationships, and such heterogeneous data make it difficult to unify models. Therefore, generalized, comprehensive and informative data fusion has far-reaching implications for meteorological knowledge services (Lin et al., 2018).
To date, social weather-based service applications mostly remain passive, 'Query-Results'-based services. However, smarter 'active services' such as recommendation systems, automated briefings and even personalized climate-related education and healthcare are more important and more challenging directions for the integrated use of social meteorological knowledge. Active services also require a crowdsourcing scheme and strategy (e.g. a reward scheme) that optimizes efficiency, range and goals throughout the whole process of collection, analysis and application (dos Santos, 2017; Chapman et al., 2017). Additionally, from the perspective of meteorological teaching, how to organize knowledge efficiently and push it accurately to professionals and students in this field is worthy of further study (Tarus et al., 2017; Wan & Niu, 2019). Similarly, how to mine new knowledge from the measurement of scientific literature, and even predict new hotspots, is also an important direction for meteorological knowledge services (Tarus et al., 2018; Nie et al., 2019; Yousif et al., 2019).
Based on this current research, and zooming out from microscopic procedures and techniques to the macroscopic framework and structure of social weather-related research, we propose to extend the concept of cyber-physical-social systems (CPSS) to cover all related research. Originating in the intelligent systems community, CPSS provides a framework for depicting the relationship among real-world situations (physical), sensing and computing (cyber) and human factors (social), and it has been used in the enterprise and transport domains (Wang, 2010; Zheng et al., 2016; Tse et al., 2017). In the CPSS model of meteorology, the triad consists of the human/public, the meteorological department and weather/climate. The intersections of any two or all three of these subfields conceptually make up the research scope of social weather, as shown in Figure S3. In other words, the intersection of the subjects is the main research area of current social weather, and the specific problems studied can be classified into one of the subjects.

| CONCLUSION
Crowdsourcing-assisted meteorological services play an important role in harmonizing public feedback and data sources and in providing information support to both official departments and individuals in a more affordable, dense and public-concern-centred way. Whereas conventional meteorological approaches generally provide services based on physical sensors, transmission and model processing, social-based meteorological services add a new dimension of collecting, analysing and applying data that are provided by, and benefit, the public.
In this paper, we have presented a review of crowdsourcing-assisted meteorological services, which we term 'social weather'. The contribution of this survey is twofold. First, we have summarized the research achievements in crowdsourcing-assisted meteorological services from their beginnings in 2010 to 2018 by classifying the papers according to the attributes of data collection, the stages of information analysis and the scope of practical applications. Second, we have given an insight into current challenges and future trends in crowdsourcing-assisted meteorological knowledge services.
This survey has revealed that crowdsourcing-assisted meteorological services, in combination with other data-driven and intelligent techniques, are gaining wider acceptance and application. Furthermore, it is evident from the papers reviewed that current research is contributing towards a systemic and intelligent knowledge service that builds a better bridge among the academic, industrial and individual communities. We hope that this review will widen the frontiers of knowledge and provide useful literature for researchers interested in advancing this field of research.