Using social media to measure impacts of named storm events in the United Kingdom and Ireland

Despite increasing use of impact‐based weather warnings, the social impacts of extreme weather events lie beyond the reach of conventional meteorological observations and remain difficult to quantify. This presents a challenge for validation of warnings and weather impact models. This study considers the application of social sensing, the systematic analysis of unsolicited social media data to observe real‐world events, to determine the impacts of named storms in the United Kingdom and Ireland during the winter storm season 2017–2018. User posts on Twitter are analysed to show that social sensing can robustly detect and locate storm events. Comprehensive filtering of tweets containing weather keywords reveals that ~3% of tweets are relevant to severe weather events and, for those, locations could be derived for about 75%. Impacts of storms on Twitter users are explored using the text content of storm‐related tweets to assess changes in sentiment and topics of discussion over the period before, during and after each storm event. Sentiment shows a consistent response to storms, with an increase in expressed negative emotion. Topics of discussion move from warnings as the storm approaches, to local observations and reportage during the storm, to accounts of damage/disruption and sharing of news reports following the event. There is a high level of humour expressed throughout. This study demonstrates a novel methodology for identifying tweets which can be used to assess the impacts of storms and other extreme weather events. Further development could lead to improved understanding of social impacts of storms and impact model validation.


KEYWORDS
extreme weather, impacts, social media, social sensing, storms

| INTRODUCTION
It is well known that extreme weather events such as strong winds, heavy rain and snow cause impact and disruption to our daily lives (IPCC, 2014). However, there is little observational record of the specific impacts (e.g. damage to property, disruption to travel, danger to life, stress and anxiety) that occur as a result of these weather events. This information lies beyond the scope of traditional meteorological observations. The frequency and intensity of extreme weather events have increased over recent years and are predicted to continue to increase (IPCC, 2014). Meanwhile, there has been a shift from forecasts that focus on meteorological conditions alone to forecasts that incorporate information about their associated impacts (Taylor, 2018). This impact-based forecasting strategy is endorsed by the World Meteorological Organization (WMO), which has produced guidance to support its development (WMO, 2015). Together, these trends create an urgent need to understand the ways in which extreme weather events affect people and property, in order to validate forecast models and warning systems.
Social media is increasingly used across the world (Statista, 2017) and this presents an opportunity to use the rich social information it creates to inform preparedness and response to natural hazard events. Many people routinely use social media to discuss weather conditions, particularly when weather patterns are unusual. During crisis events, such as periods of extreme weather, technological challenges in affected areas may slow official news correspondent reports, while social media reports may be more swiftly distributed (Spence et al., 2015). The public availability of data from some social media platforms, notably Twitter, opens the possibility to use social media data to understand how human activity is affected during an extreme weather event.
"Social sensing" using social media has been widely used for knowledge discovery in fields relating to public health, human behaviour, social influence and market analysis (Wang et al., 2015b). Social sensing broadly refers to a set of sensing and data collection models whereby data are collected from humans or personal devices (Wang et al., 2015a). In this paper, social sensing using unsolicited social media data is distinguished from solicited crowd-sourcing, where users voluntarily participate and report observations in a structured or semistructured manner. Examples of solicited crowd-sourcing include the UK Met Office Weather Observations Website (WOW. Met Office Weather Observations Website, 2019), where the public can provide amateur weather observations, and the UK Snow Map (UK Snow Map, 2010), where Twitter users are asked to report snowfall observations using a particular hashtag (#uksnow). While solicited crowd-sourcing offers benefits in that data are more reliable and can be provided in a structured form by a set of dedicated volunteers, the volumes of data generated are typically low relative to the high volumes seen in unsolicited social media use; this can limit the usefulness of solicited data for understanding of wider impacts.
For social sensing using unsolicited social media, each individual user plays the role of a sensor. When a user publicly posts an item to a social media platform, they are providing a piece of sensor data. When grouped together by topic or location, large numbers of social media posts can therefore be used to develop an understanding of a range of issues. Social sensing of this nature has already been successfully used to detect natural hazards such as earthquakes (Sakaki et al., 2010), wildfires (Boulton et al., 2016) and floods (Brouwer et al., 2017; Tkachenko et al., 2017; Rossi et al., 2018). A number of studies have used social media to understand impacts of hurricanes in the United States (Guan and Chen, 2014; Cervone et al., 2016; Kryvasheyeu et al., 2016; Morss et al., 2017; Kim and Hastak, 2018; Wu and Cui, 2018).
This study explores whether social sensing can help meteorologists to understand how human activity is affected during extreme weather events, in terms of both emotional impacts and other social impacts (e.g. disruption, damage) revealed by the topics of conversation during storm events. Some weather-related studies have begun to explore this opportunity. The effects of weather on mood have been shown using sentiment expressed in tweet text linked to weather conditions (Hannak et al., 2012; Caragea et al., 2014; Li et al., 2014; Baylis et al., 2018). The categorization of tweet content related to weather and natural hazards has also been explored using both manual methods (Spence et al., 2015; Halse et al., 2018) and automated methods (Alam et al., 2018). However, to date there has been little exploration of social sensing focused on social impacts of weather for the purposes of impact-based forecast validation.
In the present study, data from the social media platform Twitter were collected during the 2017/2018 UK and Ireland storm season (approximately October-March) to explore social sensing as a methodology for assessing the social impacts of storms. The research uses and builds on the social sensing methods described by  to extract, filter, locate and get useful meaning from social media data collected during this storm period. Sentiment analysis is used to look at the aggregated emotional response to storms and how this changes during the period of a storm event. Categorization of storm-related tweet content provides an indication of what kind of information can be determined from tweets, looking in particular for content related to social impacts. The aims of the study are (a) to establish a methodology for social sensing that can provide useful information about social impacts of storms and (b) to apply the methodology to explore the impact of storms in the United Kingdom and Ireland during winter 2017/2018. These objectives are intended to help develop social sensing as a source of impact observations suitable for validation of impact-based weather forecasting systems.
The paper is split into the following sections: Section 2 outlines the methods used for data collection, filtering and content analysis; Section 3 reports the main findings of the analysis, focusing on sentiment and categorized impacts observed during storm events; finally Section 4 summarizes the main benefits and limitations of the social sensing approach as demonstrated in this study, and makes some suggestions for future research.

| DATA COLLECTION AND METHODS
This study uses a hybrid approach of methods from previous studies which successfully collected and found useful meaning from Twitter data relating to weather events or natural hazards (Lachlan et al., 2014; Cowie et al., 2018; Halse et al., 2018). Social media data were collected, filtered for relevance and geo-located. The content of the resulting dataset was then analysed using sentiment analysis and automated categorization.

| UK/Ireland storm season 2017/2018
Since 2015 the Met Office in the United Kingdom and Met Éireann in Ireland have used a storm naming system to raise public awareness of the effects of stormy weather and to increase preparedness in response to weather extremes. A storm is named if it is expected to cause "medium" or "high" impacts from wind and/or precipitation, i.e. storms are named for weather systems which are expected to have an amber or red weather warning issued by Met Éireann and/or the Met Office's National Severe Weather Warning Service (https://www.metoffice.gov.uk/news/releases/2017/storm-names-for-2017-18-announced). Weather warnings are colour coded according to their potential impact and likelihood; amber and red warnings are therefore issued for weather events which are both probable and likely to cause significant disruption.
In the 2017/2018 UK storm season, which generally runs from autumn to early spring, there were a number of named storms which affected the United Kingdom with expected medium or high impacts from wind and/or rain/snow (Table 1). The reason for naming storms is to improve public communication about weather events likely to cause significant impacts. Named storms are likely to attract attention from social media users because of their severity and the use of the names in official communication and forecasts. Named storms are also useful from a technical point of view, as one can search directly for the storm's name. Therefore this study mainly focuses on named storms and the impacts associated with them. Twitter data were collected for named storms for the duration of the 2017/2018 UK storm season from October 16, 2017 (when news of ex-Hurricane Ophelia hitting the United Kingdom was reported in the media) until March 10, 2018, post Storm Emma. Tweets containing keywords for weather related to a storm (e.g. wind, rain etc.) were also collected during this period. This was so that tweet activity which included weather terms only could be compared with tweets relating specifically to named storms.
Other countries' meteorological services may also name storms, using similar naming systems, so that some storms are already named before hitting the United Kingdom/Ireland. If a weather system has previously been named by another meteorological service, then it retains this name when it reaches the United Kingdom/Ireland. For example, ex-Hurricane Ophelia was named by the US National Hurricane Centre (NHC), Storm David by Météo-France and Storm Emma by the Portuguese Met Service.

| Social media and Twitter data collection
At the end of 2017 it was estimated that there were 2.46 billion social media users around the world, reflecting the global usage of smartphones and mobile devices. The social media platform Twitter, having 330 million monthly active users (Statista, 2017), is a social networking and microblogging service that allows registered users to interact via short published messages (tweets) up to 280 characters in length. Twitter makes user posts freely available via the Twitter API, making Twitter a popular source of observational data for both social and natural scientists (Williams et al., 2013). Data collection using Twitter can be achieved using keywords or "hashtag" references to specific topics or events. However, suitable algorithms must be applied to filter the data to ensure that only relevant information is then taken forwards for analysis (Spence et al., 2015). Locating the user who has posted an item to a social media platform is another challenge. At present only 1% to 2% of Twitter posts, for example, carry a Global Positioning System (GPS) location or specific location coordinates (Dredze et al., 2013); therefore, other methods must be employed to infer the place of origin. Using the methods outlined by , tweets relating to named storms and storm-associated weather conditions were collected using the Twitter Streaming API (via a Python script using the Twython package [McGrath, 2013]). This API returns all tweets up to a limit of 1% of the total volume of tweets at any point in time. Search keywords were used as an initial filter applied by the API to identify and download relevant tweets (Table 2). As tweets using these keywords are unlikely to reach the API limit, it is believed that most if not all relevant tweets are downloaded using this method (Morstatter et al., 2013). Some storm names were prone to typing errors in tweets; therefore, some common variants were accounted for in the search terms used.
Only tweets in the English language were collected, since the majority of the populations in this study (United Kingdom and Ireland) are English speaking. Tweets were collected over the time period October 16, 2017 to March 10, 2018. Each tweet was saved as a JSON object, which is a lightweight data-interchange format often used for transmitting data from a server to a web application (https://www.json.org). Each JSON object contains the tweet text as well as a number of metadata fields relating to each tweet (e.g. timestamp, username, user location, geotag).
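As an illustration of the stored format, the following minimal sketch reads one tweet JSON object and pulls out the text and the metadata fields named above. The sample tweet is invented, and the field names (`created_at`, `user.screen_name`, `user.location`, `coordinates`) are assumed to follow the Twitter API v1.1 tweet layout; the study's own script is not reproduced here.

```python
import json

# Invented example tweet, trimmed to the fields mentioned in the text.
# Field names are assumed to follow the Twitter API v1.1 tweet object.
RAW = '''{"created_at": "Sat Oct 21 09:15:00 +0000 2017",
 "text": "Storm Brian is battering the seafront this morning",
 "coordinates": null,
 "place": null,
 "user": {"screen_name": "example_user", "location": "Cornwall, UK"}}'''


def extract_fields(raw_json):
    """Return the tweet text and the metadata fields used later on."""
    tweet = json.loads(raw_json)
    user = tweet.get("user") or {}
    return {
        "timestamp": tweet.get("created_at"),
        "text": tweet.get("text", ""),
        "username": user.get("screen_name"),
        "user_location": user.get("location"),
        "geotag": tweet.get("coordinates"),  # None when no GPS point is attached
    }


record = extract_fields(RAW)
print(record["username"], "|", record["user_location"])
```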
The storm name collection keywords are shown in Table 2. Storm names were added to the "Storm Names" data collection in the days leading up to each storm event and therefore collections for each storm name do not cover the whole of the study period. As wind is the main weather type to cause impacts during a storm event, tweets relating to wind were collected as well as storm names. Precipitation also causes impacts during a storm event; however, weather warnings relating to each of the named storms predominantly related to the impact of winds, rather than precipitation. It is also likely that there were precipitation events (snow or heavy rain) not related to storm activity which makes the precipitation dataset less comparable with the storm dataset. Therefore, while tweets relating to precipitation were also collected and filtered for relevance, the crucial comparison is between the storm tweet collection and the wind tweet collection. More than 100 million tweets were collected from the API during the 2017/2018 storm season (see Table 3). Figure 1 shows time series of the numbers of tweets containing the specified keywords collected per day during the period October 16, 2017 to March 10, 2018. This includes all tweets (including retweets) in the raw dataset prior to any filtering for relevance to named storms. The time period of each named storm in the collection period is shown by the grey bars. There appear to be associated peaks in Twitter activity relating to the Wind collection. Peaks in the Storm Names collection are less obviously associated with storm events, but inspection suggested that this collection contained some highly relevant content amongst a lot of irrelevant content, which is likely to confound the association. The Precipitation collection has some storm-associated peaks but also many peaks not associated with storm events.
This study is concerned with the social impact of storms as experienced by social media users. For this purpose, retweets are retained in most parts of the analysis, including counts and time series measuring total activity around storms, and sentiment analysis (where it is asserted that retweeting implies endorsement, approval or agreement with the sentiment expressed in the original tweet). For purposes of observing social impacts, retweets and "quote" tweets are removed as they do not represent original observations. This removal was performed using tweet metadata.
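Removal of retweets and quote tweets from metadata might look like the sketch below. It assumes the Twitter API v1.1 conventions, in which a retweet carries a `retweeted_status` object and a quote tweet carries `is_quote_status` and/or `quoted_status`; the helper name and sample tweets are hypothetical.

```python
def is_original(tweet):
    """True if the tweet is neither a retweet nor a quote tweet.

    Assumes Twitter API v1.1 metadata: retweets contain a
    'retweeted_status' object; quotes set 'is_quote_status'
    and/or contain a 'quoted_status' object.
    """
    if "retweeted_status" in tweet:
        return False
    if tweet.get("is_quote_status") or "quoted_status" in tweet:
        return False
    return True


# Invented sample tweets.
tweets = [
    {"id": 1, "text": "Tree down on our road #StormBrian"},
    {"id": 2, "text": "RT @someone: Tree down", "retweeted_status": {"id": 1}},
    {"id": 3, "text": "Stay safe all", "is_quote_status": True,
     "quoted_status": {"id": 1}},
]
originals = [t for t in tweets if is_original(t)]
print([t["id"] for t in originals])
```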

| Filtering and location inference
After data collection, the first stage in processing the Twitter data was to apply a suitable relevance filter to remove any obviously irrelevant data. The various filters applied can be split into the following stages which are described in the order in which they were applied.

| Time zone filter
The raw data collection contains tweets from all global locations including the United States and other countries. Only tweets which relate to weather activity in the United Kingdom and Ireland are of interest for this study; therefore, the dataset was first filtered based on the time zone entity of each tweet to remove international tweets. The use of time zone as a proxy for the country level location of a tweet is discussed by Schulz et al. (2013). Tweets with the following time zones were therefore kept in the dataset: GMT, London, Europe/London, UTC, BST, GMT + 1, Dublin, Europe/Dublin, Edinburgh.
As of May 2018, in order to comply with General Data Protection Regulation requirements, Twitter has removed the time zone field from tweet metadata (Cowie et al., 2018). Other methods for location inference (as described in Section 2.3.6 below) remain effective in the absence of time zone information. This filter removes approximately 90% of tweets in the raw data collection and therefore makes later processing steps more computationally efficient.
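A time zone filter of this kind can be sketched as a simple set-membership test. The set below copies the time zone labels listed in this section; reading the value from `user.time_zone` is an assumption about the (pre-May 2018) tweet metadata layout, and the sample tweets are invented.

```python
# Time zone labels retained in the study (copied from the list above).
KEPT_TIME_ZONES = {
    "GMT", "London", "Europe/London", "UTC", "BST",
    "GMT + 1", "Dublin", "Europe/Dublin", "Edinburgh",
}


def in_study_area(tweet):
    """Keep a tweet only if its time zone is one of the retained labels.

    Assumption: the time zone is stored under user.time_zone, as in
    Twitter API v1.1 metadata before the field was withdrawn.
    """
    time_zone = (tweet.get("user") or {}).get("time_zone")
    return time_zone in KEPT_TIME_ZONES


# Invented sample tweets.
tweets = [
    {"id": 1, "user": {"time_zone": "London"}},
    {"id": 2, "user": {"time_zone": "Eastern Time (US & Canada)"}},
    {"id": 3, "user": {"time_zone": None}},
]
kept = [t["id"] for t in tweets if in_study_area(t)]
print(kept)
```

Tweets with no time zone set are dropped by this sketch; whether the study treated missing values the same way is not stated.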

| Bot filter
"Bots" are automated user accounts that are set up to perform a particular function, such as collate/spread content from a set of sources, promote a particular view or deliver advertising. Automated tweets from bot accounts are highly unlikely to contain information relating to social impacts of weather activity, but the presence of this kind of content can distort the dataset. To remove bot content, the number of tweets by each user account was calculated for the entire dataset. User accounts with a disproportionately high number of tweets (in this case >1% of the total volume of tweets in the dataset) were identified as bot accounts; automated accounts tend to create significantly more tweets than human users. All tweets posted by bot accounts were then removed from the dataset. A further manual review of the remaining users generating a high proportion of tweets found some additional bot accounts which were also removed. This filter removes approximately 1% of tweets in the raw data collection.

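The volume-based part of the bot filter can be sketched as follows: count tweets per account and flag any account whose share of the dataset exceeds the 1% threshold stated above. The function names and the synthetic data are hypothetical, and the subsequent manual review step is not represented.

```python
from collections import Counter


def find_high_volume_accounts(tweets, max_share=0.01):
    """Return accounts that posted more than max_share of all tweets."""
    counts = Counter(t["user"] for t in tweets)
    total = len(tweets)
    return {user for user, n in counts.items() if n / total > max_share}


def remove_accounts(tweets, accounts):
    """Drop every tweet posted by one of the flagged accounts."""
    return [t for t in tweets if t["user"] not in accounts]


# Synthetic dataset: one prolific account plus 950 single-tweet users.
tweets = [{"user": "bot_account", "text": "Buy umbrellas"} for _ in range(50)]
tweets += [{"user": f"human_{i}", "text": "Windy here"} for i in range(950)]

suspected = find_high_volume_accounts(tweets)
kept = remove_accounts(tweets, suspected)
print(suspected, len(kept))
```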
| Weather station filter
Data collections containing weather-related terms include a high number of tweets automatically posted by amateur weather stations. As this study is focused on social impacts, these tweets are deemed irrelevant since they are not directly related to social impacts. A process was developed to remove them. Tweets from weather stations typically follow a fixed structure, e.g. "Wind 2.0 mph E Barometer 30.10 in Falling slowly Temperature 68.5 F Rain today 0.00 in Humidity 55." Here these were identified using a script that searches the text of a tweet and counts weather-related terms; if there were more than two weather-related terms the tweet was identified as a weather station tweet. This method was shown to work well by manual inspection. Tweets identified as being from weather stations using this method were removed from the dataset. This filter removes a very small number of tweets in the raw data collection for named storms; however, it removes approximately 1% of tweets in the raw data collections for wind and precipitation.
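The term-counting rule described above might be implemented along these lines. The term list is a hypothetical stand-in (the study's actual list is not given), and counting distinct terms rather than total occurrences is an assumption.

```python
import re

# Hypothetical list of weather-related terms; the study's list is not given.
WEATHER_TERMS = {
    "wind", "barometer", "temperature", "rain",
    "humidity", "pressure", "gust", "dewpoint",
}


def count_weather_terms(text):
    """Count how many distinct weather terms appear as whole words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & WEATHER_TERMS)


def is_weather_station(text, max_terms=2):
    """Flag a tweet as a weather station report if it contains
    more than max_terms weather-related terms."""
    return count_weather_terms(text) > max_terms


station = ("Wind 2.0 mph E Barometer 30.10 in Falling slowly "
           "Temperature 68.5 F Rain today 0.00 in Humidity 55.")
human = "The wind and rain from Storm Brian kept me awake all night"
print(is_weather_station(station), is_weather_station(human))
```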

| Irrelevant term filter
As for the weather station filter, this filter is more relevant to the data collections containing weather-related terms than to those containing storm names. There are many phrases in the English language which use weather-related terms but do not relate to weather, as well as some homographs for weather-related words; these are irrelevant to this study so tweets that contain them were removed using a look-up table method. A list was compiled of common terms or phrases which use weather-related terminology but clearly do not refer to a weather event (such as "wind up," "throw caution to the wind," "cook up a storm" etc.); tweets whose text contained any of these were removed from the dataset. This filter removes a very small proportion of tweets from the remaining raw data collection.
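A look-up of this kind can be sketched with a single compiled pattern. The three phrases are the examples quoted above; the full table used in the study is not reproduced, and the use of word boundaries (so that "wind up" does not match "wind upended") is a design assumption.

```python
import re

# Example phrases quoted in the text; the study's full table is not given.
IRRELEVANT_PHRASES = [
    "wind up",
    "throw caution to the wind",
    "cook up a storm",
]

# Word boundaries stop partial matches such as "wind upended".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in IRRELEVANT_PHRASES) + r")\b",
    re.IGNORECASE,
)


def uses_irrelevant_phrase(text):
    """True if the tweet text contains any phrase from the look-up table."""
    return _PATTERN.search(text) is not None


print(uses_irrelevant_phrase("Stop trying to wind up your brother"))
print(uses_irrelevant_phrase("Strong wind upended our bins last night"))
```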

| Machine learning relevance filter
Although the previous stage removed much irrelevant content, an additional stage of filtering was still necessary to remove tweets which included the search keywords but were not relevant to wind, precipitation and storms. These included, for example, business advertising, links to articles on other topics, references to people and places that shared a name with the storm, and various other items of irrelevant content. Tweets in the Storm Names collection were particularly in need of additional filtering, since there are many celebrities or other individuals who share the same names as the storms studied here. To achieve this, the methods used successfully in previous studies (Cowie et al., 2018) were employed.
A set of 6,000 tweets was randomly selected from the tweet collections. Each tweet in this set was then manually labelled as relevant or irrelevant. Manual coding was conservative, labelling as irrelevant tweets that were obviously unrelated to the study topic and also tweets which were ambiguous (i.e. providing insufficient information to decide on relevance). In total there were 1,495 tweets in the dataset labelled as relevant and 4,505 tweets labelled as irrelevant. The labelled dataset was then used as training data for a multinomial naïve Bayes classifier. As a first validation test for this approach, 25% of the data were held back as a validation set and a classifier was trained on the remaining 75% of cases; this classifier had accuracy (i.e. correctly identified the relevance/irrelevance) of 92% on the held-back validation tweets, with an F1 score of 0.84. As a second test, to confirm the robustness of the approach, the same training/validation test was repeated with 6-fold cross-validation. The results of each test were combined to give an overall mean F1 score of 0.80 and a summed confusion matrix (also known as a contingency table). This confusion matrix shows overall accuracy of 92%, with most tweets in the filtered dataset classified as not relevant. Accuracy was higher on the False class (4,301/4,505 = 95%) than on the True class (1,221/1,495 = 82%), with a slight tendency to misclassify relevant tweets as irrelevant. This could be attributed to the training dataset being unbalanced and biased towards irrelevant tweets, which is probably due to the wide variety of tweets in the Storm Names collection which were not related to named storm discussion. However, this is a conservative error that ensures tweets that are retained are highly likely to be relevant. The multinomial naïve Bayes classification approach was deemed sufficiently accurate for the purposes of this study based on the results discussed above.
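The accuracy figures quoted above can be checked with a short calculation. Only the correct counts and class totals (4,301 of 4,505 and 1,221 of 1,495) appear in the text; the off-diagonal cells below are derived here by subtraction and are not quoted from the original table.

```python
# Correct counts and class totals as quoted in the text.
false_correct, false_total = 4301, 4505   # irrelevant ("False") class
true_correct, true_total = 1221, 1495     # relevant ("True") class

# Off-diagonal cells derived by subtraction (not quoted in the text).
false_as_true = false_total - false_correct   # irrelevant kept by mistake
true_as_false = true_total - true_correct     # relevant rejected by mistake

total = false_total + true_total
overall_accuracy = (false_correct + true_correct) / total
false_class_accuracy = false_correct / false_total
true_class_accuracy = true_correct / true_total

print(f"total labelled tweets: {total}")
print(f"misclassified: {false_as_true} irrelevant, {true_as_false} relevant")
print(f"overall accuracy: {overall_accuracy:.2%}")
print(f"False class: {false_class_accuracy:.2%}, True class: {true_class_accuracy:.2%}")
```

More relevant tweets are wrongly rejected than irrelevant tweets are wrongly kept, which is the conservative tendency described in the text.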
A new classifier was then trained on the entire set of manually coded tweets to take forward as the relevance filter for this study. As an additional check of the performance of this classifier, random manual checks of the data after this filter was applied to the whole tweet dataset confirmed that it was performing well.
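To make the classification step concrete, the following is a minimal pure-Python sketch of a multinomial naïve Bayes text classifier with add-one smoothing. It is not the implementation used in the study: the whitespace tokenizer, the class name and the eight training tweets are all invented for illustration, and the hold-out and cross-validation tests are not reproduced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


class TinyMultinomialNB:
    """Multinomial naive Bayes over word counts, with additive smoothing."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training tweets: four relevant, four irrelevant.
texts = [
    "storm brian battering the coast", "trees down after storm brian",
    "strong winds from storm brian tonight", "storm brian flooding the harbour",
    "brian cox on tv tonight", "happy birthday brian",
    "brian scored a great goal", "watching family guy brian is funny",
]
labels = ["relevant"] * 4 + ["irrelevant"] * 4

model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("storm brian winds at the coast"))
print(model.predict("happy birthday to brian"))
```

The word "brian" alone carries little information in this toy set because it appears in both classes; it is the surrounding words that separate storm discussion from references to people of the same name.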
The Bayesian filter described above removes a further 4-5% of tweets in the data collection for named storms and approximately 2% of tweets in the Wind and Precipitation data collection. Table 3 shows the number and percentage of tweets remaining for each tweet collection after the stages of relevance filtering described in Sections 2.3.1-2.3.5 were applied. Overall there are 3-4% of tweets remaining after relevance filtering. Table S1 provides a more detailed breakdown of the number and percentage of tweets removed at each stage of relevance filtering for each tweet collection.

| Location inference
After relevance filtering was completed, each tweet in the dataset was also processed to identify if it can be located using information contained within the tweet. The spatial distribution of tweets relating to the weather would also give an indication of social impacts in particular locations.
As found in other studies, this study also finds that only ~1% of tweets contain geo-coordinates of the tweet origination. Therefore, a location inference method is required. Using the same location inference approach as the one outlined by , the filtered tweet dataset was examined for different kinds of geographical information: geo-coordinates (geotag), the place a user designated in the Twitter application when posting (place), the location given in the user profile (user location) and place names mentioned in the tweet text. This method is based on the location inference method validated by Schulz et al. (2013) who found 92% accuracy when inferred location was compared against tweets for which a geotag was known. Thus, there were four tweet elements examined for location information in the following order:
• Geotag: locate tweets using geotag (GPS coordinates)
It was found that the most useful elements of a tweet which can be used to determine a location are the user location and place name mentioned in the tweet text. Table 3 shows the number and percentage of tweets in the filtered dataset for which a location could be found for each tweet collection. On average 77% of filtered tweets could be located using this inference method.
Here "located" means that a tweet was allocated to a defined spatial area with high confidence. Table S2 provides more detail on the specific numbers and proportion of tweets located by each tweet element for each tweet collection.
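The cascade over the four tweet elements can be sketched as below. Several points are assumptions rather than details from the study: the order after the geotag follows the order in which the elements are listed in the text, the field names follow the Twitter API v1.1 tweet layout, and the tiny gazetteer with substring matching stands in for whatever place-name resolution was actually used.

```python
# Hypothetical gazetteer mapping lower-case place names to spatial areas.
GAZETTEER = {"cornwall": "Cornwall", "devon": "Devon", "cardiff": "Cardiff"}


def match_place_name(text):
    """Return the first gazetteer area whose name occurs in the text."""
    lowered = (text or "").lower()
    for name, area in GAZETTEER.items():
        if name in lowered:
            return area
    return None


def infer_location(tweet):
    """Try geotag, place, user location, then tweet text (assumed order).

    Returns (source, location); both are None if nothing matched.
    """
    if tweet.get("coordinates"):
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order
        return "geotag", (lat, lon)
    place = tweet.get("place")
    if place and place.get("full_name"):
        return "place", place["full_name"]
    area = match_place_name((tweet.get("user") or {}).get("location"))
    if area:
        return "user_location", area
    area = match_place_name(tweet.get("text"))
    if area:
        return "text", area
    return None, None


# Invented sample tweets.
samples = [
    {"coordinates": {"type": "Point", "coordinates": [-5.05, 50.26]}},
    {"place": {"full_name": "Exeter, England"}},
    {"user": {"location": "Sunny Cornwall"}, "text": "So windy"},
    {"user": {"location": "Planet Earth"}, "text": "Roof tiles off in Cardiff"},
    {"user": {"location": None}, "text": "So windy"},
]
results = [infer_location(t) for t in samples]
for source, location in results:
    print(source, location)
```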

| Results of filtering and location inference
After applying the above methods of relevance filtering the number of tweets retained for analysis was substantially reduced. Figure 2 shows an example of this reduction for Storm Brian. Compared with the unfiltered data, the filtered dataset contains far fewer tweets. However, there is now a clear peak of Twitter activity of relevance to Storm Brian which coincides with the period of the storm (shown by the grey bar in the figure). The same is found for each of the named storms in the dataset (data not shown). Figure 3 shows tweets that were both located (using location inference) and relevant (passed the relevance filters). All other analysis uses all relevant tweets that are located to the United Kingdom and Ireland by time zone, but not necessarily precisely located using the inference process.
Results for the Precipitation, Wind and Storm Names collections, pre- and post-filtering and after location inference, can be found in Table 3. Typically, <5% of tweets are retained after filtering for relevance. Interestingly this was much higher (~24%) for the dataset relating to ex-Hurricane Ophelia. This is most likely because Ophelia is an uncommon name. Where a storm is named with a more common name (e.g. Brian, Caroline) the percentage of tweets retained after filtering for relevance is much smaller because there is a higher background level of Twitter activity. Of the relevant tweets, typically 55-80% could be successfully geo-located using the inference method outlined above. Figure 3 presents a case study of located tweets in England and Wales by county, as an example of the social sensing technique. This case study shows the spatial extent of tweet activity in England and Wales for Storm Brian following application of location inference. Tweets located in Scotland, Northern Ireland and Ireland are not shown in this figure but were included in other analyses. Darker shading indicates where there was more Twitter activity for a particular area than average for that location, plotted as an exceedance probability. The probability of exceedance is a statistical metric describing the probability that a particular value will be met or exceeded (McMahan et al., 2013). In this example, this provides the likelihood of recording a given number of tweets about storms in this particular location, based on the frequency distribution of observed counts across the whole storm collection dataset. This provides geographical information on where the storm is being most discussed on Twitter and therefore an indication of which areas of the country are likely to be most affected by the storm. In this example for Storm Brian, more significant tweet activity can be seen in the west, south and southwest of England and Wales.
It also shows how the spatial pattern of tweets changes over time during the period leading up to, during and after the storm. As anticipated, there is a peak of activity on the day of the storm, which quickly reduces in the days afterwards.
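One simple way to compute an exceedance probability of this kind is the empirical fraction of past daily counts for a location that meet or exceed the count observed on a given day. The sketch below uses that empirical estimate with invented counts; the exact estimator used for Figure 3 is not specified in the text.

```python
def exceedance_probability(observed, history):
    """Empirical probability that a daily count is >= the observed value,
    based on the distribution of past counts for the same location."""
    if not history:
        raise ValueError("history must contain at least one count")
    return sum(1 for count in history if count >= observed) / len(history)


# Invented daily tweet counts for one county across a collection period.
county_history = [2, 3, 1, 4, 2, 30, 3, 2, 5, 3]

storm_day = exceedance_probability(30, county_history)
quiet_day = exceedance_probability(3, county_history)
print(storm_day, quiet_day)
```

A low value marks a day on which the county's tweet activity was unusually high relative to its own baseline, which is what allows busy and quiet counties to be compared on the same scale.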
Once both relevance filtering and location inference were completed, the dataset was then prepared for further analysis to determine information on social impact from the tweet data. All filtered tweets' text was used for sentiment and content analysis.

FIGURE 2 "Brian" tweets: unfiltered (i.e. all tweets containing the word "brian") versus post filtering for tweets relevant to Storm Brian. Series shown: All "Brian" Tweets (Unfiltered); Filtered for Relevance to Storm Brian

| Sentiment analysis
The "sentiment" of a tweet measures the net level of positive or negative emotion it expresses. In this case, following various studies that use sentiment analysis with tweets to examine collective mood related to weather conditions (Hannak et al., 2012;Caragea et al., 2014;Li et al., 2014;Baylis et al., 2018), sentiment analysis is used to infer the mood of Twitter users. By analysing the collective sentiment of tweets during the period of a storm event, the aim is to get an indication of the emotional impact of the storm.
Tweet text was analysed using the sentiment analysis package TextBlob (Loria, 2010). This Python package is a popular lexicon-based sentiment analysis tool well suited to the relatively short text strings found in tweets. In preliminary work, TextBlob was tested against another leading sentiment package, VADER (Hutto and Gilbert, 2014), which gave comparable results. Since there was no substantive difference, TextBlob was preferred for ease of use with this dataset.
The TextBlob package returns a sentiment polarity value between −1 and 1, where <0 implies negative sentiment and >0 implies positive sentiment. The value returned is based on a sentiment classifier trained on a large dataset of text relating to movie reviews tagged as positive or negative. The sentiment polarity score for each tweet is based on all words in the tweet text. Figure 4 provides examples of tweets with sentiment score calculated using TextBlob.

FIGURE 3 Storm Brian tweets (after filtering for relevance) located in England/Wales and grouped by county for each day of the storm period. Storm Brian hit the United Kingdom on October 21, 2017. Shading indicates the exceedance probability for the number of tweets observed by county (i.e. the likelihood of that activity level accounting for prevalence of tweet activity in that particular location). Data shown in this visualization are restricted to England and Wales only, but data analysed in this study extend to Scotland and Ireland.

| Content analysis
Filtered tweets in the Storm Brian dataset at times of peak activity (October 20, 2017 to October 22, 2017) were manually analysed and placed into one of seven categories based on their content. Only tweets containing original content (i.e. excluding retweets and quotes) were analysed. Categories were determined after an initial inspection of a subsample of filtered tweets, using a similar approach to a study on the volume and content of tweets associated with Hurricane Sandy (Lachlan et al., 2014). The categories used were:
• Humour: Tweet contains a joke, sarcastic remark or light-hearted commentary on experience of the storm event; does not provide any information about any impact as a result of the storm.
• Damage: Tweet contains information about damage to persons or property.
• Disruption: Tweet contains information about disruption to daily life, e.g. train delays, road closures, not able to go to work.
• Observations: Tweet contains commentary on the weather occurring, e.g. "wind is very strong," "Storm Brian has arrived here in Balamory."
• Warnings: Tweet contains information and advice about the forthcoming storm, or a warning about danger to persons or property due to the storm.
• News: Tweet contains reference to a media report on the storm event.
• Other: Tweet content relating to the storm that does not fit into the above categories.
Figure 4 provides examples of the types of tweets used in each category.

FIGURE 4 Example of the types of tweets included in each category with sentiment score calculated using the TextBlob package. These are synthetic tweets rather than actual tweets, in order to protect user privacy.
Categorization of tweets was performed manually by two human coders after initial discussion and agreement of the coding scheme. In total 5,961 tweets relating to Storm Brian were manually categorized. A subsample of 100 randomly chosen tweets from the filtered tweet data was used for an inter-coder reliability check. Cohen's kappa (κ) was used to determine the agreement between the two coders' judgement on the category of each tweet in the subsample. There was near perfect agreement between the two coders with κ = 0.889, p < 0.0005. This provided confidence in the categorization coding scheme used.
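For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two coders, p_o, with the agreement expected by chance from each coder's category frequencies, p_e, as κ = (p_o − p_e) / (1 − p_e). The sketch below is a plain illustration of that formula; the function name and the six example labels are hypothetical, and it does not reproduce the study's subsample or the reported value of 0.889.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: share of items given the same category by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the coders' marginal category proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six tweets; the coders disagree on the second tweet.
coder_a = ["humour", "humour", "damage", "news", "warnings", "other"]
coder_b = ["humour", "damage", "damage", "news", "warnings", "other"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # 0.793
```

In this toy case the coders agree on five of six tweets (p_o = 5/6) and chance agreement is p_e = 7/36, giving κ = 23/29.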
Note that both text and pictures in tweets were used to assign a category, but not emojis as these were removed from the dataset to simplify text analysis processes.

| Combined time series plot
Tweet counts in the filtered datasets for wind and storm names were plotted over time (Figure 5). The time period for each storm is also shown. Peaks in the volume of tweets coincide with the (UK Met Office recorded) date of impact of storms shown in Table 1. Peaks in the volume of wind tweets also coincide with peaks in the volume of storm name tweets. Figure 5 also shows that there were peaks in tweets relating to wind events which occurred at a time when there was not a named storm event (indicated by 'Unnamed Wind Event(s)' in the figure). Of the 12 peaks in wind speed not attributed to a named storm event, manual inspection of the time series identifies that four of these peaks correspond to peaks in wind tweet volume while eight appear not to. This shows that there were wind-related events being talked about on Twitter at these times and could suggest that the weather was sufficiently windy to generate discussion on Twitter, but not enough for a named storm event. This shows that social media may have some success in detecting smaller wind events that are not named storms.
The storms which saw the greatest wind speed and impacts (Brian, Caroline, Eleanor, Emma) also appear to have larger volumes of tweets than the lesser known/less impactful storms (Dylan, Fionn, Georgina).

| Sentiment
To understand the emotional response to storm events during the period of the storm, the average sentiment by hour was plotted against the tweet volume over time (Figure 6). For ex-Hurricane Ophelia there is a very clear drop in sentiment (i.e. tweets become less positive and even negative) during and following the peak of tweet activity, before rising again after the storm has passed.
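The aggregation behind such a plot can be sketched as follows. This is an illustrative sketch only: it assumes each tweet has already been assigned a polarity score and a timestamp, groups tweets into hourly bins, and returns the tweet count and mean polarity per bin. The function name, timestamps and scores are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_sentiment(tweets):
    """Group (timestamp, polarity) pairs by hour.

    Returns a dict mapping the start of each hour to a tuple of
    (tweet count, mean polarity), ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, polarity in tweets:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(polarity)
    return {
        hour: (len(scores), sum(scores) / len(scores))
        for hour, scores in sorted(buckets.items())
    }


# Hypothetical scored tweets; polarity lies between -1 and 1.
scored_tweets = [
    (datetime(2017, 10, 16, 12, 5), 0.25),
    (datetime(2017, 10, 16, 12, 40), -0.75),
    (datetime(2017, 10, 16, 13, 10), -0.5),
]

for hour, (count, mean_polarity) in hourly_sentiment(scored_tweets).items():
    print(hour.isoformat(), count, mean_polarity)
```

Plotting the count as a line and the mean polarity as an area over the same time axis gives the kind of comparison described above; a wider bin could be used where hourly volumes are small.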
The distribution of sentiment in filtered tweets is shown as a histogram of average hourly sentiment in each of the Twitter collections (Figure 7). Average sentiment of tweets in the United Kingdom during 2017 was shown in another study (using the same sentiment analysis methods) to be 0.13; this reference value is shown in Figure 7 for comparison. For each tweet collection the distribution of tweet sentiment peaks around an average sentiment score lower than the UK average sentiment. The tweet collection with the lowest average sentiment is the Storm Names collection, with the Wind and Precipitation collections showing relatively higher values, albeit still below the UK baseline. This suggests that wind and rain have an adverse effect on sentiment, with more extreme weather (storms) associated with more extreme low sentiment.

| Content analysis
For each storm, filtered named storm tweets in the day before, during and after each named storm event were manually reviewed and categorized. The results for Storm Brian are shown in Figure 8. Similar patterns were observed for other named storms (data not shown). There is a clear temporal trend to the types of content posted by Twitter users as the storm passes through. In early stages, warnings are prevalent, but these show a distinct drop in volume as the main effects of the storm begin to be felt (in the early hours of October 21, 2017). In contrast, tweets relating to observations of the weather occurring and reports of damage/disruption begin to increase as the storm passes through. News reports also increase in frequency in the day after the storm. The level of humour expressed throughout the storm period is somewhat more consistent, remaining around 25% of tweets. Tweets categorized as 'other' include tweets which cannot be categorized under any of the other headings, e.g. commentary on sports results, business advertising, very short tweets with no information. There appears to be no obvious trend in volumes of these tweets.

FIGURE 6 Sentiment polarity score for "Ophelia" tweets versus tweet count: line graph shows tweet count; area graph shows sentiment polarity score, aggregated over 2 hr windows. The period of the storm is shown by the grey shaded bar. There is a clear trend in sentiment, which drops during the storm period and then rises following the storm.
In terms of tweets providing information on social impacts of the storm, those tweets categorized as damage or disruption are likely to provide information on the specific impacts experienced by Twitter users. For the example of Storm Brian in Figure 8, 1,020 tweets were categorized as damage or disruption. This means that approximately 17% of the 5,961 filtered tweets categorized for Storm Brian provide information on impacts such as damage to property, road closures and power outages.
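The percentages discussed here can be reproduced with a short sketch. It assumes each manually categorized tweet is available as an (hour, category) pair and expresses each category as a percentage of all tweets in that hour; the function name and the sample records are hypothetical. The final lines check the roughly 17% figure from the two counts reported in the text (1,020 of 5,961 tweets).

```python
from collections import Counter, defaultdict


def hourly_category_shares(tweets):
    """Percentage of tweets per category within each hour.

    `tweets` is an iterable of (hour_label, category) pairs.
    """
    by_hour = defaultdict(Counter)
    for hour, category in tweets:
        by_hour[hour][category] += 1
    shares = {}
    for hour, counts in sorted(by_hour.items()):
        total = sum(counts.values())
        shares[hour] = {cat: 100.0 * n / total for cat, n in counts.items()}
    return shares


# Hypothetical categorized tweets, not taken from the study's dataset.
sample = [
    ("03:00", "warnings"),
    ("03:00", "humour"),
    ("03:00", "warnings"),
    ("03:00", "damage"),
    ("04:00", "damage"),
]
print(hourly_category_shares(sample))

# Share of Storm Brian tweets categorized as damage or disruption,
# using the counts reported in the text.
impact_share = 100.0 * 1020 / 5961
print(round(impact_share, 1))  # 17.1
```

Normalizing by the hourly total rather than plotting raw counts keeps the category mix comparable between busy daytime hours and quiet overnight hours.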

| DISCUSSION
The widespread use of Twitter during extreme weather events, such as named storms in the United Kingdom and Ireland, has created an opportunity to use this rich data source to find useful information. In particular, it offers a potential "social sensing" mechanism by which observations of the social impacts of extreme weather can be gathered, providing measurements that are not available from traditional meteorological observations. The demand for such information is evidenced by the recent rise in impact-led forecasting across the meteorological sciences.
This study presents an analysis of data collected from Twitter during the 2017/2018 storm season in the United Kingdom and Ireland. Various computational techniques were used to filter and extract only those tweets of relevance to wind, precipitation and named storm events. The volume of storm-related weather (wind/rain) tweets increases substantially during storm events. Tweets referring to named storms, after careful filtering to exclude irrelevant content, show clear spikes of activity corresponding to the storm event. Analysis of content shows systematic trends in both sentiment and topics expressed in tweets relating to storms. Sentiment analysis of tweet content showed clear and consistent emotional impacts of named storms. Average sentiment in weather-related tweets during a named storm event was much less positive than the expected baseline for "normal" Twitter activity. Consistent across multiple storms, collective sentiment was shown to fall significantly as the extreme weather associated with the storm begins to be experienced, before recovering after the storm passes. Furthermore, sentiment is consistently lower in tweets relating to storms than in tweets about wind or rain; however, sentiment for all these weather conditions is lower than the baseline expectation. While sentiment analysis is a crude measure of the psychological aspects of extreme weather, the strength and consistency of the results shown here suggest that these weather events have a substantive adverse impact on social wellbeing.
Categorization of filtered tweets based on their topic and/or content showed another consistent pattern in the type of information being posted on Twitter during the period of a named storm weather event. In the period leading up to a storm it was found that tweets were mainly giving warnings and information about potential impacts. During the storm, tweets contain information about how people are being affected by the storm, such as tweets on disruption and damage. After the storm, tweets continue to report observations and damage/disruption, but also begin to share links to news reports covering the storm. Surprisingly, the proportion of tweets categorized as "humour" remains quite consistently large throughout the period of a storm, with many tweets making light of the given name of each storm and sharing humorous comments about its impacts, rather than commenting directly on the weather. The patterns shown here suggest that further investigation of content might allow robust measurements of damage and disruption associated with storm events, with some refinements to the method to control for noise and bias. Common sources of noise and bias in social media data include linguistic variation (e.g. regional dialect, slang), tangential content (e.g. tweets related to the storm but not its direct impact, i.e. humour, other) and tweets providing misleading or false information. This kind of impact measurement is hard to obtain by other methods and has clear value for validation of weather hazard impact models. Combined with the location inference method this could be developed to provide information on both how and where the biggest impacts as a result of the storm are experienced.

FIGURE 8 Tweets are categorized and plotted as a percentage of all tweets in that hour to account for the expected variation in tweet volumes over each 24 hr period. The number of tweets in that hour is also shown by the line graph.
An interesting finding of this study is the existence of peaks of Twitter activity relating to wind and precipitation that are not related to named storm events. Inspection shows that these peaks reflect genuine discussion of weather conditions, showing high levels of public engagement and concern with weather, similar in some cases to those observed for named storms. This finding may have implications for the design of storm-naming systems and wider understanding of when public information should be issued by meteorological agencies.
There are a number of methodological caveats and limitations to this study. After filtering tweets for relevance to storm events, there were relatively small numbers of tweets retained in the data collections for some of the named storms. The relatively small size of the dataset in these cases makes it difficult to identify patterns in tweet discussion confidently.
With regard to sentiment analysis, the tool used in this study (TextBlob) has a predefined training corpus based on a dataset of movie reviews. There is therefore likely to be some uncertainty over the accuracy of some of the sentiment scores assigned to tweets in the storm dataset. To enhance the sentiment analysis of tweets relating to an extreme weather event, it is suggested that a bespoke training corpus based on example tweets from the filtered dataset in this study be created to identify positivity and negativity in tweets relating to the weather. This would provide more confidence in the relevance of the data being used for sentiment scoring.
Aside from improvements to the methods used here, future work might increase understanding of the power and scope of social sensing for weather hazard/impact monitoring by looking at content in different ways. An obvious extension to the work performed in this study is to go into further depth regarding the identification of particular kinds of hazard and/or impact, for example by separating travel disruption from damage to property from risks to health. Whether this approach can provide accurate quantification in terms of counting instances of particular impacts is an open research question. The results reported here suggest that clear patterns can be obtained at a reasonable level of granularity. An extension might consider validation of each tweet against the observed weather conditions for that date/time and grid square; this might allow epidemiological study of how different weather conditions (both chronic and episodic) affect behaviour and wellbeing, alongside the more straightforward opportunity to validate the accuracy of individual users as social sensors. Related to impact-based weather forecasting, the volume of activity generated by events categorized as red/amber/yellow might be analysed to study the match between severity judged by meteorological organizations and severity as reported by the general population.
What this study has shown is how social media can be used to provide another layer of information about the social impacts of extreme weather, both emotionally and physically, spatially and temporally, in a way that has not been available before. Being able to determine more specific information about social impacts not available in weather observation data means that impact-based warnings for the public can be tailored towards high impact events. It also provides a method of validation of information provided by meteorological agencies in weather warnings for the public.