Efficient online extraction of keywords for localized events in Twitter
Authors: | Hamed Abdelhaq, Ayser Armiti, Michael Gertz |
---|---|
Year of publication: | 2016 |
Subject: | Situation awareness; Event (computing); Spatial database; Geography; Geography, Planning and Development; Recommender system; Identification (information); Sliding window protocol; Scalability; Social media; Data mining; Information Systems; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 020201 artificial intelligence & image processing |
Source: | GeoInformatica. 21:365-388 |
ISSN: | 1573-7624 1384-6175 |
DOI: | 10.1007/s10707-016-0258-x |
Description: | Messages published via social media sites such as Twitter, Facebook, and Foursquare hide a considerable amount of information about real-world events. The timely identification of such events from this huge, unstructured, and noisy user-generated content plays an important role in increasing situation awareness and in supporting useful applications such as recommendation systems. A large number of these messages are enriched with location information, owing to recent advances in location acquisition techniques. This, in turn, enables location-aware event mining, i.e., the detection and tracking of localized events such as sporting events, demonstrations, or traffic jams. The main building blocks of a localized event are local keywords that exhibit a surge in usage at the event location. In this paper, we propose an approach for extracting local keywords from a stream of Twitter messages by (1) identifying local keywords and (2) estimating the central location of each keyword. This extraction procedure is performed in an online fashion using a sliding window over the Twitter stream. Additionally, we address the problem of spatial outliers, which impair the sound identification of local keywords. Spatial outliers occur when people far away from the location of an event use related keywords in their tweets. We handle this problem by adjusting the spatial distribution of keywords based on their co-occurrence with place names that may refer to the location of an event. To ensure scalability, we utilize a hierarchical spatial index to gradually prune the geographic space and thus perform complex spatial computations efficiently. Extensive comparative experiments are conducted using Twitter data. The analysis of the experimental results demonstrates the superiority of our approach over existing methods in terms of efficiency and precision of the obtained results. |
Database: | OpenAIRE |
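
The two steps named in the description (identifying local keywords and estimating a central location for each, over a sliding window) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the fixed grid stands in for their hierarchical spatial index, the "share of occurrences in the densest cell" test is an assumed locality measure, and the names `LocalKeywordWindow`, `cell_of`, `CELL_DEG`, `min_count`, and `min_share` are hypothetical. Handling of spatial outliers via place names is not covered.

```python
from collections import Counter, defaultdict, deque

CELL_DEG = 0.1  # hypothetical grid cell size in degrees (assumption)


def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to a coarse grid cell (stand-in for a spatial index)."""
    return (int(lat // size), int(lon // size))


class LocalKeywordWindow:
    """Time-based sliding window over geotagged messages (illustrative only)."""

    def __init__(self, window_seconds, min_count=3, min_share=0.6):
        self.window_seconds = window_seconds
        self.min_count = min_count    # minimum occurrences inside the window
        self.min_share = min_share    # minimum fraction in the densest cell
        self.messages = deque()       # (timestamp, lat, lon, words), oldest first
        self.points = defaultdict(list)  # word -> list of (lat, lon)

    def add(self, timestamp, lat, lon, text):
        """Insert one message and drop everything that left the window."""
        words = set(text.lower().split())
        self.messages.append((timestamp, lat, lon, words))
        for word in words:
            self.points[word].append((lat, lon))
        self._evict(timestamp)

    def _evict(self, now):
        # Messages arrive in time order, so expired ones are at the left end.
        while self.messages and self.messages[0][0] <= now - self.window_seconds:
            _, lat, lon, words = self.messages.popleft()
            for word in words:
                self.points[word].remove((lat, lon))
                if not self.points[word]:
                    del self.points[word]

    def local_keywords(self):
        """Return {keyword: (lat, lon)} for spatially concentrated keywords."""
        result = {}
        for word, pts in self.points.items():
            if len(pts) < self.min_count:
                continue
            cells = Counter(cell_of(lat, lon) for lat, lon in pts)
            top_cell, top_n = cells.most_common(1)[0]
            if top_n / len(pts) >= self.min_share:
                inside = [p for p in pts if cell_of(*p) == top_cell]
                center_lat = sum(p[0] for p in inside) / len(inside)
                center_lon = sum(p[1] for p in inside) / len(inside)
                result[word] = (center_lat, center_lon)
        return result


window = LocalKeywordWindow(window_seconds=60)
window.add(0, 49.41, 8.69, "goal stadium")
window.add(10, 49.42, 8.68, "goal today")
window.add(20, 49.43, 8.67, "goal again")
window.add(30, 40.71, -74.00, "today coffee")
window.add(40, 51.50, -0.12, "today rain")
found = window.local_keywords()
```

In this toy stream, "goal" occurs three times within one grid cell and is reported with the mean of those coordinates as its estimated center, while "today" is equally frequent but spread over three distant cells and is therefore not treated as local.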