Toward Tweet-Mining Framework for Extracting Terrorist Attack-Related Information and Reporting
Author: | Rabia Batool, Ahmed Abbasi, Saiqa Aleem, Benjamin C. M. Fung, Farkhund Iqbal, Abdul Rehman Javed |
---|---|
Year of publication: | 2021 |
Subject: |
Information retrieval; Word embedding; General Computer Science; Computer science; General Engineering; Information Dissemination; word embedding; computer.software_genre; Popularity; Sequence labeling; TK1-9971; Identification (information); Information extraction; word mover’s distance; Recurrent neural network; Terrorist attacks; Vector space model; news; recurrent neural network; General Materials Science; information extraction; Electrical engineering. Electronics. Nuclear engineering; computer |
Source: | IEEE Access, Vol 9, Pp 115535-115547 (2021) |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3102040 |
Description: | The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific information, e.g., a terrorist attack, and alerts law enforcement, emergency services, and media outlets. Specifically, the proposed framework, Tweet-to-Act (T2A), employs word embedding to transform tweets into a vector space model and then utilizes the Word Mover’s Distance (WMD) to cluster tweets for the identification of incidents. To extract reliable and valuable information from a large dataset of short and informal tweets, the proposed framework employs sequence labeling with bidirectional Long Short-Term Memory based Recurrent Neural Networks (bLSTM-RNN). Extensive experimental results suggest that our proposed framework, T2A, outperforms other state-of-the-art methods that use vector space modeling and distance calculation techniques, e.g., Euclidean and Cosine distance. T2A achieves an accuracy of 96% and an F1-score of 86.2% on real-life datasets. |
Database: | OpenAIRE |
External link: |
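
The description says T2A maps tweets into a word-embedding space and compares them with the Word Mover's Distance. The record does not give the authors' implementation, so the following is only a minimal sketch of the idea. It swaps in the relaxed WMD lower bound (each word travels to its nearest word in the other tweet) in place of the exact optimal-transport WMD, and it uses a tiny hand-made embedding table. The names `TOY_EMBEDDINGS`, `one_sided_cost`, and `relaxed_wmd` are hypothetical and chosen for illustration.

```python
import math

# Toy 2-D word vectors, invented for illustration only (not from the paper).
TOY_EMBEDDINGS = {
    "bomb": (1.0, 0.0),
    "blast": (0.9, 0.1),
    "market": (0.0, 1.0),
    "bazaar": (0.1, 0.9),
    "football": (-1.0, 0.0),
    "match": (-0.9, 0.1),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, emb):
    """Average distance from each word in src to its nearest word in dst."""
    return sum(min(euclidean(emb[s], emb[d]) for d in dst) for s in src) / len(src)


def relaxed_wmd(tweet_a, tweet_b, emb=TOY_EMBEDDINGS):
    """Relaxed WMD: the larger of the two one-sided nearest-word costs.

    This is a lower bound on the exact WMD, not the exact distance.
    Words missing from the embedding table are ignored.
    """
    a = [w for w in tweet_a if w in emb]
    b = [w for w in tweet_b if w in emb]
    if not a or not b:
        return float("inf")  # nothing comparable in at least one tweet
    return max(one_sided_cost(a, b, emb), one_sided_cost(b, a, emb))


if __name__ == "__main__":
    incident = ["bomb", "market"]
    paraphrase = ["blast", "bazaar"]
    unrelated = ["football", "match"]
    print(relaxed_wmd(incident, paraphrase))
    print(relaxed_wmd(incident, unrelated))
```

Under these toy vectors, the paraphrased tweet lies much closer to the incident tweet than the unrelated one does, even though the two share no words. That property is what makes an embedding-based distance attractive for grouping informal tweets about the same incident.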