Content-Aware Tweet Location Inference Using Quadtree Spatial Partitioning and Jaccard-Cosine Word Embedding
Author: | Oluwaseun Ajao, Shahrzad Zargari, Deepayan Bhowmik |
---|---|
Year of publication: | 2018 |
Subject: |
Word embedding; Jaccard index; Computer science; Natural language processing; Dimensionality reduction; Feature vector; Feature extraction; Cosine similarity; Inference; Partitioning algorithms; Quadtree; Inference algorithms; Data mining; Safety; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 020201 artificial intelligence & image processing |
Source: | ASONAM |
DOI: | 10.1109/asonam.2018.8508257 |
Description: | Inferring locations from user texts on social media platforms is a challenging problem with implications for public safety. We propose a novel non-uniform grid-based approach for location inference from Twitter messages using Quadtree spatial partitions. The proposed algorithm uses natural language processing (NLP) for semantic understanding and incorporates cosine similarity and Jaccard similarity measures for feature vector extraction and dimensionality reduction. We chose Twitter as our experimental social media platform because of its popularity and its effectiveness in disseminating news and stories about recent events around the world. Our approach is the first of its kind to infer locations from tweets using Quadtree spatial partitions and NLP in hybrid word-vector representations. On benchmark datasets, the proposed algorithm outperformed state-of-the-art grid-based, content-only location inference methods by up to 24% in correctly predicting tweet locations within a 161 km radius and by 300 km in median error distance. |
Database: | OpenAIRE |
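
The description names three building blocks: a non-uniform grid obtained from Quadtree partitioning, and the Jaccard and cosine similarity measures. The sketch below illustrates each in Python and is not the authors' implementation. The names `Quadtree`, `capacity`, `max_depth`, `jaccard` and `cosine` are hypothetical, and the cosine function works on raw term counts rather than the word embeddings used in the paper, purely as a simplifying assumption.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two token collections: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two token lists using raw term counts (assumption)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Quadtree:
    """Point quadtree over a (min_x, min_y, max_x, max_y) bounding box.

    A cell splits into four equal quadrants once it holds more than
    `capacity` points, so dense regions end up with smaller cells.
    """

    def __init__(self, bounds, capacity=2, depth=0, max_depth=8):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []
        self.children = None  # None while this node is a leaf

    def _mid(self):
        x0, y0, x1, y1 = self.bounds
        return (x0 + x1) / 2, (y0 + y1) / 2

    def _child_for(self, x, y):
        mx, my = self._mid()
        # Index 0..3: bit 0 = east half, bit 1 = north half.
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = self._mid()
        quads = [(x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)]
        self.children = [
            Quadtree(q, self.capacity, self.depth + 1, self.max_depth)
            for q in quads
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self._child_for(px, py).insert(px, py)

    def leaves(self):
        """Return all leaf cells, i.e. the cells of the non-uniform grid."""
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]
```

In a setup like this, each leaf cell could serve as one location class: tweets posted from densely populated areas fall into small cells, while sparse areas are covered by large ones. How the paper combines the two similarity measures with the grid is not specified in this record.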