A deep semantic search method for random tweets
Author: | Mark Liptrott, Isa Inuwa-Dutse, Ioannis Korkontzelos |
---|---|
Year of publication: | 2019 |
Subject: |
Information retrieval
Computer Networks and Communications; Computer science; business.industry; Communication; Deep learning; Semantic search; 020206 networking & telecommunications; 02 engineering and technology; Duplicate content; Convolutional neural network; Scalability; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Social media; Artificial intelligence; Cluster analysis; business; Information Systems; Linear search |
Source: | Online Social Networks and Media |
ISSN: | 2468-6964 |
DOI: | 10.1016/j.osnem.2019.07.002 |
Description: | Contemporary social media platforms enable users to act as both producers and consumers of content, leading to the generation of enormous amounts of data. While this ability is empowering, it also poses many challenges concerning efficient searches for relevant information. Many search approaches have been proposed in the literature. However, searching for information on Twitter is particularly challenging due to both the inconsistency in writing styles and the high generation rate of spurious and duplicate content. The quest for instant and efficient data processing to retrieve relevant information renders many existing techniques ineffective when applied to Twitter. We present a multilevel approach based on state-of-the-art deep learning methods and a novel scalable windowing approach for pairwise-similarity search (SWAPS) to improve search efficiency. SWAPS optimises searches using a strategic balancing criterion to assess the trade-off between accuracy and search speed, thereby circumventing sequential search problems. Moreover, we propose a deep search strategy that establishes a relationship between the status of a tweet and its longevity, measured in terms of engagement lifespan since posting. Deep search utilises a convolutional neural network for textual n-gram feature extraction and meta-features from the tweet to train a fully connected network on a vast number of tweets. This approach differs from existing ones by recognising the relationship between the status of a tweet and its engagement lifespan to ensure a better understanding of the compositional semantics in tweets. The results highlight interesting symmetrical properties with respect to similarity distribution and duration. We evaluate our approach on various benchmark datasets and demonstrate the efficacy and applicability of the method. Problems of event detection, clustering and ads, among others, can utilise this approach to detect items of interest effectively. |
Database: | OpenAIRE |
External link: |
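
The abstract describes a windowing approach to pairwise-similarity search that trades accuracy against speed. The record does not give the details of SWAPS, so the following is only a generic sketch of the underlying idea, not the authors' algorithm: each tweet is compared with the tweets inside a fixed-size window that follows it, instead of with every other tweet. The function names, the Jaccard similarity measure, the whitespace tokenisation, and the `window` and `threshold` parameters are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def windowed_similar_pairs(
    tweets: List[str], window: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for pairs whose similarity meets the threshold.

    Hypothetical helper: tweet i is compared only with the next
    `window - 1` tweets, so the cost is roughly n * (window - 1)
    comparisons rather than the n * (n - 1) / 2 of an exhaustive search.
    """
    # Naive tokenisation (assumption): lowercase and split on whitespace.
    tokens = [set(t.lower().split()) for t in tweets]
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            score = jaccard(tokens[i], tokens[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


tweets = [
    "breaking news storm hits the coast",
    "storm hits the coast breaking news",
    "my cat is asleep again",
    "breaking news storm hits the coast",
]

# A narrow window is fast but misses duplicates that are far apart.
near = windowed_similar_pairs(tweets, window=2, threshold=0.8)
# A window covering the whole list is equivalent to exhaustive search.
wide = windowed_similar_pairs(tweets, window=4, threshold=0.8)
```

With the narrow window only the adjacent duplicate pair is found, while the wide window also recovers the duplicates separated by an unrelated tweet. This is the accuracy-versus-speed trade-off that a balancing criterion of the kind the abstract mentions would have to weigh.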