A Hotspot Information Extraction Hybrid Solution of Online Posts’ Textual Data
Authors: | Songyao Lian, Hui-Ru Cao, Choujun Zhan, Xiaomin Li |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
Information retrieval; Social network; Computer science; Big data; Software engineering; Engineering and technology; Public opinion; Popularity; Computer Science Applications; Information extraction; QA76.75-76.765; Electrical engineering, electronic engineering, information engineering; Key (cryptography); Artificial intelligence & image processing; Social media; Computer software; Cluster analysis; Software |
Source: | Scientific Programming, Vol 2021 (2021) |
ISSN: | 1058-9244 |
Description: | Online posts have gradually become a major carrier of network public opinion in social media, and social network hotspots are an important basis for studying that opinion. It is therefore important to extract hotspots from the textual big data of online posts in order to monitor Internet public opinion. However, current hotspot extraction methods focus on user features derived from textual big data that contains spam and low-quality content. These methods also seldom consider the time span of posts or the popularity of users. Accordingly, this article presents a hotspot information extraction hybrid solution for the textual data of online posts. Firstly, a filtering strategy is designed to obtain higher-quality textual data. Secondly, a topic hot degree is defined that considers the average number of replies and the popularity of the participants. Thirdly, an improved co-word analysis technique is used to find posts on the same topic, and a bisecting k-means clustering algorithm using repliers’ popularity and key posts is designed for studying and monitoring the hotspots of online posts in a valid big data environment. Finally, the proposed algorithms are verified in experiments by extracting the hotspots of online posts from the dataset. The results show that the data filtering strategy helps to obtain more valuable information and decreases the computing time. The results also show that, compared with traditional methods, the proposed solution helps to obtain hotspots and that the hot degree reflects the trend of online posts. |
Database: | OpenAIRE |
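
The description names bisecting k-means as the clustering step. As a rough illustration only, the following is a minimal sketch of the generic bisecting k-means procedure: repeatedly split the cluster with the largest within-cluster squared error into two using 2-means. It is not the authors' variant. The abstract does not say how repliers' popularity and key posts enter the algorithm, so the seeding rule used here (farthest point from the centroid, then farthest point from that one), the function names, and the toy 2-D points are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points: List[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points: List[Point], max_iter: int = 50) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two with plain 2-means.

    Seeding is deterministic (an assumption of this sketch): the point
    farthest from the centroid, and the point farthest from that one.
    """
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))
    b = max(points, key=lambda p: sq_dist(p, a))
    centers = [list(a), list(b)]
    left, right = list(points), []
    for _ in range(max_iter):
        new_left, new_right = [], []
        for p in points:
            if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]):
                new_left.append(p)
            else:
                new_right.append(p)
        if not new_left or not new_right:
            break  # degenerate assignment: keep the last valid split
        left, right = new_left, new_right
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return left, right


def bisecting_kmeans(points: List[Point], k: int) -> List[List[Point]]:
    """Return up to k clusters by repeatedly bisecting the worst cluster."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Pick the cluster with the largest SSE; singletons cannot be split.
        idx = max(
            range(len(clusters)),
            key=lambda i: sse(clusters[i]) if len(clusters[i]) > 1 else -1.0,
        )
        if len(clusters[idx]) < 2:
            break
        left, right = two_means(clusters[idx])
        if not left or not right:
            break  # all points identical: nothing left to split
        clusters[idx:idx + 1] = [left, right]
    return clusters


if __name__ == "__main__":
    # Toy 2-D points standing in for post feature vectors (hypothetical data).
    demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50), (51, 50)]
    for cluster in bisecting_kmeans(demo, 3):
        print(sorted(cluster))
```

In a setting like the one described, each point would be a feature vector for a post rather than a 2-D coordinate, and the seeding step is the natural place to substitute domain knowledge such as key posts, though how the article does this is not stated in the record.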