ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding
Author: | Nicholas Diakopoulos, Seungyeon Kim, Niklas Elmqvist, Jaegul Choo, Jurim Lee, Deokgun Park |
---|---|
Year of publication: | 2017 |
Subject: | Visual analytics; Word embedding; Noisy text analytics; Databases, Factual; Computer science; 02 engineering and technology; computer.software_genre; Lexicon; Semantics; Text mining; 0202 electrical engineering, electronic engineering, information engineering; Semantic analytics; Computer Graphics; Image Processing, Computer-Assisted; Cluster Analysis; Data Mining; Polysemy; Cultural analytics; Information retrieval; business.industry; Sentiment analysis; 020207 software engineering; Computer Graphics and Computer-Aided Design; Automatic summarization; Analytics; Signal Processing; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; business; computer; Software; Natural language; Natural language processing; Algorithms |
Source: | IEEE Transactions on Visualization and Computer Graphics, 24(1) |
ISSN: | 1941-0506 |
Description: | Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones. |
Database: | OpenAIRE |
External link: |
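The description's idea of growing a concept from a few seed terms, with two poles and user-marked irrelevant words, can be illustrated with a minimal sketch. This is not ConceptVector's algorithm. It assumes hypothetical toy 3-dimensional vectors (`EMBEDDINGS`), scores each non-seed word by its cosine similarity to the positive-seed centroid minus its similarity to the negative-seed centroid, and simply drops words the user marks as irrelevant. The names `suggest_bipolar` and `centroid` and the `threshold` value are illustrative assumptions. The polysemous seed "fine" (both "okay" and "monetary penalty") shows the kind of false positive the description mentions.

```python
import math

# Hypothetical toy 3-d vectors (dimensions loosely: sentiment, money/law, furniture).
# A real system would load pretrained word embeddings instead.
EMBEDDINGS = {
    "good":    [0.9, 0.1, 0.0],
    "fine":    [0.6, 0.6, 0.0],    # polysemous: "okay" and "monetary penalty"
    "bad":     [-0.9, 0.1, 0.0],
    "great":   [0.85, 0.15, 0.05],
    "awful":   [-0.85, 0.2, 0.05],
    "penalty": [0.0, 0.95, 0.05],
    "table":   [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words, emb):
    """Component-wise mean of the vectors for the given seed words."""
    vecs = [emb[w] for w in words]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def suggest_bipolar(emb, positive, negative, irrelevant=(), threshold=0.2):
    """Split non-seed words into the positive and negative poles of a concept.

    Hypothetical scoring: similarity to the positive-seed centroid minus
    similarity to the negative-seed centroid. Words marked irrelevant are
    simply excluded (a simplification for illustration).
    """
    pos_c = centroid(positive, emb)
    neg_c = centroid(negative, emb)
    excluded = set(positive) | set(negative) | set(irrelevant)
    pos_pole, neg_pole = [], []
    for word, vec in emb.items():
        if word in excluded:
            continue
        score = cosine(vec, pos_c) - cosine(vec, neg_c)
        if score > threshold:
            pos_pole.append((score, word))
        elif score < -threshold:
            neg_pole.append((score, word))
    # Strongest members first: highest scores for the positive pole,
    # lowest (most negative) scores for the negative pole.
    return ([w for _, w in sorted(pos_pole, reverse=True)],
            [w for _, w in sorted(neg_pole)])


if __name__ == "__main__":
    # "fine" drags "penalty" into the positive pole via its other sense.
    print(suggest_bipolar(EMBEDDINGS, ["good", "fine"], ["bad"]))
    # → (['great', 'penalty'], ['awful'])
    # Marking "penalty" as irrelevant removes the false positive.
    print(suggest_bipolar(EMBEDDINGS, ["good", "fine"], ["bad"], irrelevant=["penalty"]))
    # → (['great'], ['awful'])
```

Excluding irrelevant words outright is the simplest possible treatment; a fuller model might instead use them to reshape the concept's scoring, which this sketch does not attempt.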