TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora
Authors: | Harry Hochheiser, Venkatesh Sivaraman, Adam Perer, Denis Newman-Griffis, Eric Fosler-Lussier |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
FOS: Computer and information sciences
Coronavirus disease 2019 (COVID-19); Interface (Java); Computer science; Computer Science - Artificial Intelligence; Computer Science - Human-Computer Interaction; Scientific literature; computer.software_genre; Article; Human-Computer Interaction (cs.HC); 03 medical and health sciences; 0302 clinical medicine; 030304 developmental biology; 0303 health sciences; Corpus analysis; Measure (data warehouse); Computer Science - Computation and Language; business.industry; Interactive analysis; Study Characteristics; Artificial Intelligence (cs.AI); Embedding; Artificial intelligence; business; Computation and Language (cs.CL); computer; 030217 neurology & neurosurgery; Natural language processing |
Source: | NAACL-HLT (Demonstrations) Proc Conf |
Description: | Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence. Accepted as a Systems Demonstration at NAACL-HLT 2021. Video demonstration at https://youtu.be/1xEEfsMwL0k |
Database: | OpenAIRE |
External link: |