TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Authors: Harry Hochheiser, Venkatesh Sivaraman, Adam Perer, Denis Newman-Griffis, Eric Fosler-Lussier
Language: English
Year of publication: 2021
Subject:
Source: NAACL-HLT (Demonstrations)
Proc Conf
Description: Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.
Accepted as a Systems Demonstration at NAACL-HLT 2021. Video demonstration at https://youtu.be/1xEEfsMwL0k
Database: OpenAIRE
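
Note: The description mentions an embedding confidence measure based on nearest neighborhood overlap but does not define it. The sketch below is one plausible reading, not the TextEssence implementation: it assumes several embedding replicates trained on the same corpus, and scores an item by the mean pairwise overlap of its top-k cosine neighbors across those replicates. The function names, the toy vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_neighbors(embeddings, key, k):
    """Return the set of the k items most similar to `key` in one embedding table."""
    query = embeddings[key]
    others = [item for item in embeddings if item != key]
    ranked = sorted(others, key=lambda item: cosine(query, embeddings[item]), reverse=True)
    return set(ranked[:k])


def neighborhood_overlap_confidence(replicates, key, k):
    """Hypothetical confidence score: mean pairwise top-k neighbor overlap, in [0, 1].

    `replicates` is a list of dicts mapping item -> vector, assumed to be
    independent training runs over the same corpus and vocabulary.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to compare neighborhoods")
    neighbor_sets = [top_k_neighbors(rep, key, k) for rep in replicates]
    overlaps = [len(a & b) / k for a, b in combinations(neighbor_sets, 2)]
    return sum(overlaps) / len(overlaps)


# Toy data (invented): two replicates that agree on one of two neighbors of "virus".
rep_a = {"virus": [1.0, 0.0], "covid": [0.9, 0.1], "flu": [0.8, 0.3], "bank": [0.0, 1.0]}
rep_b = {"virus": [1.0, 0.0], "covid": [0.95, 0.05], "bank": [0.7, 0.4], "flu": [0.0, 1.0]}

score = neighborhood_overlap_confidence([rep_a, rep_b], "virus", k=2)
print(score)  # 0.5
```

Under this reading, a score near 1 means the item's neighborhood is stable across training runs, while a score near 0 suggests the embedding is too noisy to support claims about semantic shift.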