Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution
Author: | Benedikt Boenninghoff, Dorothea Kolossa, Robert M. Nickel, Steffen Zeiler |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer science; 0211 other engineering and technologies; Machine Learning (stat.ML); 02 engineering and technology; 010501 environmental sciences; Space (commercial competition); Machine learning; computer.software_genre; 01 natural sciences; Machine Learning (cs.LG); Task (project management); Set (abstract data type); Statistics - Machine Learning; Finite set; 0105 earth and related environmental sciences; 021110 strategic defence & security studies; Computer Science - Computation and Language; business.industry; Probabilistic logic; Mixture model; Autoencoder; Probability distribution; Artificial intelligence; business; Computation and Language (cs.CL); computer |
Source: | COLING |
Description: | Traditional computational authorship attribution is a classification task in a closed-set scenario: given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors wrote another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework for this supervised classification task. More precisely, we extend a variational autoencoder (VAE) with an embedded Gaussian mixture model to a Student-$t$ mixture model. Autoencoders have had tremendous success in learning latent representations. However, existing VAEs are still limited by the assumed Gaussianity of the underlying probability distributions in the latent space. Replacing the Gaussian model with a Student-$t$ model allows independent control of the "heaviness" of the tails of the implied probability densities. Experiments on an Amazon review dataset indicate superior performance of the proposed method. Preprint |
Database: | OpenAIRE |
External link: |
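
The description's point about tail "heaviness" can be illustrated with a minimal sketch. It assumes a univariate Student-$t$ density with location `mu`, scale `sigma`, and degrees of freedom `nu`; the record does not give the paper's parameterization, and a VAE latent space would use a multivariate form, so the function names and arguments below are hypothetical and chosen only for illustration. Smaller `nu` gives heavier tails, and large `nu` approaches the Gaussian case.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a univariate Student-t (location mu, scale sigma).

    nu is the degrees of freedom: it controls tail heaviness
    independently of the location and scale parameters.
    """
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


def student_t_mixture_logpdf(x, weights, mus, sigmas, nus):
    """Log-density of a finite Student-t mixture (illustrative only).

    Each component has its own nu, so tail heaviness can differ per
    component. Uses the log-sum-exp trick for numerical stability.
    """
    terms = [
        math.log(w) + student_t_logpdf(x, m, s, n)
        for w, m, s, n in zip(weights, mus, sigmas, nus)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # Compare log-densities far from the mean for several nu values.
    x = 6.0
    print("gaussian:", gaussian_logpdf(x))
    for nu in (1.0, 3.0, 30.0):
        print("student-t, nu =", nu, ":", student_t_logpdf(x, nu=nu))
```

Running the script prints the log-density at a point six scale units from the mean; the Student-$t$ values stay well above the Gaussian one for small `nu`, which is the heavier-tail behavior the description refers to.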