TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts
Authors: | John Lafferty, Michihiro Yasunaga |
---|---|
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Topic model; Computer Science - Machine Learning; Computer Science - Computation and Language; business.industry; Computer science; Inference; Machine Learning (stat.ML); Context (language use); General Medicine; Extension (predicate logic); computer.software_genre; Autoencoder; Machine Learning (cs.LG); Computer Science - Information Retrieval; Range (mathematics); Statistics - Machine Learning; Artificial intelligence; business; Joint (audio engineering); Computation and Language (cs.CL); computer; Information Retrieval (cs.IR); Word (computer architecture); Natural language processing |
Source: | AAAI |
ISSN: | 2374-3468 2159-5399 |
Description: | Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words. AAAI 2019 |
Database: | OpenAIRE |
External link: |