Learning Disentangled Representations of Texts with Application to Biomedical Abstracts

Authors:	Iain J. Marshall, Jan-Willem van de Meent, Sarthak Jain, Byron C. Wallace, Edward Banner
Year of publication:	2018
Subject:	FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; business.industry; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Article; Code (semiotics); Salient; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; 0105 earth and related environmental sciences; Interpretability
Source:	EMNLP; King's College London; Proc Conf Empir Methods Nat Lang Process
DOI:	10.48550/arxiv.1804.07212
Description:	We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval. We demonstrate that the approach generalizes beyond our motivating application in experiments on two multi-aspect review corpora.
Comment:	Accepted to EMNLP 2018
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6044d3d822a2e808dcddb4e1e98dfba4
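
Note:	The description mentions an objective over triplets of documents compared with respect to a specific aspect, and aspect-specific retrieval. As a rough illustration only, the sketch below shows a generic triplet hinge loss on cosine similarity and a similarity-based ranking. It is not the authors' objective (the adversarial component is not modeled), and the function names, the margin value, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_hinge_loss(anchor, similar, dissimilar, margin=1.0):
    """Generic triplet hinge loss (hypothetical helper, not from the paper).

    The loss is zero once the anchor is closer to the document that shares
    the aspect than to the one that does not, by at least `margin`.
    """
    return max(0.0, margin - cosine(anchor, similar) + cosine(anchor, dissimilar))


def rank_by_aspect(query, candidates):
    """Return candidate indices sorted by decreasing similarity to the query."""
    return sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )


# Toy aspect embeddings (made-up values, for illustration only).
anchor = [1.0, 0.0]
same_aspect = [1.0, 0.0]
other_aspect = [0.0, 1.0]

loss_good = triplet_hinge_loss(anchor, same_aspect, other_aspect)
loss_bad = triplet_hinge_loss(anchor, other_aspect, same_aspect)
ranking = rank_by_aspect(anchor, [other_aspect, same_aspect, [1.0, 1.0]])
```

In a setup like the one described, one such embedding would be learned per aspect (for example population, intervention, and outcome), and retrieval would rank documents using only the embedding for the aspect of interest.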