Paraphrase Identification Based on Weighted URAE, Unit Similarity and Context Correlation Feature

Author:	Gongshen Liu, Huanrong Sun, Jie Zhou
Publication year:	2018
Subject:	Phrase; Semantic feature; business.industry; Computer science; Deep learning; Context (language use); 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Paraphrase; Similarity (network science); Feature (computer vision); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Sentence; Natural language processing; 0105 earth and related environmental sciences
Source:	Natural Language Processing and Chinese Computing, NLPCC (2), ISBN: 9783319995007
DOI:	10.1007/978-3-319-99501-4_4
Description:	A deep learning model adaptive to both sentence-level and article-level paraphrase identification is proposed in this paper. It combines a pairwise unit similarity feature with a semantic context correlation feature. In this model, sentences are represented by word and phrase embeddings, while articles are represented by sentence embeddings. These phrase and sentence embeddings are learned from parse trees through Weighted Unfolding Recursive Autoencoders (WURAE), an unsupervised learning algorithm. A unit similarity matrix is then calculated by matching the two texts' lists of embeddings pairwise, and the pairwise unit similarity feature is extracted from this matrix through CNN and k-max pooling layers. In addition, a semantic context correlation feature is captured by combining a CNN with an LSTM: the CNN layers learn collocation information between adjacent units, while the LSTM extracts long-term dependency features of the text from the CNN output. The model is evaluated on a well-known English sentence paraphrase corpus, MSRPC, and on a Chinese article paraphrase corpus. The results show that deep semantic features of text can be extracted based on WURAE, unit similarity, and context correlation features. We release our WURAE code, the deep learning model for paraphrase identification, and pre-trained phrase and sentence embeddings for use by the community.
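The unit similarity step described above can be illustrated with a small sketch. It assumes cosine similarity as the matching function and applies k-max pooling using the common definition (keep the k largest values in their original order); the abstract does not specify either choice, so both are assumptions. The helper names `cosine`, `unit_similarity_matrix` and `k_max_pooling` are hypothetical and are not taken from the authors' released code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed matching function); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit_similarity_matrix(units_a: List[Vector], units_b: List[Vector]) -> List[List[float]]:
    """Entry [i][j] compares unit i of text A with unit j of text B.

    Units would be word/phrase embeddings for sentences and sentence
    embeddings for articles; here they are just plain float lists.
    """
    return [[cosine(a, b) for b in units_b] for a in units_a]


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values, preserving their original positional order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest values (stable sort keeps earlier index on ties).
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Restore the original order of the selected positions.
    return [values[i] for i in sorted(top)]


# Toy usage: three units in text A, two units in text B (hypothetical 2-d embeddings).
matrix = unit_similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                [[1.0, 0.0], [0.0, 2.0]])
pooled_rows = [k_max_pooling(row, 1) for row in matrix]
print(matrix)
print(pooled_rows)
```

In the model described above, a CNN would run over this matrix before k-max pooling; the sketch pools raw similarity rows only to show the shape of the computation, not the paper's actual layer stack.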
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::f9c106a7d7a80d2bb92a7dad04a83c16 https://doi.org/10.1007/978-3-319-99501-4_4