Factorization machines and deep views-based co-training for improving answer quality prediction in online health expert question-answering services
Author: | Z.L. Hu, Zhan Zhang, Decheng Zuo, Rong Zhu, Haiqin Yang |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine; Exploit; Computer science; Semantic space; Health Informatics; 02 engineering and technology; Convolutional neural network; Machine Learning; 03 medical and health sciences; Factorization; Predictive Value of Tests; 0202 electrical engineering, electronic engineering, information engineering; Question answering; Humans; Internet; Co-training; Information retrieval; Communication; Telemedicine; Semantics; Computer Science Applications; 030104 developmental biology; Labeled data; 020201 artificial intelligence & image processing; Neural Networks, Computer; Classifier (UML); Algorithms; Software |
Source: | Journal of Biomedical Informatics. 87:21-36 |
ISSN: | 1532-0464 |
DOI: | 10.1016/j.jbi.2018.09.011 |
Description: | In online health expert question-answering (HQA) services, it is important to automatically determine the quality of the answers. This task poses two prominent challenges. First, the answers are usually short texts, which makes it difficult to capture their semantic information. Second, sufficient labeled data is usually lacking, while a huge amount of unlabeled data is available. To tackle these challenges, we propose a novel deep co-training framework based on factorization machines (FM) and deep textual views to automatically identify the quality of answers in HQA systems. More specifically, we exploit additional semantic information from domain-specific word embeddings to expand the semantic space of short text, and we apply FM to capture the non-independent interactions among diverse features within individual views, improving the performance of the base classifier via co-training. Our learned deep textual views are the convolutional neural network (CNN) view, which extracts local features with convolution filters to model short text locally, and the dependency-sensitive convolutional neural network (DSCNN) view, which captures long-distance dependency information to model short text globally; together they overcome the feature sparseness of the short answers written by doctors. The developed co-training framework can effectively mine the highly non-linear semantic information embedded in the unlabeled data and expose the highly non-linear relationships between different views, which reduces the labeling effort. Finally, we conduct extensive empirical evaluations and demonstrate that our proposed method can significantly improve the prediction of answer quality in HQA services. |
Database: | OpenAIRE |
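
The description says factorization machines (FM) are used to model interactions among features within a view. As a rough illustration, the sketch below scores one feature vector with the standard second-order FM formulation, in which each feature has a latent vector and a pair of features interacts through the dot product of their latent vectors. This is the generic textbook form and may differ from the variant used in the paper; the function names, parameter names, and all numeric values are assumptions made for illustration only.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.

    x: list of n feature values; w0: bias; w: n linear weights;
    V: n rows of k latent factors (all values here are illustrative).
    """
    # Bias plus the ordinary linear term.
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum over pairs = 0.5 * ((sum of terms)^2 - sum of squares),
        # applied per latent dimension, which avoids the explicit double loop.
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit sum over feature pairs, for checking."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        score += dot * x[i] * x[j]
    return score
```

Because the interaction weight for a feature pair comes from two shared latent vectors rather than from a separate parameter per pair, the model can still estimate interactions between features that rarely co-occur, which is relevant to the sparse features of short answers mentioned in the description.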