Learning from positive and unlabeled data: a survey

Authors: Bekker, Jessa; Davis, Jesse
Year of publication: 2018
Subject:
Source: Machine Learning (2020) 1-42
Document type: Working Paper
DOI: 10.1007/s10994-020-05877-5
Description: Learning from positive and unlabeled data, or PU learning, is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature, as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.
Comment: There was a typo in Section 2.4. The fraction of labeled examples in the single-training-set scenario should be \alpha c, not \alpha e(x) as was written in the previous version.
Database: arXiv