Learning from Crowds via Joint Probabilistic Matrix Factorization and Clustering in Latent Space

Authors: Wuguannan Yao, Wonjung Lee, Junhui Wang
Publication year: 2021
Subject:
Source: Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track ISBN: 9783030676667
ECML/PKDD (4)
Description: Learning from noisy labels has attracted growing interest in the era of big data. In crowdsourcing practice, however, extracting ground-truth labels from the noisy labels supplied by crowds remains a challenging task. In this paper, we propose a latent variable model, built on a probabilistic logistic matrix factorization model and the classical Gaussian mixture model, for inferring ground-truth labels from noisy, crowdsourced ones. In contrast to previous work, the proposed model incorporates item heterogeneity and allows for vector space embeddings of both items and worker labels. We derive a tractable mean-field variational inference algorithm to approximate the model posterior. We also investigate the related MAP approximation to the model posterior in order to identify links to existing work. Empirically, we demonstrate that the proposed method achieves good inference accuracy while preserving meaningful uncertainty measures in the embeddings, and therefore better reflects the intrinsic structure of the data.
Database: OpenAIRE