Source: |
Klementiev, A., Roth, D., Small, K. & Titov, I. 2009, 'Unsupervised Prediction Aggregation', in NIPS 2009 Workshop on Learning with Orderings.
Description: |
Consider a scenario in which votes from multiple experts, each relying on different data modalities or modeling assumptions, are available for a given prediction task. Combining these signals to obtain a better prediction is a ubiquitous task in Information Retrieval (IR), Natural Language Processing (NLP), and many other areas. In IR, for instance, meta-search aims to combine the outputs of multiple search engines to produce a better ranking. In NLP, aggregating the outputs of systems that generate natural language translations (Rosti et al., 07), produce syntactic dependency parses (Sagae and Lavie, 06), or identify the intended meanings of words (Brody et al., 06) has received considerable recent attention.

Most existing learning approaches to aggregation address the supervised setting. However, for complex prediction tasks such as these, data annotation is a labor-intensive and time-consuming process. In this line of work, we first derive a mathematical and algorithmic framework for learning to combine predictions from multiple signals without supervision. In particular, we use the extended Mallows formalism to model aggregation and derive an unsupervised learning procedure for estimating the model parameters. While direct application of the learning framework can be computationally expensive in general, we propose alternatives that keep both learning and inference tractable.

The intuition behind our approach is that the agreement between signals can serve to estimate their relative quality, which in turn can be used to induce aggregation. Indeed, higher-quality signals tend to generate labels close (in terms of a distance function) to the correct prediction and thus tend to agree with one another, whereas poor signals do not. The key assumption we make is that the predictions induced by the signals are conditionally independent given the true prediction.
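The extended Mallows formalism mentioned above is commonly written as follows (a sketch of the standard formulation, not copied from the paper; here pi is the true ranking, sigma_i the vote of expert i, d a right-invariant distance such as Kendall's tau, and theta_i a per-expert concentration parameter):

```latex
P(\pi \mid \theta, \sigma_1, \ldots, \sigma_K)
  = \frac{1}{Z(\theta, \sigma_1, \ldots, \sigma_K)}
    \, p(\pi) \, \exp\!\Big( \sum_{i=1}^{K} \theta_i \, d(\pi, \sigma_i) \Big),
\qquad \theta_i \le 0 .
```

A more negative theta_i concentrates probability on rankings close to expert i's vote, i.e., it encodes that expert i is of higher quality; with all theta_i = 0 the model reduces to the prior p(pi), ignoring the votes.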
We demonstrate the effectiveness of our framework on the tasks of aggregating permutations and aggregating top-k lists.
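The core intuition, that inter-signal agreement estimates quality, which then drives aggregation, can be illustrated with a small self-contained sketch. This is a simplified agreement-weighted Borda-style heuristic under Kendall's tau distance, not the paper's EM-based estimation procedure; all function names and the example data are hypothetical:

```python
from itertools import combinations

def kendall_tau(a, b):
    """Number of discordant item pairs between two rankings of the same items."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def agreement_weights(rankings):
    """Estimate each voter's quality from its average distance to the others:
    voters that agree with the rest get higher weight (heuristic stand-in for
    the unsupervised parameter estimation described above)."""
    n = len(rankings)
    avg_dist = [
        sum(kendall_tau(r, s) for j, s in enumerate(rankings) if j != i) / (n - 1)
        for i, r in enumerate(rankings)
    ]
    # Lower average distance -> higher weight; +1 avoids division by zero.
    return [1.0 / (d + 1.0) for d in avg_dist]

def aggregate(rankings):
    """Weighted Borda-style aggregation: score items by weighted average position."""
    weights = agreement_weights(rankings)
    items = rankings[0]
    score = {
        item: sum(w * r.index(item) for w, r in zip(weights, rankings))
        for item in items
    }
    return sorted(items, key=lambda it: score[it])

experts = [
    ["a", "b", "c", "d"],  # two similar voters agree on the top...
    ["a", "b", "d", "c"],
    ["d", "c", "b", "a"],  # ...one outlier voter is far from both
]
print(aggregate(experts))  # -> ['a', 'b', 'd', 'c']
```

The outlier voter receives the lowest weight because it disagrees with the other two, so the consensus follows the agreeing voters, which is exactly the "agreement estimates quality" intuition from the description.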