Ranking by Multitask Learning

Author: Žagar, Lan
Contributors: Zupan, Blaž
Year of publication: 2014
Subject:
Description: Instance ranking is a subfield of supervised machine learning concerned with inferring predictive models that can rank a set of data instances. We focus on multipartite ranking, where instances belong to one of a limited set of rank classes, study different approaches on synthetic and real data sets, and propose a ranking-specific evaluation framework and a new learning approach that combines multitask learning and binary decomposition. The thesis starts with an analysis of existing ranking approaches. These are then applied to a practical ranking problem from molecular biology: embryonic stem cell differentiation, posed as a multipartite ranking problem. We critically evaluate several ranking approaches and demonstrate how they can be used to construct accurate predictive models that rank samples by their stage of differentiation. For evaluation, we introduce a framework for assessing ranking methods, including a generalization of the popular AUC (area under the ROC curve) performance measure to multipartite problems. We then analyse the family of methods based on binary decomposition, which reduces a ranking problem to a set of binary classification tasks. To improve the learning of models for these tasks, we propose combining the decomposition with a multitask learning technique. Specifically, we propose a new ranking method, named BDMT, that combines one-against-one decomposition and multitask feature learning. The decomposition allows us to model nonlinear patterns and reduces the learning problem to subproblems suitable for classical machine learning approaches. Multitask learning allows us to generalize across the decomposed tasks, making them interdependent through a joint regularized optimization. Our experiments show that adding multitask learning can greatly improve the performance of one-against-one decomposition and, in certain settings, even outperform state-of-the-art ranking approaches. Learning the models of the decomposed tasks simultaneously increases the stability of model estimation and reduces sensitivity to perturbations of the training data set. Compared with other ranking methods that can also model complex patterns, BDMT remains efficient and achieves low training times by using fast linear base learners. We also show how the method and the patterns it learns can be interpreted. The features learned during the training of BDMT can reveal important hidden factors, mapping the problem into the low-dimensional subspace they span. The individual models of the decomposed tasks and the similarities between them can be studied to further elucidate the distribution of rank classes in this subspace. A schematic, illustrative sketch of these ingredients follows the record below.
Database: OpenAIRE
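
The abstract names the ingredients of BDMT, one-against-one decomposition and multitask feature learning, but not the exact loss, regularizer, aggregation rule, or evaluation measure used in the thesis. The sketch below is a minimal, hypothetical Python/NumPy instantiation of those ingredients: it builds one binary task per pair of rank classes, couples the per-task linear models through an alternating multitask feature-learning scheme in the style of Argyriou, Evgeniou, and Pontil (squared loss with a trace-constrained shared structure matrix D), aggregates the pairwise outputs into a single ranking score by summation, and evaluates with one common multipartite generalization of AUC (the average of pairwise AUCs over class pairs). All function names and specific modeling choices here are assumptions made for illustration, not the thesis's actual implementation.

```python
import itertools

import numpy as np


def one_vs_one_tasks(X, y):
    """One binary classification task per pair of rank classes (labels +1/-1)."""
    tasks = []
    for lo, hi in itertools.combinations(np.unique(y), 2):
        mask = np.isin(y, [lo, hi])
        tasks.append((X[mask], np.where(y[mask] == hi, 1.0, -1.0)))
    return tasks


def multitask_feature_learning(tasks, gamma=1.0, eps=1e-3, n_iter=50):
    """Couple the per-task linear models through a shared structure matrix D
    (positive semidefinite, trace(D) <= 1), alternating between ridge-type
    updates of the task weights and a closed-form update of D."""
    d = tasks[0][0].shape[1]
    D = np.eye(d) / d                                   # isotropic start
    for _ in range(n_iter):
        D_inv = np.linalg.inv(D + eps * np.eye(d))      # eps keeps D invertible
        W = np.column_stack([
            np.linalg.solve(Xt.T @ Xt + gamma * D_inv, Xt.T @ yt)
            for Xt, yt in tasks
        ])                                              # d x T task-weight matrix
        U, s, _ = np.linalg.svd(W, full_matrices=False)
        sqrt_wwt = (U * s) @ U.T                        # (W W^T)^(1/2)
        D = sqrt_wwt / np.trace(sqrt_wwt)               # shared low-rank structure
    return W, D


def rank_scores(X, W):
    """Aggregate the pairwise models into a single ranking score per instance."""
    return (X @ W).sum(axis=1)


def multipartite_auc(y_true, scores):
    """Average of pairwise AUCs over all pairs of rank classes (ties count 0.5)."""
    aucs = []
    for lo, hi in itertools.combinations(np.unique(y_true), 2):
        diff = scores[y_true == hi][:, None] - scores[y_true == lo][None, :]
        aucs.append((diff > 0).mean() + 0.5 * (diff == 0).mean())
    return float(np.mean(aucs))
```

Under these assumptions, training and evaluation reduce to W, D = multitask_feature_learning(one_vs_one_tasks(X_train, y_train)) followed by multipartite_auc(y_test, rank_scores(X_test, W)); the columns of W and the spectrum of the shared matrix D are what one would inspect to interpret the pairwise models and the low-dimensional subspace the abstract refers to.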