Robust multiple-instance learning ensembles using random subspace instance selection
Authors: | Marc-André Carbonneau, Ghyslain Gagnon, Eric Granger, Alexandre J. Raymond |
---|---|
Year of publication: | 2016 |
Subject: |
Training set; Computer science; business.industry; Probabilistic logic; 020206 networking & telecommunications; Pattern recognition; 02 engineering and technology; Linear subspace; ComputingMethodologies_PATTERNRECOGNITION; Artificial Intelligence; Robustness (computer science); Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Instance selection; Artificial intelligence; business; Software; Subspace topology |
Source: | Pattern Recognition. 58:83-99 |
ISSN: | 0031-3203 |
DOI: | 10.1016/j.patcog.2016.03.035 |
Description: | Many real-world pattern recognition problems can be modeled using multiple-instance learning (MIL), where instances are grouped into bags, and each bag is assigned a label. State-of-the-art MIL methods provide a high level of performance when strong assumptions are made regarding the underlying data distributions and the proportion of positive to negative instances in positive bags. In this paper, a new method called Random Subspace Instance Selection (RSIS) is proposed for the robust design of MIL ensembles without any prior assumptions on the data structure or the proportion of instances in bags. First, instance selection probabilities are computed based on training data clustered in random subspaces. A pool of classifiers is then generated using the training subsets created with these selection probabilities. By using RSIS, MIL ensembles are more robust to many data distributions and noise, and are not adversely affected by the proportion of positive instances in positive bags, because training instances are repeatedly selected in a probabilistic manner. Moreover, RSIS allows the identification of positive instances on an individual basis, as required in many practical applications. Results obtained with several real-world and synthetic databases show the robustness of MIL ensembles designed with the proposed RSIS method over a range of witness rates, noisy features and data distributions, compared to reference methods in the literature. Highlights: A new method, Random Subspace Instance Selection, is proposed to design MIL ensembles. The method yields ensembles that are robust to variations in witness rate, data distribution and noise. The method yields state-of-the-art results on several benchmark data sets. |
Database: | OpenAIRE |
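
The two steps named in the description (selection probabilities from clustering in random subspaces, then training subsets drawn with those probabilities) can be illustrated with a rough sketch. This is not the authors' implementation: the record does not say how clusters are scored or how probabilities are normalized, so the cluster score (share of members coming from positive bags), the per-bag softmax with a `temperature` parameter, the plain k-means helper, and all function and parameter names below are assumptions made for illustration only.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster index per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def selection_probabilities(bags, bag_labels, n_subspaces=50,
                            subspace_dim=2, k=4, temperature=0.1, seed=0):
    """Score instances by clustering in random feature subspaces.

    Assumed heuristic (not taken from the record): an instance gains the
    fraction of its cluster's members that come from positive bags.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack(bags).astype(float)
    bag_of = np.concatenate([np.full(len(b), i) for i, b in enumerate(bags)])
    from_pos = np.asarray(bag_labels)[bag_of] == 1
    scores = np.zeros(len(X))
    for _ in range(n_subspaces):
        feats = rng.choice(X.shape[1], size=subspace_dim, replace=False)
        labels = kmeans_labels(X[:, feats], k, rng)
        for j in range(k):
            in_cluster = labels == j
            if in_cluster.any():
                scores[in_cluster] += from_pos[in_cluster].mean()
    scores /= n_subspaces
    # Assumed normalization: softmax over the instances of each positive bag.
    probs = np.zeros(len(X))
    for i, y in enumerate(bag_labels):
        if y == 1:
            idx = np.flatnonzero(bag_of == i)
            w = np.exp(scores[idx] / temperature)
            probs[idx] = w / w.sum()
    return probs, bag_of


def sample_training_subset(bags, bag_labels, probs, bag_of, rng):
    """Draw one instance per positive bag; keep all negative-bag instances."""
    X = np.vstack(bags).astype(float)
    pos_idx = np.array(
        [rng.choice(np.flatnonzero(bag_of == i), p=probs[bag_of == i])
         for i, y in enumerate(bag_labels) if y == 1], dtype=int)
    neg_idx = np.flatnonzero(np.asarray(bag_labels)[bag_of] == 0)
    idx = np.concatenate([pos_idx, neg_idx])
    y = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    return X[idx], y
```

Calling `sample_training_subset` repeatedly with the same probabilities would give a different training subset each time; fitting one base classifier per subset is one way to obtain the pool of classifiers the description mentions.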