Multiple instance learning: A survey of problem characteristics and applications

Authors: Ghyslain Gagnon, Marc-André Carbonneau, Veronika Cheplygina, Eric Granger
Contributors: Medical Informatics, Medical Image Analysis
Publication year: 2018
Subject:
FOS: Computer and information sciences
Computer Science - Artificial Intelligence
Computer science
Drug activity prediction
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
02 engineering and technology
computer.software_genre
Machine learning
Computer Science - Information Retrieval
Document classification
Artificial Intelligence
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Leverage (statistics)
Instance-based learning
business.industry
Weakly supervised learning
Multiple instance learning
Supervised learning
Classification
Multi-instance learning
Computer aided diagnosis
machine learning
Artificial Intelligence (cs.AI)
Signal Processing
Labeled data
020201 artificial intelligence & image processing
Computer vision
Computer Vision and Pattern Recognition
Artificial intelligence
business
computer
Software
Information Retrieval (cs.IR)
Source: Pattern Recognition, 77, 329-353. Elsevier Ltd.
ISSN: 0031-3203
Description: Multiple instance learning (MIL) is a form of weakly supervised learning in which training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics that define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research. Code is available online at https://github.com/macarbonneau/MILSurvey.
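The bag formulation described above can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository: the `Bag` class, `predict_bag`, and `toy_score` are hypothetical names, and the decision rule assumes the commonly used "standard" MIL assumption (a bag is positive if at least one of its instances is positive), which the description itself does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bag:
    """A set of instances sharing a single label (hypothetical container)."""
    instances: List[List[float]]  # feature vectors; no per-instance labels
    label: int                    # one label for the whole bag


def predict_bag(bag: Bag,
                instance_score: Callable[[List[float]], float],
                threshold: float = 0.5) -> int:
    """Assumed standard MIL rule: positive if any instance scores above threshold."""
    best = max(instance_score(x) for x in bag.instances)
    return int(best > threshold)


def toy_score(x: List[float]) -> float:
    # Hypothetical instance scorer: first feature clipped to [0, 1].
    return min(1.0, max(0.0, x[0]))


bags = [
    Bag(instances=[[0.1, 0.3], [0.9, 0.2]], label=1),  # one high-scoring instance
    Bag(instances=[[0.2, 0.8], [0.3, 0.1]], label=0),  # no high-scoring instance
]
predictions = [predict_bag(b, toy_score) for b in bags]
```

Under these assumptions, only the bag label is used for supervision; which instance triggered a positive prediction remains ambiguous, which is one of the challenges the survey discusses.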
Database: OpenAIRE