Generic and optimized framework for multi-content analysis based on learning approaches

Authors:	Cédric Marchessoux, Tom Kimpe, Quentin Besnehard
Year of publication:	2010
Subject:	Ground truth; Computer science; Supervised learning; Decision tree; Feature selection; Machine learning; Metric (mathematics); AdaBoost; Data mining; Artificial intelligence
Source:	SPIE Proceedings
ISSN:	0277-786X
DOI:	10.1117/12.838616
Description:	During the European Cantata project (ITEA project, 2006-2009), a Multi-Content Analysis framework for classifying compound images into various categories (text, graphical user interface, medical images, other complex images) was developed within Barco. The framework consists of six parts: a dataset, a feature selection method, a machine-learning-based Multi-Content Analysis (MCA) algorithm, a ground truth, a metrics-based evaluation module, and a presentation module. The methodology is built on a cascade of decision-tree-based classifiers combined and trained with the AdaBoost meta-algorithm. To train these classifiers on large training datasets without excessively increasing the training time, various optimizations were implemented at two levels: the methodology itself (feature selection / elimination, dataset pre-computation) and the decision-tree training algorithm (binary threshold search, dataset presorting, and an alternate splitting algorithm). These optimizations have little or no negative impact on the classification performance of the resulting classifiers. As a result, the training time of the classifiers was significantly reduced, mainly because the optimized decision-tree training algorithm has a lower algorithmic complexity. The time saved through this optimized methodology was used to compare the results of a greater number of different training parameters.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::9bc2bb31e670643ee534119421e7845b https://doi.org/10.1117/12.838616
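
The description in this record names AdaBoost over decision-tree classifiers and dataset presorting, but gives no implementation details. The following is only a minimal illustrative sketch of how those two ideas can fit together, not the authors' code: it assumes depth-1 trees (decision stumps) instead of the paper's cascade of trees, binary labels in {-1, +1}, and hypothetical names (`presort`, `best_stump`, `adaboost`, `predict`, `X_toy`, `y_toy`). Each feature is sorted once, and every boosting round reuses that order to find the best threshold in a single sweep.

```python
import math

EPS = 1e-12  # clips the weighted error so the log below stays finite


def presort(X):
    """For each feature, list the sample indices ordered by that feature's value."""
    return [sorted(range(len(X)), key=lambda i: X[i][f]) for f in range(len(X[0]))]


def best_stump(X, y, w, order):
    """Return (error, feature, threshold, polarity) with the lowest weighted error.

    Polarity +1 predicts +1 when x[feature] > threshold and -1 otherwise;
    polarity -1 flips both predictions.
    """
    total_pos = sum(wi for wi, yi in zip(w, y) if yi == 1)
    total_neg = sum(w) - total_pos
    best = (float("inf"), 0, 0.0, 1)
    for f, idx in enumerate(order):
        pos_left = neg_left = 0.0  # weight already swept past, per class
        for k in range(len(idx) - 1):
            i, j = idx[k], idx[k + 1]
            if y[i] == 1:
                pos_left += w[i]
            else:
                neg_left += w[i]
            if X[i][f] == X[j][f]:
                continue  # equal values cannot be separated by a threshold
            thr = (X[i][f] + X[j][f]) / 2
            err_plus = pos_left + (total_neg - neg_left)
            err_minus = neg_left + (total_pos - pos_left)
            for err, pol in ((err_plus, 1), (err_minus, -1)):
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(x, f, thr, pol):
    return pol if x[f] > thr else -pol


def adaboost(X, y, rounds=10):
    """Train a weighted vote of stumps; returns a list of (alpha, feature, thr, pol)."""
    n = len(X)
    w = [1.0 / n] * n
    order = presort(X)  # sorted once, reused in every boosting round
    ensemble = []
    for _ in range(rounds):
        raw_err, f, thr, pol = best_stump(X, y, w, order)
        if raw_err == float("inf"):
            break  # no feature offers a valid split
        err = min(max(raw_err, EPS), 1 - EPS)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        if raw_err <= EPS:
            break  # a perfect stump leaves nothing to reweight
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: the label is +1 exactly when feature 0 exceeds 2.5.
X_toy = [[2.0, 7.0], [1.0, 3.0], [4.0, 5.0], [3.0, 1.0]]
y_toy = [-1, -1, 1, 1]
model = adaboost(X_toy, y_toy, rounds=5)
```

In this sketch the sort costs O(n log n) per feature and is paid once, while each round's threshold search is a linear pass over the presorted indices with running class-weight totals. Whether the paper's presorting and threshold search work the same way is not stated in this record.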