A Hierarchical and Contextual Model for Aerial Image Parsing

Author:	Jake Porway, Song-Chun Zhu, Qiongchen Wang
Language:	English
Subject:	Computer science; Artificial Intelligence (incl. Robotics); Bayesian inference; 0211 other engineering and technologies; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image processing; 02 engineering and technology; Scene-level context; Pattern Recognition; Hierarchical models; Artificial intelligence; Aerial images; 0202 electrical engineering, electronic engineering, information engineering; Hierarchical control system; Computer vision; AdaBoost; 021101 geological & geomatics engineering; Supervised learning; Computer Imaging, Vision, Pattern Recognition and Graphics; Swendsen-Wang clustering; Statistical learning; Image Processing and Computer Vision; Image understanding; Hallucinating; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Software; Texture synthesis
Source:	Porway, Jake; Wang, Qiongchen; & Zhu, Song Chun. (2010). A Hierarchical and Contextual Model for Aerial Image Parsing. International Journal of Computer Vision, 88(2), pp 254-283. doi: 10.1007/s11263-009-0306-1. Retrieved from: http://www.escholarship.org/uc/item/2t7919dw
ISSN:	0920-5691
DOI:	10.1007/s11263-009-0306-1
Description:	In this paper we present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g. relative position and relative scale). Our hierarchy is a non-recursive grammar for objects in aerial images, composed of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference. A similar algorithm was proposed for texture synthesis (Zhu et al. in Int. J. Comput. Vis. 2:107-126, 1998) but did not incorporate hierarchical information. We use a range of different bottom-up detectors (AdaBoost, TextonBoost, Compositional Boosting; Freund and Schapire in J. Comput. Syst. Sci. 55, 1997; Shotton et al. in Proceedings of the European Conference on Computer Vision, pp. 1-15, 2006; Wu et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007) to propose locations of objects in new aerial images and employ a cluster sampling algorithm (C4; Porway and Zhu, 2009) to choose the subset of detections that best explains the image according to our learned prior model. The C4 algorithm can quickly and efficiently switch between alternate competing sub-solutions, for example whether an image patch is better explained by a parking lot with cars or by a building with vents. We also show that our model can predict the locations of objects our detectors missed.
We conclude by presenting parsed aerial images and experimental results showing that our cluster sampling and top-down prediction algorithms use the learned contextual cues from our model to improve detection results over traditional bottom-up detectors alone.
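The abstract's claim that a non-recursive grammar yields many scenes from few rules can be illustrated with a small counting sketch. Everything below is an assumption made for illustration: the symbol names, the rule set, and the function `count_configurations` are hypothetical and are not taken from the paper, and the sketch ignores the statistical constraints (relative position, relative scale) that the actual model learns.

```python
from math import prod

# Hypothetical toy grammar (not the paper's). Each non-terminal maps to a
# list of alternative decompositions (Or-choices); each decomposition is a
# tuple of child symbols that must all be present (And-composition).
# Symbols without an entry (Car, Roof, Road, Trees) are terminals.
GRAMMAR = {
    "Scene": [("Block", "Block", "Block")],
    "Block": [("ParkingLot", "Building"),
              ("Building", "Road"),
              ("Building", "Trees")],
    "ParkingLot": [("Car",), ("Car", "Car"), ("Car", "Car", "Car")],
    "Building": [("Roof",), ("Roof", "Roof")],
}


def count_configurations(symbol, grammar=GRAMMAR):
    """Count the distinct derivations (parse trees) rooted at `symbol`.

    The recursion terminates because the grammar is non-recursive:
    no symbol can ever derive itself.
    """
    alternatives = grammar.get(symbol)
    if alternatives is None:  # terminal: exactly one configuration
        return 1
    # Sum over Or-choices; multiply over the children of each And-composition.
    return sum(
        prod(count_configurations(child, grammar) for child in alternative)
        for alternative in alternatives
    )


num_rules = sum(len(alternatives) for alternatives in GRAMMAR.values())
print(num_rules, count_configurations("Scene"))  # prints: 9 1000
```

In this toy setting, nine rules already produce 1000 distinct derivations, because the choices made at each node multiply across siblings and layers.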
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::937053d71b352cd744f53111a62cf439