On the study of deep learning active vision systems

Author: Ozimek, Piotr Aleksander
Year of publication: 2022
DOI: 10.5525/gla.thesis.83427
Description: This thesis presents a series of investigations into various active vision algorithms. An experimental method for evaluating active vision memory is proposed and used to demonstrate the benefits of a novel memory variant called the WW-LSTM network. A method for training active vision attention using classification gradients is proposed, and a proof-of-concept attentional spotlight algorithm is shown to convert spatially arranged gradients into coordinate space. The thesis makes a number of empirically supported recommendations as to the structure of future active vision architectures. Chapter 1 discusses the motivation behind pursuing active vision and lists the objectives set out in this thesis. The chapter contains the thesis statement, a brief overview of the relevant background and a list of the main contributions of this thesis to the literature. Chapter 2 describes an investigation into the utility of the software retina algorithm within the active vision paradigm. It discusses the initial research approach and motivations behind studying the retina, as well as the results that prompted a shift in the focus of this thesis away from the retina and onto active vision. The retina was found to slow down training to an infeasible pace, and in a later experiment it was found to perform worse than a simple image cropping algorithm on an image classification task. Chapter 3 contains a comprehensive and empirically supported literature review highlighting a number of issues and knowledge gaps present within the relevant active vision literature. The review found the literature to be incoherent due to inconsistent terminology and the pursuit of disjointed approaches that do not reinforce each other. The literature was also found to contain a large number of pressing knowledge gaps, some of which were demonstrated experimentally.
The literature review is accompanied by the proposal of an investigative framework devised to address the identified problems in the literature by structuring future active vision research. Chapter 4 investigates the means by which active vision systems can collate the information they obtain across multiple observations. This aspect of active vision is referred to as memory. An experimental method for evaluating active vision memory in an interpretable manner is devised and applied to the study of a novel approach to recurrent memory called the WW-LSTM. The WW-LSTM is a parameter-efficient variant of the LSTM network that outperformed all other recurrent memory variants that were evaluated on an image classification task. Additionally, spatial concatenation in the input space was found to outperform all recurrent memory variants, calling into question a commonly employed approach in the active vision literature. Chapter 5 contains an investigation into active vision attention, which is the means by which the system decides where to look. The investigations in this chapter demonstrate the benefits of employing a curriculum for training attention that modifies sensor parameters, and present an empirically backed argument in favour of implementing attention in a separate processing stream from classification. The chapter closes with a proposal of a novel method for leveraging classification gradients in training attention. The method is called predictive attention, and a first step in its pursuit is taken with a proof-of-concept demonstration of the hardcoded attention spotlight algorithm. The spotlight is demonstrated to facilitate the localisation of a hotspot in a modelled feature map via an optimisation process. Chapter 6 concludes this thesis by restating its objectives and summarising its key contributions. It closes with a discussion of recommended future work that can further advance our understanding of active vision in deep learning.
Database: OpenAIRE