Robust and Efficient Deep Visual Learning

Autor:	Prokudin, Sergey
Přispěvatelé:	Gehler, Peter Vincent (Dr.)
Jazyk:	angličtina
Rok vydání:	2020
Předmět:	machine learning computer graphics Deep learning Maschinelles Lernen Dimension 3 Avatar computer vision
Popis:	The past decade was marked by significant progress in the field of artificial intelligence and statistical learning. However, the most impressive of modern models come in the form of computationally expensive black boxes, with the majority of them lacking the ability to reason about the confidence of their predictions robustly. Being capable of quantifying model uncertainty and recognizing failure scenarios is crucial when it comes to incorporating them into complex decision-making pipelines, e.g. autonomous driving or medical image analysis systems. It is also important to maintain a low computational cost of these models. In the present thesis, the aforementioned desired properties of robustness and efficiency of deep learning models are studied and developed in the three specific realms of computer vision. First, we investigate deep probabilistic models that allow uncertainty quantification, i.e. the models that "know what they do not know". Here, we propose a novel model for the task of angular regression that allows probabilistic object pose estimation from 2D images. We also showcase how the general deep density estimation paradigm can be adapted and utilized in two other real-world applications, ball trajectory prediction and brain imaging. Next, we turn to the field of 3D shape analysis and rendering. We propose a method for efficient encoding of 3D point clouds, the type of data that is hard to handle with conventional learning algorithms due to its unordered nature. We show that simple neural networks that use the developed encoding as input can match the performance of state-of-the-art methods on various point cloud processing tasks while using orders of magnitude less floating-point operations. Finally, we explore the emerging field of neural rendering and develop the framework that connects classic deformable 3D body models with modern image-to-image translation neural networks. This combination allows efficient photorealistic human avatar rendering in a controlled manner, with the possibility to control the camera flexibly and to change the body pose and shape appearance. The thesis concludes with the discussion of the presented methods, including current limitations and future research directions.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=od_______707::5a5725aff97c5f849a00a53fc1a26b1c https://hdl.handle.net/10900/110708 Zobrazit plný text záznamu