Description: |
The generally unsupervised nature of autoencoder models implies that the main training metric is formulated as the error between input images and their corresponding reconstructions. Different reconstruction loss variations and latent space regularizations have been shown to improve model performance, depending on the task to be solved, and to induce new desirable properties such as disentanglement. Nevertheless, measuring success in, or enforcing properties through, the input pixel space is challenging. In this work, we want to make more efficient use of the available data and provide design choices to be considered in the recording or generation of future datasets to implicitly induce desirable properties during training. To this end, we propose a new sampling technique which matches semantically important parts of the image while randomizing the other parts, leading to salient feature extraction and the neglect of unimportant details. Further, we propose to recursively apply a previously trained autoencoder model, which can then be interpreted as a dynamical system with desirable properties for generalization and uncertainty estimation. The proposed methods can be combined with any existing reconstruction loss. We give a detailed analysis of the resulting properties on various datasets and show improvements on several computer vision tasks: image and illumination normalization, invariances, synthetic-to-real generalization, uncertainty estimation, and improved classification accuracy by means of simple classifiers in the latent space. These investigations are applied to the automotive task of vehicle interior rear seat occupant classification. For the latter, we release a synthetic dataset with several fine-grained extensions such that all the aforementioned topics can be investigated in isolation, or together, in a single application environment.
We provide quantitative evidence that machine learning methods, and deep learning methods in particular, cannot readily be used in industrial applications when only a limited amount of variation is available for training. Such limited variation is, however, often the case in practice because of constraints imposed by the application under consideration and because of financial limitations. |