Description: |
The rise of artificial intelligence has significantly impacted the field of computer vision. In particular, deep learning has advanced the development of algorithms that comprehend visual data and infer information about the environment, i.e., that mimic human vision. Among the wide variety of visual algorithms, this thesis studies and devises generative deep-learning models for image-to-image translation tasks, including style transfer and attribute manipulation. Such editing capacity is useful in scenarios where additional data with certain properties is required but is either unavailable a priori or quite restricted. Over the last years, data has become the new gold in many domains, and deep-learning approaches are no exception. Indeed, the main Achilles' heel of these models is the very large amount of labelled data that they require. Therefore, we start this work by presenting a few-shot learning system that exploits alternative forms of supervision and successfully completes translation tasks with a very limited number of samples. In this way, we open the door to less data-demanding image-to-image systems. A second focus of this thesis is the exploration and analysis of novel end-to-end models that incorporate inpainting modules to further improve their editing abilities. To that end, we assess different architectures and loss terms, using both semantic manipulations (label information) and geometry manipulations (mask information) as input control signals. The experimental evaluation of these scenarios allows us to gain insight into the role that the aforementioned elements might play when applying style and attribute modifications. Furthermore, we conduct a frequency spectrum analysis of both forged (deepfake) and generated images, paying attention to our image-to-image context as well.
From this analysis, we derive and discuss the effects that up-convolutional units might have on the final outputs, such as artefacts in the high-frequency band. Last but not least, we present an image-to-image transformation system for a real-world application: the identification of seismic events, such as diffractions and faults. The goal here is to combine two academic disciplines, computer vision and geophysics, in one project, drawing on and integrating their knowledge to solve a given seismic problem. |