Abstract: |
In this paper, we address the problem of indoor image harmonization that is aware of spatially-varying illumination. Existing image harmonization methods either extract only 2D information (e.g., low-level statistics or image filters) from the background image or rely on the non-linear representations of deep neural networks to adjust the foreground appearance. From a physical point of view, however, realistic harmonization requires perceiving the illumination at the foreground's position in the scene (i.e., spatially-varying (SV) illumination), especially in indoor scenes. To address indoor harmonization, we present a novel learning-based framework that attempts to mimic the physical model of image formation. The framework consists of a new neural harmonization architecture with four compact neural modules, which jointly learn SV illumination, shading, albedo, and rendering. In particular, a multilayer perceptron-based neural illumination field is designed to recover illumination with finer details. In addition, we construct the first large-scale synthetic indoor harmonization benchmark dataset, in which the foregrounds are mainly humans and are rendered and perturbed with SV illumination. We also derive an object placement formula to ensure that the foreground object is placed in the background at a reasonable size. Extensive experiments on synthetic and real data demonstrate that the proposed approach achieves better results than prior works. [ABSTRACT FROM AUTHOR] |