What we see in a photograph: content selection for image captioning

Autor: Georgios Barlas, Christos Veinidis, Avi Arampatzis
Rok vydání: 2020
Předmět:
Zdroj: The Visual Computer. 37:1309-1326
ISSN: 1432-2315
0178-2789
Popis: We propose and experimentally investigate the usefulness of several features for selecting image content (objects) suitable for image captioning. The approach taken explores three broad categories of features, namely geometric, conceptual, and visual. Experiments suggest that widely known geometric ‘rules’ in art–aesthetics or photography (such as the golden ratio or the rule-of-thirds) and facts about the human visual system (such as its wider horizontal angle than its vertical) provide no useful information for the task. Human captioners seem to prefer large, elongated (but not in the golden ratio) objects, positioned near the image center, irrespective of orientation. Conceptually, the preferred objects are either too specific or too general, and animate things are almost always mentioned; furthermore, some evidence is found for selecting diverse objects in order to achieve maximal image coverage in captions. Visual object features such as saliency, depth, edges, entropy, and contrast, are all found to provide useful information. Beyond evaluating features in isolation, we investigate how well these are combined by performing feature and feature-category ablation studies, leading to an effective set of features which can be proven useful for operational systems. Moreover, we propose alternative ways for feature engineering and evaluation, dealing with the drawbacks of the evaluation methodology proposed in past literature.
Databáze: OpenAIRE