Description: |
Fine-grained information extraction from fashion imagery is a challenging task due to the inherent diversity and complexity of fashion categories and attributes. Additionally, fashion imagery often depicts multiple items, while fashion items tend to follow hierarchical relations among various object types, categories and attributes. In this study, we address both issues with a two-step hierarchical deep learning pipeline consisting of (1) a low-granularity object type detection module (upper-body, lower-body, full-body, footwear) and (2) two classification modules for garment categories and attributes based on the outcome of the first step. For the category and attribute-level classification stages we examine a hierarchical label sharing (HLS) technique in two settings: (1) single-task learning (STL w/ HLS) and (2) multi-task learning with RNN and visual attention (MTL w/ RNN+VA). Our approach enables progressively focusing on appropriately detailed features for automatically learning the hierarchical relations of fashion and enabling predictions on images with complete outfits. Empirically, STL w/ HLS reached 93.99% top-3 accuracy while MTL w/ RNN+VA reached 97.57% top-5 accuracy for category classification on the DeepFashion benchmark, surpassing the current state of the art without requiring landmark or mask annotations or specialised domain expertise. |
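
The two-step idea and the top-k metric reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the `CATEGORIES_BY_TYPE` mapping, and the helper functions are hypothetical, and the assumption that the second step simply restricts category scores to those under the detected object type is ours. The neural detection and classification models are replaced by plain score dictionaries.

```python
from typing import Dict, List, Sequence

# Hypothetical object-type -> category hierarchy (illustrative labels only;
# the four object types are the ones named in the abstract).
CATEGORIES_BY_TYPE: Dict[str, List[str]] = {
    "upper-body": ["tee", "blouse", "jacket"],
    "lower-body": ["jeans", "skirt", "shorts"],
    "full-body": ["dress", "jumpsuit"],
    "footwear": ["sneakers", "boots"],
}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring labels, best first."""
    return sorted(scores, key=lambda label: scores[label], reverse=True)[:k]


def classify_item(object_type: str, category_scores: Dict[str, float], k: int = 3) -> List[str]:
    """Step 2 (assumed form): keep only categories under the detected
    object type, then return the top-k of the remaining scores."""
    allowed = CATEGORIES_BY_TYPE[object_type]
    restricted = {c: category_scores.get(c, 0.0) for c in allowed}
    return top_k(restricted, k)


def top_k_accuracy(predictions: Sequence[List[str]], targets: Sequence[str]) -> float:
    """Fraction of samples whose true label appears in the top-k list."""
    hits = sum(target in preds for preds, target in zip(predictions, targets))
    return hits / len(targets)


# Usage: scores would come from a category classifier; here they are made up.
scores = {"tee": 0.2, "jeans": 0.9, "blouse": 0.5, "jacket": 0.1, "dress": 0.7}
preds = classify_item("upper-body", scores, k=2)
```

In this toy example the highest raw score belongs to "jeans", but because the detected type is "upper-body" only upper-body categories compete, which is the intuition behind conditioning the fine-grained step on the coarse one.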