Search results - "Tutar, Ismail"

Report

Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

Authors: Zhong, Wenliang; Wu, Wenyi; Li, Qi; Barton, Rob; Du, Boxin; Sam, Shioulin; Bouyarmane, Karim; Tutar, Ismail; Huang, Junzhou

Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based

External link: http://arxiv.org/abs/2406.02987

Report

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Authors: Seyfioglu, Mehmet Saygin; Bouyarmane, Karim; Kumar, Suren; Tavanaei, Amir; Tutar, Ismail B.

As online shopping is growing, the ability for buyers to virtually visualize products in their settings, a phenomenon we define as "Virtual Try-All", has become crucial. Recent diffusion models inherently contain a world model, rendering them suitable

External link: http://arxiv.org/abs/2401.13795

Report

Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications

Authors: Wu, Wenyi; Bouyarmane, Karim; Tutar, Ismail

We present Catalog Phrase Grounding (CPG), a model that can associate product textual data (title, brands) into corresponding regions of product images (isolated product region, brand logo region) for e-commerce vision-language applications. We use a

External link: http://arxiv.org/abs/2308.16354

Report

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling

Authors: Seyfioglu, Mehmet Saygin; Bouyarmane, Karim; Kumar, Suren; Tavanaei, Amir; Tutar, Ismail B.

We introduce DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image. The context image can be, for example, the user's own image for virtual try-on of clothes from the e-commerce catalog on themselv

External link: http://arxiv.org/abs/2305.01257

Report

Solving Price Per Unit Problem Around the World: Formulating Fact Extraction as Question Answering

Authors: Arici, Tarik; Kumar, Kushal; Çeker, Hayreddin; Saladi, Anoop S V K K; Tutar, Ismail

Price Per Unit (PPU) is essential information for consumers shopping on e-commerce websites when comparing products. Finding the total quantity in a product is required for computing PPU, which is not always provided by the sellers. To predict total q

External link: http://arxiv.org/abs/2204.05555

Report

MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling

Authors: Arici, Tarik; Seyfioglu, Mehmet Saygin; Neiman, Tal; Xu, Yi; Train, Son; Chilimbi, Trishul; Zeng, Belinda; Tutar, Ismail

Vision-and-Language Pre-training (VLP) improves model performance for downstream tasks that require image and text inputs. Current VLP approaches differ on (i) model architecture (especially image embedders), (ii) loss functions, and (iii) masking po

External link: http://arxiv.org/abs/2109.12178
