Showing 1 - 10 of 26 for search: '"Ye, Keren"'
Author:
Zhu, William Yicheng, Ye, Keren, Ke, Junjie, Yu, Jiahui, Guibas, Leonidas, Milanfar, Peyman, Yang, Feng
Recognizing and disentangling visual attributes from objects is a foundation to many computer vision applications. While large vision language representations like CLIP had largely resolved the task of zero-shot object recognition, zero-shot visual a
External link:
http://arxiv.org/abs/2408.04102
Author:
Qi, Chenyang, Tu, Zhengzhong, Ye, Keren, Delbracio, Mauricio, Milanfar, Peyman, Chen, Qifeng, Talebi, Hossein
Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for mo
External link:
http://arxiv.org/abs/2312.11595
Assessing the aesthetics of an image is challenging, as it is influenced by multiple factors including composition, color, style, and high-level semantics. Existing image aesthetic assessment (IAA) methods primarily rely on human-labeled rating score
External link:
http://arxiv.org/abs/2303.14302
Author:
Ye, Keren, Kovashka, Adriana
Videos are more well-organized curated data sources for visual concept learning than images. Unlike the 2-dimensional images which only involve the spatial information, the additional temporal dimension bridges and synchronizes multiple modalities. H
External link:
http://arxiv.org/abs/2205.05895
Author:
Ye, Keren, Kovashka, Adriana
Prior work in scene graph generation requires categorical supervision at the level of triplets - subjects and objects, and predicates that relate them, either with or without bounding box information. However, scene graph generation is a holistic tas
External link:
http://arxiv.org/abs/2105.13994
Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve one single specific task, and comes with a completely independent set of paramet
External link:
http://arxiv.org/abs/2101.01260
Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-
External link:
http://arxiv.org/abs/1907.10164
To alleviate the cost of obtaining accurate bounding boxes for training today's state-of-the-art object detection models, recent weakly supervised detection work has proposed techniques to learn from image-level labels. However, requiring discrete im
External link:
http://arxiv.org/abs/1811.10080
In order to resonate with the viewers, many video advertisements explore creative narrative techniques such as "Freytag's pyramid" where a story begins with exposition, followed by rising action, then climax, concluding with denouement. In the dramat
External link:
http://arxiv.org/abs/1807.11122
Author:
Ye, Keren, Kovashka, Adriana
In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold),
External link:
http://arxiv.org/abs/1711.06666