Showing 1 - 10
of 25
for the search: '"Zareian, Alireza"'
Author:
He, Zecheng, Sun, Bo, Juefei-Xu, Felix, Ma, Haoyu, Ramchandani, Ankit, Cheung, Vincent, Shah, Siddharth, Kalia, Anmol, Subramanyam, Harihar, Zareian, Alireza, Chen, Li, Jain, Ankit, Zhang, Ning, Zhang, Peizhao, Sumbaly, Roshan, Vajda, Peter, Sinha, Animesh
Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based persona…
External link:
http://arxiv.org/abs/2409.13346
Image-caption pretraining has been quite successfully used for downstream vision tasks like zero-shot image classification and object detection. However, image-caption pretraining is still a hard problem -- it requires multiple concepts (nouns) from…
External link:
http://arxiv.org/abs/2305.17540
Clustering is a ubiquitous tool in unsupervised learning. Most of the existing self-supervised representation learning methods typically cluster samples based on visually dominant features. While this works well for image-based self-supervision, it o…
External link:
http://arxiv.org/abs/2207.10158
Author:
Zareian, Alireza
Recent advances in Deep Learning (DL) have achieved impressive performance in a variety of Computer Vision (CV) tasks, leading to an exciting wave of academic and industrial efforts to develop Artificial Intelligence (AI) facilities for every aspect…
Author:
Wang, Zhecan, You, Haoxuan, Li, Liunian Harold, Zareian, Alireza, Park, Suji, Liang, Yiqing, Chang, Kai-Wei, Chang, Shih-Fu
Published in:
AAAI 2022
Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have mad…
External link:
http://arxiv.org/abs/2112.08587
Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements. Particularly, learning more object categories typically requires proportionally more bounding box annotat…
External link:
http://arxiv.org/abs/2011.10678
Author:
Li, Liunian Harold, You, Haoxuan, Wang, Zhecan, Zareian, Alireza, Chang, Shih-Fu, Chang, Kai-Wei
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks. However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and r…
External link:
http://arxiv.org/abs/2010.12831
Children acquire language subconsciously by observing the surrounding world and listening to descriptions. They can discover the meaning of words even without explicit language knowledge, and generalize to novel compositions effortlessly. In this pap…
External link:
http://arxiv.org/abs/2007.11668
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output scene graph…
External link:
http://arxiv.org/abs/2006.09623
Author:
Li, Manling, Zareian, Alireza, Zeng, Qi, Whitehead, Spencer, Lu, Di, Ji, Heng, Chang, Shih-Fu
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated e…
External link:
http://arxiv.org/abs/2005.02472