Showing 1 - 10 of 29 for search: '"Som, Subhojit"'
Author:
Aggarwal, Kriti, Khandelwal, Aditi, Tanmay, Kumar, Khan, Owais Mohammed, Liu, Qiang, Choudhury, Monojit, Chauhan, Hardik Hansrajbhai, Som, Subhojit, Chaudhary, Vishrav, Tiwary, Saurabh
Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization…
External link:
http://arxiv.org/abs/2305.14218
Author:
Huang, Qiuyuan, Park, Jae Sung, Gupta, Abhinav, Bennett, Paul, Gong, Ran, Som, Subhojit, Peng, Baolin, Mohammed, Owais Khan, Pal, Chris, Choi, Yejin, Gao, Jianfeng
Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amount…
External link:
http://arxiv.org/abs/2305.00970
Author:
Huang, Shaohan, Dong, Li, Wang, Wenhui, Hao, Yaru, Singhal, Saksham, Ma, Shuming, Lv, Tengchao, Cui, Lei, Mohammed, Owais Khan, Patra, Barun, Liu, Qiang, Aggarwal, Kriti, Chi, Zewen, Bjorck, Johan, Chaudhary, Vishrav, Som, Subhojit, Song, Xia, Wei, Furu
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities…
External link:
http://arxiv.org/abs/2302.14045
Author:
Wang, Wenhui, Bao, Hangbo, Dong, Li, Bjorck, Johan, Peng, Zhiliang, Liu, Qiang, Aggarwal, Kriti, Mohammed, Owais Khan, Singhal, Saksham, Som, Subhojit, Wei, Furu
A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks…
External link:
http://arxiv.org/abs/2208.10442
Author:
Gui, Liangke, Chang, Yingshan, Huang, Qiuyuan, Som, Subhojit, Hauptmann, Alex, Gao, Jianfeng, Bisk, Yonatan
Vision-Language Transformers can be learned without low-level human labels (e.g., class labels, bounding boxes, etc.). Existing work, whether explicitly utilizing bounding boxes or patches, assumes that the visual backbone must first be trained on ImageNet…
External link:
http://arxiv.org/abs/2205.09256
Author:
Bao, Hangbo, Wang, Wenhui, Dong, Li, Liu, Qiang, Mohammed, Owais Khan, Aggarwal, Kriti, Som, Subhojit, Wei, Furu
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce Mixture-of-Modality-Experts (MoME) Transformer, where each block contains…
External link:
http://arxiv.org/abs/2111.02358
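The VLMo teaser above is cut off right at the description of the MoME block. Per the paper's design, each block pairs a self-attention layer shared across modalities with a pool of modality-specific feed-forward experts. The following is a minimal PyTorch sketch of that idea; the class name, modality keys, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a Mixture-of-Modality-Experts (MoME) Transformer block:
# self-attention shared across modalities, one feed-forward expert per modality.
# All names and sizes are illustrative, not VLMo's actual implementation.
import torch
import torch.nn as nn

class MoMEBlock(nn.Module):
    def __init__(self, dim=768, num_heads=12,
                 modalities=("vision", "language", "fusion")):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Attention parameters are shared by every modality.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # A pool of modality-specific experts; one is selected per forward pass.
        self.experts = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                             nn.Linear(4 * dim, dim))
            for m in modalities
        })

    def forward(self, x, modality):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)                # shared self-attention
        x = x + attn_out
        x = x + self.experts[modality](self.norm2(x))   # modality-specific FFN
        return x

block = MoMEBlock()
image_tokens = torch.randn(2, 196, 768)  # batch of ViT-style patch embeddings
print(block(image_tokens, modality="vision").shape)  # torch.Size([2, 196, 768])
```

Routing by an explicit modality tag is the modularity the abstract alludes to when it says the model jointly learns a dual encoder and a fusion encoder from the same Transformer network.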
Author:
Som, Subhojit
In this thesis we address two inverse problems. The first is the reconstruction of sparse signals. In the second, electron paramagnetic resonance imaging (EPRI) is considered. The problem of recovering sparse signals has recently generated much interest…
External link:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1282135281
Author:
Som, Subhojit, Schniter, Philip
We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. Like other recent works, we model wavelet structure using a hidden Markov…
External link:
http://arxiv.org/abs/1108.2632
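The record above proposes a compressive-imaging algorithm built on a hidden Markov model of wavelet structure; that algorithm is not reproduced here. As a generic point of reference for the same problem class, recovering a sparse coefficient vector from underdetermined linear measurements, here is a short ISTA sketch in numpy; all names and parameter values are ours, not the paper's.

```python
# Generic ISTA baseline (not the paper's algorithm) for sparse recovery:
# estimate sparse x from y = A @ x + noise via iterative soft thresholding.
import numpy as np

def ista(A, y, lam=0.1, n_iter=200):
    """Solve min_x 0.5*||y - A x||^2 + lam*||x||_1 by proximal gradient steps."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                        # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Note the contrast with the abstract: ISTA's only prior is plain l1 sparsity, whereas the paper additionally exploits the persistence of wavelet coefficients across scales.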
Author:
Som, Subhojit, Potter, Lee C.
In compressive sensing, sparse signals are recovered from underdetermined noisy linear observations. One of the interesting problems that has attracted a lot of attention in recent times is the support recovery, or sparsity pattern recovery, problem…
External link:
http://arxiv.org/abs/1004.4044
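The teaser above concerns support recovery, i.e., identifying which entries of a sparse signal are nonzero from noisy underdetermined observations. As a concrete illustration of the task (not the analysis in the linked paper), the numpy sketch below estimates the support with orthogonal matching pursuit (OMP), a standard greedy baseline.

```python
# Illustrative only: OMP as a greedy estimator of the support (the set of
# nonzero indices) of a sparse signal observed through y = A @ x.
import numpy as np

def omp_support(A, y, k):
    """Greedily pick k columns of A that best explain y; return their indices."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Column most correlated with the current residual joins the support.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-fit on the chosen columns and update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return sorted(support)

rng = np.random.default_rng(1)
n, m, k = 128, 60, 5
true_support = sorted(rng.choice(n, k, replace=False).tolist())
x = np.zeros(n)
x[true_support] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x                                   # noiseless for a clean demo
print("true:", true_support)
print("OMP :", omp_support(A, y, k))
```

Exact recovery here depends on the sparsity level and number of measurements; characterizing when such recovery is possible is the kind of question the linked paper studies.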
Author:
Gui, Liangke, Chang, Yingshan, Huang, Qiuyuan, Som, Subhojit, Hauptmann, Alex, Gao, Jianfeng, Bisk, Yonatan
Vision-Language Transformers can be learned without human labels (e.g., class labels, bounding boxes, etc.). Existing work, whether explicitly utilizing bounding boxes or patches, assumes that the visual backbone must first be trained on ImageNet classification…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::04f6c36218ab0fe3f3a689b8f0877b03