Showing 1 - 10 of 226 for search: '"Rao, Raghuveer"'
Author:
Han, Cheng, Wang, Qifan, Dianat, Sohail A., Rabbani, Majid, Rao, Raghuveer M., Fang, Yi, Guan, Qiang, Huang, Lifu, Liu, Dongfang
Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real app
External link:
http://arxiv.org/abs/2407.04208
Author:
Han, Cheng, Lu, Yawen, Sun, Guohao, Liang, James C., Cao, Zhiwen, Wang, Qifan, Guan, Qiang, Dianat, Sohail A., Rao, Raghuveer M., Geng, Tong, Tao, Zhiqiang, Liu, Dongfang
In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thought
External link:
http://arxiv.org/abs/2406.01559
Author:
Wang, Jiamian, Sun, Guohao, Wang, Pichao, Liu, Dongfang, Dianat, Sohail, Rabbani, Majid, Rao, Raghuveer, Tao, Zhiqiang
The increasing prevalence of video clips has sparked growing interest in text-video retrieval. Recent advances focus on establishing a joint embedding space for text and video, relying on consistent embedding representations to compute similarity. Ho
External link:
http://arxiv.org/abs/2403.17998
Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled inf
External link:
http://arxiv.org/abs/2401.12340
Author:
Han, Cheng, Liang, James C., Wang, Qifan, Rabbani, Majid, Dianat, Sohail, Rao, Raghuveer, Wu, Ying Nian, Liu, Dongfang
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual p
External link:
http://arxiv.org/abs/2401.09742
Author:
Yu, Zifan, Tavakoli, Erfan Bank, Chen, Meida, You, Suya, Rao, Raghuveer, Agarwal, Sanjeev, Ren, Fengbo
The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects an
External link:
http://arxiv.org/abs/2311.02535
Published in:
SPIE Defense & Commercial Sensing 2023, Conference 12521, Automatic target recognition XXXIII, Orlando, Florida
One of the major obstacles in designing an automatic target recognition (ATR) algorithm is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDA
External link:
http://arxiv.org/abs/2305.13886
Author:
Yu, Zifan, Chen, Meida, Zhang, Zhikang, You, Suya, Rao, Raghuveer, Agarwal, Sanjeev, Ren, Fengbo
Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolutional neural networks (CNNs) and quantitation loss of spherical projection. In this work, we pro
External link:
http://arxiv.org/abs/2302.08594
Motivated by the increasing application of low-resolution LiDAR recently, we target the problem of low-resolution LiDAR-camera calibration in this work. The main challenges are two-fold: sparsity and noise in point clouds. To address the problem, we
External link:
http://arxiv.org/abs/2211.03932