Showing 1 - 10 of 7,238 results for search: '"He Bo"'
Deadline-aware transmission scheduling in immersive video streaming is crucial. The objective is to guarantee that at least a certain block in multi-links is fully delivered within their deadlines, which is referred to as delivery ratio. Compared wit
External link:
http://arxiv.org/abs/2408.17028
Underwater monocular depth estimation serves as the foundation for tasks such as 3D reconstruction of underwater scenes. However, due to the influence of light and medium, the underwater environment undergoes a distinctive imaging process, which pres
External link:
http://arxiv.org/abs/2407.17838
Author:
He, Bo, Li, Hengduo, Jang, Young Kyun, Jia, Menglin, Cao, Xuefei, Shah, Ashish, Shrivastava, Abhinav, Lim, Ser-Nam
With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoC
External link:
http://arxiv.org/abs/2404.05726
The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution. Despite sharing a common goal, different tasks often rely on distinct
External link:
http://arxiv.org/abs/2403.17935
Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data. Despite the promising performance achieved, these descriptions are derived from image annotations, whi
External link:
http://arxiv.org/abs/2311.07574
Author:
Saini, Nirat, Wang, Hanyu, Swaminathan, Archana, Jayasundara, Vinoj, He, Bo, Gupta, Kamal, Shrivastava, Abhinav
Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions. In this paper, we study the task of cutting objects in different styles and the resulting object state changes. We
External link:
http://arxiv.org/abs/2309.14339
Published in:
Rapid Prototyping Journal, 2024, Vol. 30, Issue 8, pp. 1638-1647.
External link:
http://www.emeraldinsight.com/doi/10.1108/RPJ-02-2023-0052
Published in:
Circuit World, 2021, Vol. 50, Issue 2/3, pp. 225-239.
External link:
http://www.emeraldinsight.com/doi/10.1108/CW-11-2019-0159
Author:
He, Bo, Yang, Xitong, Wang, Hanyu, Wu, Zuxuan, Chen, Hao, Huang, Shuaiyi, Ren, Yixuan, Lim, Ser-Nam, Shrivastava, Abhinav
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV). While achieving promising results, existing INR-based methods are limit
External link:
http://arxiv.org/abs/2303.14124
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries. Unlike the unimodal summarization, the multimodal summarization task explicitly leverages cross-modal information to
External link:
http://arxiv.org/abs/2303.07284