Showing 1 - 10 of 3,597 for search: '"Wang, Yali"'
Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, f…
External link:
http://arxiv.org/abs/2408.10605
Author:
Pei, Baoqi, Chen, Guo, Xu, Jilan, He, Yuping, Liu, Yicheng, Pan, Kanghua, Huang, Yifei, Wang, Yali, Lu, Tong, Wang, Limin, Qiao, Yu
In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulo…
External link:
http://arxiv.org/abs/2406.18070
Author:
Li, Qingyun, Chen, Zhe, Wang, Weiyun, Wang, Wenhai, Ye, Shenglong, Jin, Zhenjiang, Chen, Guanzhou, He, Yinan, Gao, Zhangwei, Cui, Erfei, Yu, Jiashuo, Tian, Hao, Zhou, Jiasheng, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Li, Zhenxiang, Chu, Pei, Wang, Yi, Dou, Min, Tian, Changyao, Zhu, Xizhou, Lu, Lewei, Chen, Yushi, He, Junjun, Tu, Zhongying, Lu, Tong, Wang, Yali, Wang, Limin, Lin, Dahua, Qiao, Yu, Shi, Botian, He, Conghui, Dai, Jifeng
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data ai…
External link:
http://arxiv.org/abs/2406.08418
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks te…
External link:
http://arxiv.org/abs/2404.16006
Author:
Huang, Yifei, Chen, Guo, Xu, Jilan, Zhang, Mingfang, Yang, Lijin, Pei, Baoqi, Zhang, Hongjie, Dong, Lu, Wang, Yali, Wang, Limin, Qiao, Yu
Being able to map the activities of others into one's own point of view is a fundamental human skill even from a very early age. Taking a step toward understanding this human ability, we introduce EgoExoLearn, a large-scale dataset that emulates th…
External link:
http://arxiv.org/abs/2403.16182
Author:
Wang, Yi, Li, Kunchang, Li, Xinhao, Yu, Jiashuo, He, Yinan, Wang, Chenting, Chen, Guo, Pei, Baoqi, Yan, Ziang, Zheng, Rongkun, Xu, Jilan, Wang, Zun, Shi, Yansong, Jiang, Tianxiang, Li, Songze, Zhang, Hongjie, Huang, Yifei, Qiao, Yu, Wang, Yali, Wang, Limin
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our core design is a progressive training approach that unifies th…
External link:
http://arxiv.org/abs/2403.15377
Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts Mamba to the video domain. The proposed VideoMamba overcomes the limitations of existing 3D convolutional neural networ…
External link:
http://arxiv.org/abs/2403.06977
Open-world video recognition is challenging since traditional networks do not generalize well to complex environment variations. Alternatively, foundation models with rich knowledge have recently shown their generalization power. However, how to ap…
External link:
http://arxiv.org/abs/2402.18951
Published in:
Journal of Medical Internet Research, Vol 22, Iss 7, p e18527 (2020)
Background: An online health community (OHC) is an interactive platform for virtual communication between patients and physicians. Patients can typically search for, seek, and share their experiences and rate physicians, who may be involved in giving ad…
External link:
https://doaj.org/article/61636ffcb8554de7b0b38c10c3519d8b
Author:
Lu, Chaochao, Qian, Chen, Zheng, Guodong, Fan, Hongxing, Gao, Hongzhi, Zhang, Jie, Shao, Jing, Deng, Jingyi, Fu, Jinlan, Huang, Kexin, Li, Kunchang, Li, Lijun, Wang, Limin, Sheng, Lu, Chen, Meiqi, Zhang, Ming, Ren, Qibing, Chen, Sirui, Gui, Tao, Ouyang, Wanli, Wang, Yali, Teng, Yan, Wang, Yaru, Wang, Yi, He, Yinan, Wang, Yingchun, Wang, Yixu, Zhang, Yongting, Qiao, Yu, Shen, Yujiong, Mou, Yurong, Chen, Yuxi, Zhang, Zaibin, Shi, Zhelun, Yin, Zhenfei, Wang, Zhipin
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal content. However, there is still a wide gap between the performance of recent MLLM-based applications and the ex…
External link:
http://arxiv.org/abs/2401.15071