Showing 1 - 10
of 13,081
for search: '"ZHAO, Bo"'
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?
Author:
Bassi, Pedro R. A. S., Li, Wenxuan, Tang, Yucheng, Isensee, Fabian, Wang, Zifu, Chen, Jieneng, Chou, Yu-Cheng, Kirchhoff, Yannick, Rokuss, Maximilian, Huang, Ziyan, Ye, Jin, He, Junjun, Wald, Tassilo, Ulrich, Constantin, Baumgartner, Michael, Roy, Saikat, Maier-Hein, Klaus H., Jaeger, Paul, Ye, Yiwen, Xie, Yutong, Zhang, Jianpeng, Chen, Ziyang, Xia, Yong, Xing, Zhaohu, Zhu, Lei, Sadegheih, Yousef, Bozorgpour, Afshin, Kumari, Pratibha, Azad, Reza, Merhof, Dorit, Shi, Pengcheng, Ma, Ting, Du, Yuxin, Bai, Fan, Huang, Tiejun, Zhao, Bo, Wang, Haonan, Li, Xiaomeng, Gu, Hanxue, Dong, Haoyu, Yang, Jichen, Mazurowski, Maciej A., Gupta, Saumya, Wu, Linshan, Zhuang, Jiaxin, Chen, Hao, Roth, Holger, Xu, Daguang, Blaschko, Matthew B., Decherchi, Sergio, Cavalli, Andrea, Yuille, Alan L., Zhou, Zongwei
How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. …
External link:
http://arxiv.org/abs/2411.03670
Author:
Wang, Xinlong, Zhang, Xiaosong, Luo, Zhengxiong, Sun, Quan, Cui, Yufeng, Wang, Jinsheng, Zhang, Fan, Wang, Yueze, Li, Zhen, Yu, Qiying, Zhao, Yingli, Ao, Yulong, Min, Xuebin, Li, Tao, Wu, Boya, Zhao, Bo, Zhang, Bowen, Wang, Liangdong, Liu, Guang, He, Zheqi, Yang, Xi, Liu, Jingjing, Lin, Yonghua, Huang, Tiejun, Wang, Zhongyuan
While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches …
External link:
http://arxiv.org/abs/2409.18869
Although current Multi-modal Large Language Models (MLLMs) demonstrate promising results in video understanding, processing extremely long videos remains an ongoing challenge. Typically, MLLMs struggle with handling thousands of visual tokens that …
External link:
http://arxiv.org/abs/2409.14485
Nonreciprocal thermal emitters that break Kirchhoff's law of thermal radiation promise exciting applications for thermal and energy applications. The design of the bandwidth and angular range of the nonreciprocal effect, which directly affects the …
External link:
http://arxiv.org/abs/2409.09192
Author:
Cheng, Dingxin, Li, Mingda, Liu, Jingyu, Guo, Yongxin, Jiang, Bin, Liu, Qingbin, Chen, Xi, Zhao, Bo
Recently, integrating visual foundation models into large language models (LLMs) to form video understanding systems has attracted widespread attention. Most of the existing models compress diverse semantic information within the whole video and feed …
External link:
http://arxiv.org/abs/2409.06299
Author:
Gao, Mingze, Liu, Jingyu, Li, Mingda, Xie, Jiangtao, Liu, Qingbin, Zhao, Bo, Chen, Xi, Xiong, Hui
Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most …
External link:
http://arxiv.org/abs/2409.03206
Author:
Li, Xiang, Yao, Yiqun, Jiang, Xin, Fang, Xuezhi, Wang, Chao, Liu, Xinzhang, Wang, Zihan, Zhao, Yu, Wang, Xin, Huang, Yuyao, Song, Shuangyong, Li, Yongxiang, Zhang, Zheng, Zhao, Bo, Sun, Aixin, Wang, Yequan, He, Zhongjiang, Wang, Zhongyuan, Li, Xuelong, Huang, Tiejun
Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with …
External link:
http://arxiv.org/abs/2407.02783
Author:
Ding, Henghui, Liu, Chang, Wei, Yunchao, Ravi, Nikhila, He, Shuting, Bai, Song, Torr, Philip, Miao, Deshui, Li, Xin, He, Zhenyu, Wang, Yaowei, Yang, Ming-Hsuan, Xu, Zhensong, Yao, Jiangtao, Wu, Chengjing, Liu, Ting, Liu, Luoqi, Liu, Xinyu, Zhang, Jing, Zhang, Kexin, Yang, Yuting, Jiao, Licheng, Yang, Shuyuan, Gao, Mingqi, Luo, Jingnan, Yang, Jinyu, Han, Jungong, Zheng, Feng, Cao, Bin, Zhang, Yisi, Lin, Xuanxu, He, Xingjian, Zhao, Bo, Liu, Jing, Pan, Feiyu, Fang, Hao, Lu, Xiankai
The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: a Complex Video Object Segmentation Track based on the MOSE dataset and a Motion Expression guided Video Segmentation …
External link:
http://arxiv.org/abs/2406.17005
Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in a video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task …
External link:
http://arxiv.org/abs/2406.13939
Author:
Cai, Wenxiao, Ponomarenko, Iaroslav, Yuan, Jianhao, Li, Xiaoqi, Yang, Wankou, Dong, Hao, Zhao, Bo
Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding; however, they still struggle with spatial understanding, which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial …
External link:
http://arxiv.org/abs/2406.13642