Showing 1 - 10 of 743 for search: '"LI Junnan"'
Recent advances in diffusion and flow-based generative models have demonstrated remarkable success in image restoration tasks, achieving superior perceptual quality compared to traditional deep learning approaches. However, these methods either…
External link:
http://arxiv.org/abs/2412.09465
Generative models, particularly diffusion models, have achieved significant success in data synthesis across various modalities, including images, videos, and 3D assets. However, current diffusion models are computationally intensive, often requiring…
External link:
http://arxiv.org/abs/2412.05899
Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods like multiple-choice questions in benchmarks such as VideoMME and LongVideoBench…
External link:
http://arxiv.org/abs/2411.13281
Authors:
Li, Dongxu, Liu, Yudong, Wu, Haoning, Wang, Yue, Shen, Zhiqi, Qu, Bowen, Niu, Xinyao, Zhou, Fan, Huang, Chengen, Li, Yanpeng, Zhu, Chongyan, Ren, Xiaoyi, Li, Chao, Ye, Yifan, Zhang, Lihuan, Yan, Hanshu, Wang, Guoyin, Chen, Bei, Li, Junnan
Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles…
External link:
http://arxiv.org/abs/2410.05993
Authors:
Fatoni, Muhammad Hilman, Herneth, Christopher, Li, Junnan, Budiman, Fajar, Ganguly, Amartya, Haddadin, Sami
Markerless motion capture devices such as the Leap Motion Controller (LMC) have been extensively used for tracking hand, wrist, and forearm positions as an alternative to marker-based motion capture (MMC). However, previous studies have highlighted…
External link:
http://arxiv.org/abs/2408.17287
Authors:
Li, Junnan, Chen, Lingyun, Ringwald, Johannes, Fortunic, Edmundo Pozo, Ganguly, Amartya, Haddadin, Sami
This study addresses the absence of an identification framework to quantify a comprehensive dynamic model of human and anthropomorphic tendon-driven fingers, which is necessary to investigate the physiological properties of human fingers and improve…
External link:
http://arxiv.org/abs/2408.13044
This study addresses the critical need for diverse and comprehensive data on human arm joint torques during activities of daily living (ADL). Previous studies have often overlooked the influence of objects on joint torques during ADL…
External link:
http://arxiv.org/abs/2408.07434
Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark…
External link:
http://arxiv.org/abs/2407.15754
Authors:
Tiong, Anthony Meng Huat, Zhao, Junqi, Li, Boyang, Li, Junnan, Hoi, Steven C. H., Xiong, Caiming
Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate. A common belief is that a small number of VL skills underlie the variety of VL tests. In this paper, we perform…
External link:
http://arxiv.org/abs/2404.02415
Authors:
Panagopoulou, Artemis, Xue, Le, Yu, Ning, Li, Junnan, Li, Dongxu, Joty, Shafiq, Xu, Ran, Savarese, Silvio, Xiong, Caiming, Niebles, Juan Carlos
Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and…
External link:
http://arxiv.org/abs/2311.18799