Showing 1 - 10 of 29 for search: '"Ye, Hanrong"'
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challe…
External link:
http://arxiv.org/abs/2407.01509
Author:
Ye, Hanrong, Huang, De-An, Lu, Yao, Yu, Zhiding, Ping, Wei, Tao, Andrew, Kautz, Jan, Han, Song, Xu, Dan, Molchanov, Pavlo, Yin, Hongxu
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LL…
External link:
http://arxiv.org/abs/2405.19335
Author:
Ye, Hanrong, Xu, Dan
Recently, there has been increased interest in the practical problem of learning multiple dense scene understanding tasks from partially annotated data, where each training sample is labeled for only a subset of the tasks. The missing of task labe…
External link:
http://arxiv.org/abs/2403.15389
We propose SegGen, a highly effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strat…
External link:
http://arxiv.org/abs/2311.03355
Author:
Ye, Hanrong, Xu, Dan
Published in:
ICCV 2023
Learning discriminative task-specific features simultaneously for multiple distinct tasks is a fundamental problem in multi-task learning. Recent state-of-the-art models consider directly decoding task-specific features from one shared task-generic f…
External link:
http://arxiv.org/abs/2307.15324
This paper targets the problem of multi-task dense prediction, which aims to achieve simultaneous learning and inference on multiple dense prediction tasks in a single framework. A core design objective is how to effectively model cross-…
External link:
http://arxiv.org/abs/2307.07934
Author:
Ye, Hanrong, Xu, Dan
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model. Previous studies typically process multi-task features in a more local way, and thus cannot effectively l…
External link:
http://arxiv.org/abs/2306.04842
Author:
Ye, Hanrong, Xu, Dan
Published in:
ICLR 2023
This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies t…
External link:
http://arxiv.org/abs/2304.00971
Author:
Ye, Hanrong, Xu, Dan
Published in:
ECCV 2022
Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction. Most existing works encounter a severe limitation of modeling in the lo…
External link:
http://arxiv.org/abs/2203.07997
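The multi-task dense prediction setting in the entries above can be made concrete with a minimal, generic sketch: one shared per-pixel feature map feeding a separate lightweight head per task. This is a common baseline illustration, not the method of any paper listed here; the task names, feature width, and spatial sizes are all hypothetical.

```python
import numpy as np

# Generic sketch of multi-task dense prediction (illustrative, not any
# listed paper's method): shared pixel-wise features, one linear head
# (equivalent to a 1x1 convolution) per task.
rng = np.random.default_rng(0)

H, W, C = 8, 8, 16                        # spatial size and shared feature width
tasks = {"segmentation": 21, "depth": 1}  # hypothetical task -> output channels

# Shared features, as would come from a single encoder over the input image.
shared = rng.standard_normal((H, W, C))

# One per-pixel linear head per task.
heads = {name: rng.standard_normal((C, ch)) for name, ch in tasks.items()}

# Every task is predicted densely (per pixel) from the same shared features.
outputs = {name: shared @ w for name, w in heads.items()}

for name, ch in tasks.items():
    assert outputs[name].shape == (H, W, ch)
```

The point of the sketch is the structure the listed papers improve on: all tasks read from one shared representation, so modeling cross-task interaction beyond these independent heads is the open design question.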
This paper presents a novel and lightweight hyperparameter optimization (HPO) method, MOdular FActorial Design (MOFA). MOFA pursues several rounds of HPO, where each round alternates between exploration of the hyperparameter space by factorial design and…
External link:
http://arxiv.org/abs/2011.09545
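As a minimal illustration of the factorial-design idea that MOFA builds on (a generic sketch, not the MOFA algorithm itself): a full factorial design enumerates every combination of discrete hyperparameter levels. The hyperparameter names and levels below are hypothetical.

```python
from itertools import product

# Full factorial design over discrete hyperparameter levels (generic
# illustration; MOFA itself adds modular rounds and evaluation steps).
space = {
    "lr": [1e-3, 1e-2],
    "batch_size": [32, 64],
    "dropout": [0.0, 0.5],
}

keys = list(space)
# One configuration per combination of levels: 2 x 2 x 2 = 8 designs.
designs = [dict(zip(keys, combo)) for combo in product(*space.values())]
assert len(designs) == 8
```

Each round of a factorial-design HPO method would evaluate such a set of configurations, then narrow the levels for the next round.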