Showing 1 - 10 of 29 for search: '"Ye, Hanrong"'
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challe…
External link:
http://arxiv.org/abs/2407.01509
Author:
Ye, Hanrong, Huang, De-An, Lu, Yao, Yu, Zhiding, Ping, Wei, Tao, Andrew, Kautz, Jan, Han, Song, Xu, Dan, Molchanov, Pavlo, Yin, Hongxu
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LL…
External link:
http://arxiv.org/abs/2405.19335
Author:
Ye, Hanrong, Xu, Dan
Recently, there has been increased interest in the practical problem of learning multiple dense scene understanding tasks from partially annotated data, where each training sample is labeled for only a subset of the tasks. The missing of task labe…
External link:
http://arxiv.org/abs/2403.15389
We propose SegGen, a highly effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strat…
External link:
http://arxiv.org/abs/2311.03355
Author:
Ye, Hanrong, Xu, Dan
Published in:
ICCV 2023
Learning discriminative task-specific features simultaneously for multiple distinct tasks is a fundamental problem in multi-task learning. Recent state-of-the-art models consider directly decoding task-specific features from one shared task-generic f…
External link:
http://arxiv.org/abs/2307.15324
This paper targets the problem of multi-task dense prediction, which aims to achieve simultaneous learning and inference on multiple dense prediction tasks in a single framework. A core design objective is how to effectively model cross-…
External link:
http://arxiv.org/abs/2307.07934
Author:
Ye, Hanrong, Xu, Dan
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model. Previous studies typically process multi-task features in a more local way, and thus cannot effectively l…
External link:
http://arxiv.org/abs/2306.04842
Author:
Ye, Hanrong, Xu, Dan
Published in:
ICLR 2023
This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies t…
External link:
http://arxiv.org/abs/2304.00971
Author:
Ye, Hanrong, Xu, Dan
Published in:
ECCV 2022
Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction. Most existing works encounter a severe limitation of modeling in the lo…
External link:
http://arxiv.org/abs/2203.07997
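The multi-task dense prediction setting in the entries above can be made concrete with a minimal, generic sketch: one shared per-pixel feature map feeding a separate lightweight head per task. This is a common baseline illustration, not the method of any paper listed here; the task names, feature width, and spatial sizes are all hypothetical.

```python
import numpy as np

# Generic sketch of multi-task dense prediction (illustrative, not any
# listed paper's method): shared pixel-wise features, one linear head
# (equivalent to a 1x1 convolution) per task.
rng = np.random.default_rng(0)

H, W, C = 8, 8, 16                        # spatial size and shared feature width
tasks = {"segmentation": 21, "depth": 1}  # hypothetical task -> output channels

# Shared features, as would come from a single encoder over the input image.
shared = rng.standard_normal((H, W, C))

# One per-pixel linear head per task.
heads = {name: rng.standard_normal((C, ch)) for name, ch in tasks.items()}

# Every task is predicted densely (per pixel) from the same shared features.
outputs = {name: shared @ w for name, w in heads.items()}

for name, ch in tasks.items():
    assert outputs[name].shape == (H, W, ch)
```

The point of the sketch is the structure the listed papers improve on: all tasks read from one shared representation, so modeling cross-task interaction beyond these independent heads is the open design question.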
This paper presents a novel and lightweight hyperparameter optimization (HPO) method, MOdular FActorial Design (MOFA). MOFA pursues several rounds of HPO, where each round alternates between exploration of the hyperparameter space by factorial design and…
External link:
http://arxiv.org/abs/2011.09545
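As a minimal illustration of the factorial-design idea that MOFA builds on (a generic sketch, not the MOFA algorithm itself): a full factorial design enumerates every combination of discrete hyperparameter levels. The hyperparameter names and levels below are hypothetical.

```python
from itertools import product

# Full factorial design over discrete hyperparameter levels (generic
# illustration; MOFA itself adds modular rounds and evaluation steps).
space = {
    "lr": [1e-3, 1e-2],
    "batch_size": [32, 64],
    "dropout": [0.0, 0.5],
}

keys = list(space)
# One configuration per combination of levels: 2 x 2 x 2 = 8 designs.
designs = [dict(zip(keys, combo)) for combo in product(*space.values())]
assert len(designs) == 8
```

Each round of a factorial-design HPO method would evaluate such a set of configurations, then narrow the levels for the next round.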