Showing 1 - 10 of 136 for search: '"Zhang, Renrui"'
Author:
Liu, Jiaming, Liu, Mengzhen, Wang, Zhenyu, Lee, Lily, Zhou, Kaichen, An, Pengju, Yang, Senqiao, Zhang, Renrui, Guo, Yandong, Zhang, Shanghang
A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two a…
External link:
http://arxiv.org/abs/2406.04339
Author:
Fu, Chaoyou, Dai, Yuhan, Luo, Yongdong, Li, Lei, Ren, Shuhuai, Zhang, Renrui, Wang, Zihan, Zhou, Chenyu, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Li, Yanwei, Lin, Shaohui, Zhao, Sirui, Li, Ke, Xu, Tong, Zheng, Xiawu, Chen, Enhong, Ji, Rongrong, Sun, Xing
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. T…
External link:
http://arxiv.org/abs/2405.21075
Author:
Wang, Jiaze, Wang, Yi, Guo, Ziyu, Zhang, Renrui, Zhou, Donghao, Chen, Guangyong, Liu, Anfeng, Heng, Pheng-Ann
Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in the context of 3D vision where traditional datasets are often limited. Despite previous advancements, existing method…
External link:
http://arxiv.org/abs/2405.18523
Author:
Liu, Jiaming, Li, Chenxuan, Wang, Guanqun, Lee, Lily, Zhou, Kaichen, Chen, Sixiang, Xiong, Chuyan, Ge, Jiaxin, Zhang, Renrui, Zhang, Shanghang
Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure action is essential for a practical robotic system. R…
External link:
http://arxiv.org/abs/2405.17418
Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment. Current post-training pruning methods, while reducing the siz…
External link:
http://arxiv.org/abs/2405.16057
Author:
Lu, Xudong, Zhou, Aojun, Lin, Ziyi, Liu, Qi, Xu, Yuhui, Zhang, Renrui, Wen, Yafei, Ren, Shuai, Gao, Peng, Yan, Junchi, Li, Hongsheng
Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among thes…
External link:
http://arxiv.org/abs/2405.14854
Author:
Gao, Peng, Zhuo, Le, Liu, Dongyang, Du, Ruoyi, Luo, Xu, Qiu, Longtian, Zhang, Yuhang, Lin, Chen, Huang, Rongjie, Geng, Shijie, Zhang, Renrui, Xi, Junlin, Shao, Wenqi, Jiang, Zhengkai, Yang, Tianshuo, Ye, Weicai, Tong, He, He, Jingwen, Qiao, Yu, Li, Hongsheng
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we int…
External link:
http://arxiv.org/abs/2405.05945
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks te…
External link:
http://arxiv.org/abs/2404.16006
Author:
Zhu, Xiangyang, Zhang, Renrui, He, Bowei, Guo, Ziyu, Liu, Jiaming, Xiao, Han, Fu, Chaoyou, Dong, Hao, Gao, Peng
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'uns…
External link:
http://arxiv.org/abs/2404.04050
Author:
Jiang, Dongzhi, Song, Guanglu, Wu, Xiaoshi, Zhang, Renrui, Shen, Dazhong, Zong, Zhuofan, Liu, Yu, Li, Hongsheng
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensivel…
External link:
http://arxiv.org/abs/2404.03653