Showing 1 - 10 of 636 for search: '"Li, Yining"'
Author:
Zhang, Pan, Dong, Xiaoyi, Zang, Yuhang, Cao, Yuhang, Qian, Rui, Chen, Lin, Guo, Qipeng, Duan, Haodong, Wang, Bin, Ouyang, Linke, Zhang, Songyang, Zhang, Wenwei, Li, Yining, Gao, Yang, Sun, Peng, Zhang, Xinyue, Li, Wei, Li, Jingwen, Wang, Wenhai, Yan, Hang, He, Conghui, Zhang, Xingcheng, Chen, Kai, Dai, Jifeng, Qiao, Yu, Lin, Dahua, Wang, Jiaqi
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities…
External link:
http://arxiv.org/abs/2407.03320
Author:
Chen, Yicheng, Li, Xiangtai, Li, Yining, Zeng, Yanhong, Wu, Jianzong, Zhao, Xiangyu, Chen, Kai
Diffusion-based models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, a fully automatic layout generation driven only by language and a suitable metric for…
External link:
http://arxiv.org/abs/2406.20085
Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits their effectiveness in perception tasks…
External link:
http://arxiv.org/abs/2406.17770
Author:
Wu, Jianzong, Li, Xiangtai, Zeng, Yanhong, Zhang, Jiangning, Zhou, Qianyu, Li, Yining, Tong, Yunhai, Chen, Kai
In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video…
External link:
http://arxiv.org/abs/2406.17758
Author:
Fei, Zhiwei, Zhang, Songyang, Shen, Xiaoyu, Zhu, Dawei, Wang, Xiao, Cao, Maosong, Zhou, Fengzhe, Li, Yining, Zhang, Wenwei, Lin, Dahua, Chen, Kai, Ge, Jidong
While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law…
External link:
http://arxiv.org/abs/2406.14887
The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding. Traditional VideoQA benchmarks, despite providing quantitative metrics, often fail to encompass…
External link:
http://arxiv.org/abs/2406.14515
Author:
Hu, Kai, Yu, Weichen, Yao, Tianjun, Li, Xiang, Liu, Wenhe, Yu, Lijun, Li, Yining, Chen, Kai, Shen, Zhiqiang, Fredrikson, Matt
Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which…
External link:
http://arxiv.org/abs/2405.09113
Author:
Dong, Xiaoyi, Zhang, Pan, Zang, Yuhang, Cao, Yuhang, Wang, Bin, Ouyang, Linke, Zhang, Songyang, Duan, Haodong, Zhang, Wenwei, Li, Yining, Yan, Hang, Gao, Yang, Chen, Zhe, Zhang, Xinyue, Li, Wei, Li, Jingwen, Wang, Wenhai, Chen, Kai, He, Conghui, Zhang, Xingcheng, Dai, Jifeng, Qiao, Yu, Lin, Dahua, Wang, Jiaqi
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance…
External link:
http://arxiv.org/abs/2404.06512
Author:
Cai, Zheng, Cao, Maosong, Chen, Haojiong, Chen, Kai, Chen, Keyu, Chen, Xin, Chen, Xun, Chen, Zehui, Chen, Zhi, Chu, Pei, Dong, Xiaoyi, Duan, Haodong, Fan, Qi, Fei, Zhaoye, Gao, Yang, Ge, Jiaye, Gu, Chenya, Gu, Yuzhe, Gui, Tao, Guo, Aijia, Guo, Qipeng, He, Conghui, Hu, Yingfan, Huang, Ting, Jiang, Tao, Jiao, Penglong, Jin, Zhenjiang, Lei, Zhikai, Li, Jiaxing, Li, Jingwen, Li, Linyang, Li, Shuaibin, Li, Wei, Li, Yining, Liu, Hongwei, Liu, Jiangning, Hong, Jiawei, Liu, Kaiwen, Liu, Kuikun, Liu, Xiaoran, Lv, Chengqi, Lv, Haijun, Lv, Kai, Ma, Li, Ma, Runyuan, Ma, Zerun, Ning, Wenchang, Ouyang, Linke, Qiu, Jiantao, Qu, Yuan, Shang, Fukai, Shao, Yunfan, Song, Demin, Song, Zifan, Sui, Zhihao, Sun, Peng, Sun, Yu, Tang, Huanze, Wang, Bin, Wang, Guoteng, Wang, Jiaqi, Wang, Jiayu, Wang, Rui, Wang, Yudong, Wang, Ziyi, Wei, Xingjian, Weng, Qizhen, Wu, Fan, Xiong, Yingtong, Xu, Chao, Xu, Ruiliang, Yan, Hang, Yan, Yirong, Yang, Xiaogui, Ye, Haochen, Ying, Huaiyuan, Yu, Jia, Yu, Jing, Zang, Yuhang, Zhang, Chuyu, Zhang, Li, Zhang, Pan, Zhang, Peng, Zhang, Ruijie, Zhang, Shuo, Zhang, Songyang, Zhang, Wenjian, Zhang, Wenwei, Zhang, Xingcheng, Zhang, Xinyue, Zhao, Hui, Zhao, Qian, Zhao, Xiaomeng, Zhou, Fengzhe, Zhou, Zaida, Zhuo, Jingming, Zou, Yicheng, Qiu, Xipeng, Qiao, Yu, Lin, Dahua
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces…
External link:
http://arxiv.org/abs/2403.17297
Author:
Dong, Xiaoyi, Zhang, Pan, Zang, Yuhang, Cao, Yuhang, Wang, Bin, Ouyang, Linke, Wei, Xilin, Zhang, Songyang, Duan, Haodong, Cao, Maosong, Zhang, Wenwei, Li, Yining, Yan, Hang, Gao, Yang, Zhang, Xinyue, Li, Wei, Li, Jingwen, Chen, Kai, He, Conghui, Zhang, Xingcheng, Qiao, Yu, Lin, Dahua, Wang, Jiaqi
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. This model goes beyond conventional vision-language understanding, adeptly crafting interleaved text-image content…
External link:
http://arxiv.org/abs/2401.16420