Search results

Report

Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering

Authors: Li, Haoyuan, Xu, Chang, Yang, Wen, Mi, Li, Yu, Huai, Zhang, Haijian

Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges due to the view discrepancy between oblique UAV images and overhead satellite images. Existing methods heavily rely on the supervision of labeled dataset…

External link: http://arxiv.org/abs/2411.14816

Report

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Authors: Huang, Hongzhe, Yu, Zhewen, Liu, Jiang, Cai, Li, Jiao, Dian, Zhang, Wenqiao, Tang, Siliang, Li, Juncheng, Jiang, Hao, Li, Haoyuan, Zhuang, Yueting

Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning. Such automatic instruction collection pipelines, however, inadvertently introduce si…

External link: http://arxiv.org/abs/2409.18541

Report

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-en…

External link: http://arxiv.org/abs/2409.18042

Report

Collaboratively Learning Federated Models from Noisy Decentralized Data

Authors: Li, Haoyuan, Funk, Mathias, Gürel, Nezihe Merve, Saeed, Aaqib

Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local…

External link: http://arxiv.org/abs/2409.02189

Report

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Authors: Shu, Fangxun, Liao, Yue, Zhuo, Le, Xu, Chenning, Zhang, Lei, Zhang, Guanghao, Shi, Haonan, Chen, Long, Zhong, Tao, He, Wanggui, Fu, Siming, Li, Haoyuan, Li, Bolin, Yu, Zhelun, Liu, Si, Li, Hongsheng, Jiang, Hao

We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM dis…

External link: http://arxiv.org/abs/2408.15881

Report

NAS-Cap: Deep-Learning Driven 3-D Capacitance Extraction with Neural Architecture Search and Data Augmentation

Authors: Li, Haoyuan, Yang, Dingcheng, Pei, Chunyan, Yu, Wenjian

More accurate capacitance extraction is demanded for designing integrated circuits under advanced process technology. The pattern matching approach and the field solver for capacitance extraction have the drawbacks of inaccuracy and large computation…

External link: http://arxiv.org/abs/2408.13195

Report

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

Authors: Lin, Tianwei, Liu, Jiang, Zhang, Wenqiao, Li, Zhaocheng, Dai, Yang, Li, Haoyuan, Yu, Zhelun, He, Wanggui, Li, Juncheng, Jiang, Hao, Tang, Siliang, Zhuang, Yueting

While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straig…

External link: http://arxiv.org/abs/2408.09856

Report

Dynamics of Nanoscale Phase Decomposition in Laser Ablation

Femtosecond laser ablation is a process that bears both fundamental physics interest and has wide industrial applications. For decades, the lack of probes on the relevant time and length scales has prevented access to the highly nonequilibrium phase…

External link: http://arxiv.org/abs/2407.10505

Report

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Authors: He, Wanggui, Fu, Siming, Liu, Mushui, Wang, Xierui, Xiao, Wenyi, Shu, Fangxun, Wang, Yi, Zhang, Lei, Yu, Zhelun, Li, Haoyuan, Huang, Ziwei, Gan, LeiLei, Jiang, Hao

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation th…

External link: http://arxiv.org/abs/2407.07614

Report

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

Authors: Wang, Ye, Xun, Jiahao, Hong, Minjie, Zhu, Jieming, Jin, Tao, Lin, Wang, Li, Haoyuan, Li, Linjun, Xia, Yan, Zhao, Zhou, Dong, Zhenhua

Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either b…

External link: http://arxiv.org/abs/2406.14017