Showing 1 - 10 of 285,004 for search: '"A. Tai"'
We introduce Audio-Agent, a multimodal framework for audio generation, editing and composition based on text or video inputs. Conventional approaches for text-to-audio (TTA) tasks often make single-pass inferences from text descriptions. While straig
External link:
http://arxiv.org/abs/2410.03335
Author:
Yoo, Jinsu, Feng, Zhenyang, Pan, Tai-Yu, Sun, Yihong, Phoo, Cheng Perng, Chen, Xiangyu, Campbell, Mark, Weinberger, Kilian Q., Hariharan, Bharath, Chao, Wei-Lun
Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is dep
External link:
http://arxiv.org/abs/2410.02646
Author:
Nguyen, Duy M. H., Diep, Nghiem T., Nguyen, Trung Q., Le, Hoang-Bao, Nguyen, Tai, Nguyen, Tien, Nguyen, TrungTin, Ho, Nhat, Xie, Pengtao, Wattenhofer, Roger, Zhou, James, Sonntag, Daniel, Niepert, Mathias
State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performa
External link:
http://arxiv.org/abs/2410.02615
Large Language Models (LLMs) require frequent updates to correct errors and keep pace with continuously evolving knowledge in a timely and effective manner. Recent research in model editing has highlighted the challenges in balancing generalizatio
External link:
http://arxiv.org/abs/2410.00454
Author:
Chernobai, Misha, Tsai, Tai-Peng
We consider the existence and $L^q$ gradient estimates for perturbed Stokes systems with divergence-free critical drift in a bounded Lipschitz domain in $\mathbb{R}^n$, $n \ge 3$. The first two results assume the drift is either in $L^n$ or sufficien
External link:
http://arxiv.org/abs/2410.01081
Conversational search offers an easier and faster alternative to conventional web search, while having downsides like lack of source verification. Research has examined performance disparities between these two systems in different settings. However,
External link:
http://arxiv.org/abs/2409.19982
Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems
RAG (Retrieval-Augmented Generation) systems have recently gained significant attention for their enhanced ability to integrate external knowledge sources in open-domain question answering (QA) tasks. However, it remains unclear how these models address fair
External link:
http://arxiv.org/abs/2409.19804
Superoscillation (SO) wavefunctions in light waves, which locally oscillate much faster than their fastest Fourier component, have enhanced optical technologies beyond diffraction limits, but have never been controlled into 2D periodic lattices. Here, we rep
External link:
http://arxiv.org/abs/2409.19565
6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize acr
External link:
http://arxiv.org/abs/2409.18261
Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos. However, the development of LMMs with 3D-awareness
External link:
http://arxiv.org/abs/2409.18125