Showing 1 - 10 of 952 for search: '"YANG Wenhao"'
Author:
Wang, Taowen; Liu, Dongfang; Liang, James Chenhao; Yang, Wenhao; Wang, Qifan; Han, Cheng; Luo, Jiebo; Tang, Ruixiang
Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework. While VLA models offer
External link:
http://arxiv.org/abs/2411.13587
Stochastic gradient descent is a classic algorithm that has gained great popularity especially in the last decades as the most common approach for training models in machine learning. While the algorithm has been well-studied when stochastic gradient
External link:
http://arxiv.org/abs/2410.16340
Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a
External link:
http://arxiv.org/abs/2410.12327
Grounding objects in images using visual cues is a well-established approach in computer vision, yet the potential of audio as a modality for object recognition and grounding remains underexplored. We introduce YOSS, "You Only Speak Once to See," to
External link:
http://arxiv.org/abs/2409.18372
Domain gap often degrades the performance of speaker verification (SV) systems when the statistical distributions of training data and real-world test speech are mismatched. Channel variation, a primary factor causing this gap, is less addressed than
External link:
http://arxiv.org/abs/2409.09396
Knowledge distillation (KD) is widely used in audio tasks, such as speaker verification (SV), by transferring knowledge from a well-trained large model (the teacher) to a smaller, more compact model (the student) for efficiency and portability. Exist
External link:
http://arxiv.org/abs/2409.09389
Author:
Kuiper, Patrick; Hasan, Ali; Yang, Wenhao; Ng, Yuting; Bidkhori, Hoda; Blanchet, Jose; Tarokh, Vahid
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from
External link:
http://arxiv.org/abs/2408.00131
Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to i
External link:
http://arxiv.org/abs/2406.10956
This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing p
External link:
http://arxiv.org/abs/2406.03787
Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{
External link:
http://arxiv.org/abs/2406.00489