Search results

Report

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Authors: Yousefi, Midia, Qian, Yao, Chen, Junkun, Wang, Gang, Liu, Yanqing, Wang, Dongmei, Wang, Xiaofei, Xue, Jian

End-to-end speech translation (ST), which translates source language speech directly into target language text, has garnered significant attention in recent years. Many ST applications require strict length control to ensure that the translation dura

External link: http://arxiv.org/abs/2411.07387

Report

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Authors: Ghosh, Sreyan, Rasooli, Mohammad Sadegh, Levit, Michael, Wang, Peidong, Xue, Jian, Manocha, Dinesh, Li, Jinyu

Generative Error Correction (GEC) has emerged as a powerful post-processing method to enhance the performance of Automatic Speech Recognition (ASR) systems. However, we show that GEC models struggle to generalize beyond the specific types of errors e

External link: http://arxiv.org/abs/2410.13198

Report

Implications for galaxy property estimation revealed by CO luminosity-FWHM relations in local star-forming galaxies

Authors: Wu, Yi-Han, Wang, Jun-Feng, Li, Xiao-Hu, Jiang, Xue-Jian, Tsai, Chao-Wei, Wu, Jing-Wen, Shi, Kun-Peng, Zhu, Lin, Zhong, Wen-Yu

This study explores a relationship between the CO luminosity-full width at half-maximum linewidth linear relation (i.e. the CO LFR) and mean galaxy property of the local star-forming galaxy sample in the xCOLDGASS data base, via a mathematical statem

External link: http://arxiv.org/abs/2410.06714

Report

Towards Unified Facial Action Unit Recognition Framework by Large Language Models

Authors: Hu, Guohong, Lan, Xing, Jiang, Hanyu, Lyu, Jiayi, Xue, Jian

Facial Action Units (AUs) are of great significance in the realm of affective computing. In this paper, we propose AU-LLaVA, the first unified AU recognition framework based on the Large Language Model (LLM). AU-LLaVA consists of a visual encoder, a

External link: http://arxiv.org/abs/2409.08444

Report

MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis

Authors: Jiang, Hanyu, Xue, Jian, Lan, Xing, Hu, Guohong, Lu, Ke

This paper introduces MVLLaVA, an intelligent agent designed for novel view synthesis tasks. MVLLaVA integrates multiple multi-view diffusion models with a large multimodal model, LLaVA, enabling it to handle a wide range of tasks efficiently. MVLLaV

External link: http://arxiv.org/abs/2409.07129

Report

FAST Observations of Four Comets to Search for the Molecular Line Emissions between 1.0 and 1.5 GHz Frequencies

Authors: Chen, Long-Fei, Tsai, Chao-Wei, Li, Jian-Yang, Yang, Bin, Li, Di, Duan, Yan, Hsia, Chih-Hao, Pan, Zhichen, Qian, Lei, Quan, Donghui, Jiang, Xue-Jian, Li, Xiaohu, Zhao, Ruining, Zuo, Pei

We used the Five-hundred-meter Aperture Spherical radio Telescope (FAST) to search for the molecular emissions in the L-band between 1.0 and 1.5 GHz toward four comets, C/2020 F3 (NEOWISE), C/2020 R4 (ATLAS), C/2021 A1 (Leonard), and 67P/Churyumov-Ge

External link: http://arxiv.org/abs/2409.06227

Report

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

Authors: Lan, Xing, Xue, Jian, Qi, Ji, Jiang, Dongmei, Lu, Ke, Chua, Tat-Seng

Facial expression recognition (FER) is a critical task in multimedia with significant implications across various domains. However, analyzing the causes of facial expressions is essential for accurately recognizing them. Current approaches, such as t

External link: http://arxiv.org/abs/2409.02828

Report

The RAdio Galaxy Environment Reference Survey (RAGERS): Evidence of an anisotropic distribution of submillimeter galaxies in the 4C 23.56 protocluster at z=2.48

Authors: Zhou, Dazhi, Greve, Thomas R., Gullberg, Bitten, Lee, Minju M., Di Mascolo, Luca, Dicker, Simon R., Romero, Charles E., Chapman, Scott C., Chen, Chian-Chou, Cornish, Thomas, Devlin, Mark J., Ho, Luis C., Kohno, Kotaro, Lagos, Claudia D. P., Mason, Brian S., Mroczkowski, Tony, Wagg, Jeff F. W., Wang, Q. Daniel, Wang, Ran, Brinch, Malte, Dannerbauer, Helmut, Jiang, Xue-Jian, Lauritsen, Lynge R. B., Vijayan, Aswin P., Vizgan, David, Wardlow, Julie L., Sarazin, Craig L., Sarmiento, Karen P., Serjeant, Stephen, Bhandarkar, Tanay A., Haridas, Saianeesh K., Moravec, Emily, Orlowski-Scherer, John, Sievers, Jonathan L. R., Tanaka, Ichi, Wang, Yu-Jan, Zeballos, Milagros, Laza-Ramos, Andres, Liu, Yuanqi, Hassan, Mohd Shaiful Rizal, Jwel, Abdul Kadir Md, Nazri, Affan Adly, Lim, Ming-Kang, Ibrahim, Ungku Ferwani Salwa Ungku

High-redshift radio(-loud) galaxies (H$z$RGs) are massive galaxies with powerful radio-loud active galactic nuclei (AGNs) and serve as beacons for protocluster identification. However, the interplay between H$z$RGs and the large-scale environment rem

External link: http://arxiv.org/abs/2408.02177

Report

The Radio Galaxy Environment Reference Survey (RAGERS): a submillimetre study of the environments of massive radio-quiet galaxies at $z = 1{\rm -}3$

Authors: Cornish, Thomas M., Wardlow, Julie L., Greve, Thomas R., Chapman, Scott, Chen, Chian-Chou, Dannerbauer, Helmut, Goto, Tomotsugu, Gullberg, Bitten, Ho, Luis C., Jiang, Xue-Jian, Lagos, Claudia, Lee, Minju, Serjeant, Stephen, Shim, Hyunjin, Smith, Daniel J. B., Vijayan, Aswin, Wagg, Jeff, Zhou, Dazhi

Published in: Monthly Notices of the Royal Astronomical Society, Vol. 533, Issue 1 (2024) pp. 1032-1044

Measuring the environments of massive galaxies at high redshift is crucial to understanding galaxy evolution and the conditions that gave rise to the distribution of matter we see in the Universe today. While high-$z$ radio galaxies (H$z$RGs) and qua

External link: http://arxiv.org/abs/2407.21099

Report

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Authors: Wang, Peidong, Xue, Jian, Li, Jinyu, Chen, Junkun, Subramanian, Aswin Shanmugam

Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In s

External link: http://arxiv.org/abs/2406.10276
