Search results - "Zhang Shi-An"

Report

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Authors: Winata, Genta Indra, Hudi, Frederikus, Irawan, Patrick Amadeus, Anugraha, David, Putri, Rifki Afina, Wang, Yutong, Nohejl, Adam, Prathama, Ubaidillah Ariq, Ousidhoum, Nedjma, Amriani, Afifa, Rzayev, Anar, Das, Anirban, Pramodya, Ashmari, Adila, Aulia, Wilie, Bryan, Mawalim, Candy Olivia, Cheng, Ching Lam, Abolade, Daud, Chersoni, Emmanuele, Santus, Enrico, Ikhwantri, Fariz, Kuwanto, Garry, Zhao, Hanyang, Wibowo, Haryo Akbarianto, Lovenia, Holy, Cruz, Jan Christian Blaise, Putra, Jan Wira Gotama, Myung, Junho, Susanto, Lucky, Machin, Maria Angelica Riera, Zhukova, Marina, Anugraha, Michael, Adilazuarda, Muhammad Farid, Santosa, Natasha, Limkonchotiwat, Peerat, Dabre, Raj, Audino, Rio Alexander, Cahyawijaya, Samuel, Zhang, Shi-Xiong, Salim, Stephanie Yulia, Zhou, Yi, Gui, Yinxuan, Adelani, David Ifeoluwa, Lee, En-Shiun Annie, Okada, Shogo, Purwarianti, Ayu, Aji, Alham Fikri, Watanabe, Taro, Wijaya, Derry Tanti, Oh, Alice, Ngo, Chong-Wah

Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a mas

External link: http://arxiv.org/abs/2410.12705

Report

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Authors: Zhao, Hanyang, Winata, Genta Indra, Das, Anirban, Zhang, Shi-Xiong, Yao, David D., Tang, Wenpin, Sahu, Sambit

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding

External link: http://arxiv.org/abs/2410.04203

Report

LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Authors: Veldanda, Akshaj Kumar, Zhang, Shi-Xiong, Das, Anirban, Chakraborty, Supriyo, Rawls, Stephen, Sahu, Sambit, Naphade, Milind

Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to un

External link: http://arxiv.org/abs/2409.13054

Report

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Authors: Winata, Genta Indra, Zhao, Hanyang, Das, Anirban, Tang, Wenpin, Yao, David D., Zhang, Shi-Xiong, Sahu, Sambit

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into t

External link: http://arxiv.org/abs/2409.11564

Report

Imaginary-time Mpemba effect in quantum many-body systems

Authors: Chang, Wei-Xuan, Yin, Shuai, Zhang, Shi-Xin, Li, Zi-Xiang

Various exotic phenomena emerge in non-equilibrium quantum many-body systems. The Mpemba effect, denoting the situation where a hot system freezes faster than the colder one, is a counterintuitive non-equilibrium phenomenon that has attracted endurin

External link: http://arxiv.org/abs/2409.06547

Report

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Authors: Jin, Zengrui, Yang, Yifan, Shi, Mohan, Kang, Wei, Yang, Xiaoyu, Yao, Zengwei, Kuang, Fangjun, Guo, Liyong, Meng, Lingwei, Lin, Long, Xu, Yong, Zhang, Shi-Xiong, Povey, Daniel

The evolving speech processing landscape is increasingly focused on complex scenarios like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions. Existing methodologies for addressing these challenges fall into two

External link: http://arxiv.org/abs/2409.00819

Report

Comparing Discrete and Continuous Space LLMs for Speech Recognition

Authors: Xu, Yaoxun, Zhang, Shi-Xiong, Yu, Jianwei, Wu, Zhiyong, Yu, Dong

This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervis

External link: http://arxiv.org/abs/2409.00800

Report

Advancing Multi-talker ASR Performance with Large Language Models

Authors: Shi, Mohan, Jin, Zengrui, Xu, Yaoxun, Xu, Yong, Zhang, Shi-Xiong, Wei, Kun, Shao, Yiwen, Zhang, Chunlei, Yu, Dong

Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with th

External link: http://arxiv.org/abs/2408.17431

Report

Quantum Mpemba effects in many-body localization systems

Authors: Liu, Shuo, Zhang, Hao-Kai, Yin, Shuai, Zhang, Shi-Xin, Yao, Hong

The nonequilibrium dynamics of quantum many-body systems have attracted growing attention due to various intriguing phenomena absent in equilibrium physics. One famous example is the quantum Mpemba effect, where the subsystem symmetry is restored fas

External link: http://arxiv.org/abs/2408.07750

Report

Video-Language Alignment via Spatio-Temporal Graph Transformer

Authors: Zhang, Shi-Xue, Wang, Hongfa, Zhu, Xiaobin, Gu, Weibo, Zhang, Tianjin, Yang, Chun, Liu, Wei, Yin, Xu-Cheng

Video-language alignment is a crucial multi-modal task that benefits various downstream applications, e.g., video-text retrieval and video question answering. Existing methods either utilize multi-modal information in video-text pairs or apply global

External link: http://arxiv.org/abs/2407.11677
