Showing 1 - 2 of 2 for search: '"Zhuang, Jimin"'
Author:
Tang, Changli; Li, Yixuan; Yang, Yudong; Zhuang, Jimin; Sun, Guangzhi; Li, Wei; Ma, Zujun; Zhang, Chao
Videos contain a wealth of information, and generating detailed and accurate descriptions in natural language is a key aspect of video understanding. In this paper, we present video-SALMONN 2, an advanced audio-visual large language model (LLM) with
External link:
http://arxiv.org/abs/2410.06682
Author:
Wang, Siyin; Yu, Wenyi; Yang, Yudong; Tang, Changli; Li, Yixuan; Zhuang, Jimin; Chen, Xianzhao; Tian, Xiaohai; Zhang, Jun; Sun, Guangzhi; Lu, Lu; Zhang, Chao
Speech quality assessment typically requires evaluating audio from multiple aspects, such as mean opinion score (MOS) and speaker similarity (SIM) etc., which can be challenging to cover using one small model designed for a single task. In this paper
External link:
http://arxiv.org/abs/2409.16644