Showing 1 - 10
of 34
for search: '"Guo, Ruohao"'
Author:
Guo, Ruohao, Qu, Liao, Niu, Dantong, Qi, Yanyu, Yue, Wenzhen, Shi, Ji, Xing, Bowei, Ying, Xianghua
Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on a closed-set assumption and only identify pre-defined categories from training data, lacking the …
External link:
http://arxiv.org/abs/2407.21721
Author:
Du, Jiangshu, Wang, Yibo, Zhao, Wenting, Deng, Zhongfen, Liu, Shuaiqi, Lou, Renze, Zou, Henry Peng, Venkit, Pranav Narayanan, Zhang, Nan, Srinath, Mukund, Zhang, Haoran Ranran, Gupta, Vipul, Li, Yinghui, Li, Tao, Wang, Fei, Liu, Qin, Liu, Tianlin, Gao, Pengzhi, Xia, Congying, Xing, Chen, Cheng, Jiayang, Wang, Zhaowei, Su, Ying, Shah, Raj Sanjay, Guo, Ruohao, Gu, Jing, Li, Haoran, Wei, Kangda, Wang, Zihao, Cheng, Lu, Ranathunga, Surangika, Fang, Meng, Fu, Jie, Liu, Fei, Huang, Ruihong, Blanco, Eduardo, Cao, Yixin, Zhang, Rui, Yu, Philip S., Yin, Wenpeng
This work is motivated by two key trends. On the one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine …
External link:
http://arxiv.org/abs/2406.16253
Author:
Yue, Wenzhen, Ying, Xianghua, Guo, Ruohao, Chen, DongDong, Shi, Ji, Xing, Bowei, Zhu, Yuqing, Chen, Taiyan
In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our …
External link:
http://arxiv.org/abs/2404.18948
Author:
Fu, Deqing, Guo, Ruohao, Khalighinejad, Ghazal, Liu, Ollie, Dhingra, Bhuwan, Yogatama, Dani, Jia, Robin, Neiswanger, Willie
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose $\textbf{IsoBench}$, a benchmark …
External link:
http://arxiv.org/abs/2404.01266
In this paper, we propose a new multi-modal task, namely audio-visual instance segmentation (AVIS), whose goal is to simultaneously identify, segment, and track individual sounding object instances in audible videos. To our knowledge, it is the …
External link:
http://arxiv.org/abs/2310.18709
Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels and predicting the segments as audible or visible events. Recent methods for this task leverage the attention mechanism to capture the semantic correlations …
External link:
http://arxiv.org/abs/2310.07517
In this paper, we study the task of instructional dialogue, focusing on the cooking domain. Analyzing the generated output of the GPT-J model, we find that the primary challenge for a recipe-grounded dialogue system is how to provide the instructions …
External link:
http://arxiv.org/abs/2305.17280
Language style is often used by writers to convey their intentions, identities, and mastery of language. In this paper, we show that current large language models struggle to capture some language styles without fine-tuning. To address this challenge, …
External link:
http://arxiv.org/abs/2305.14592
Published in:
Nanophotonics, Vol 13, Iss 1, Pp 9-18 (2023)
In this paper, we report the use of a femtosecond radially polarized vortex laser with MHz repetition rate for direct writing of cladding waveguides (WGs) and the realization of waveguide laser oscillation in an ytterbium-doped calcium fluoride crystal. The …
External link:
https://doaj.org/article/d040bee7d07d497ba7e78e2a83123bd4
Images captured by a camera play a critical role in training Deep Neural Networks (DNNs). Usually, we assume that the images acquired by cameras are consistent with the ones perceived by human eyes. However, due to the different physical mechanisms between …
External link:
http://arxiv.org/abs/2110.10444