Showing 1 - 10
of 34
for search: '"Guo, Ruohao"'
Author:
Guo, Ruohao, Qu, Liao, Niu, Dantong, Qi, Yanyu, Yue, Wenzhen, Shi, Ji, Xing, Bowei, Ying, Xianghua
Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on a closed-set assumption and only identify pre-defined categories from training data, lacking the …
External link:
http://arxiv.org/abs/2407.21721
Author:
Du, Jiangshu, Wang, Yibo, Zhao, Wenting, Deng, Zhongfen, Liu, Shuaiqi, Lou, Renze, Zou, Henry Peng, Venkit, Pranav Narayanan, Zhang, Nan, Srinath, Mukund, Zhang, Haoran Ranran, Gupta, Vipul, Li, Yinghui, Li, Tao, Wang, Fei, Liu, Qin, Liu, Tianlin, Gao, Pengzhi, Xia, Congying, Xing, Chen, Cheng, Jiayang, Wang, Zhaowei, Su, Ying, Shah, Raj Sanjay, Guo, Ruohao, Gu, Jing, Li, Haoran, Wei, Kangda, Wang, Zihao, Cheng, Lu, Ranathunga, Surangika, Fang, Meng, Fu, Jie, Liu, Fei, Huang, Ruihong, Blanco, Eduardo, Cao, Yixin, Zhang, Rui, Yu, Philip S., Yin, Wenpeng
This work is motivated by two key trends. On the one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine …
External link:
http://arxiv.org/abs/2406.16253
Author:
Yue, Wenzhen, Ying, Xianghua, Guo, Ruohao, Chen, DongDong, Shi, Ji, Xing, Bowei, Zhu, Yuqing, Chen, Taiyan
In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our …
External link:
http://arxiv.org/abs/2404.18948
Author:
Fu, Deqing, Guo, Ruohao, Khalighinejad, Ghazal, Liu, Ollie, Dhingra, Bhuwan, Yogatama, Dani, Jia, Robin, Neiswanger, Willie
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose $\textbf{IsoBench}$, a benchmark …
External link:
http://arxiv.org/abs/2404.01266
In this paper, we propose a new multi-modal task, namely audio-visual instance segmentation (AVIS), whose goal is to simultaneously identify, segment, and track individual sounding object instances in audible videos. To our knowledge, it is the …
External link:
http://arxiv.org/abs/2310.18709
Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels and predicting the segments as audible or visible events. Recent methods for this task leverage the attention mechanism to capture the semantic correlations …
External link:
http://arxiv.org/abs/2310.07517
In this paper, we study the task of instructional dialogue, focusing on the cooking domain. Analyzing the generated output of the GPT-J model, we find that the primary challenge for a recipe-grounded dialogue system is how to provide the instructions …
External link:
http://arxiv.org/abs/2305.17280
Language style is often used by writers to convey their intentions, identities, and mastery of language. In this paper, we show that current large language models struggle to capture some language styles without fine-tuning. To address this challenge, …
External link:
http://arxiv.org/abs/2305.14592
Published in:
Nanophotonics, Vol 13, Iss 1, Pp 9-18 (2023)
In this paper, we report the use of a femtosecond radially polarized vortex laser with MHz repetition rate for direct writing of cladding waveguides (WGs) and the realization of waveguide laser oscillation in an ytterbium-doped calcium fluoride crystal. The …
External link:
https://doaj.org/article/d040bee7d07d497ba7e78e2a83123bd4
Images captured by a camera play a critical role in training Deep Neural Networks (DNNs). Usually, we assume that the images acquired by cameras are consistent with the ones perceived by human eyes. However, due to the different physical mechanisms between …
External link:
http://arxiv.org/abs/2110.10444