Search results

Report

StoryNavi: On-Demand Narrative-Driven Reconstruction of Video Play With Generative AI

Authors: Xu, Alston Lantian, Ma, Tianwei, Liu, Tianmeng, Liu, Can, Cassinelli, Alvaro

Manually navigating lengthy videos to seek information or answer questions can be a tedious and time-consuming task for users. We introduce StoryNavi, a novel system powered by VLLMs for generating customised video play experiences by retrieving mate

External link: http://arxiv.org/abs/2410.03207

Report

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

Authors: Chen, Chen, Li, Xiaolou, Liu, Zehua, Li, Lantian, Wang, Dong

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis.

External link: http://arxiv.org/abs/2409.19575

Report

E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group

Authors: Pan, Yue, Lyu, Chen, Yang, Zhenyu, Li, Lantian, Liu, Qi, Shao, Xiuting

Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have b

External link: http://arxiv.org/abs/2408.12948

Report

Cryogenic nonlinear conversion processes in periodically-poled thin-film lithium niobate waveguides

Authors: Cheng, Yujie, Li, Xiaoting, Feng, Lantian, Li, Haochuan, Sun, Wenzhao, Song, Xinyu, Ding, Yuyang, Guo, Guangcan, Wang, Cheng, Ren, Xifeng

Periodically poled thin-film lithium niobate (TFLN) waveguides, which enable efficient quadratic nonlinear processes, serve as crucial foundation for classical and quantum signal processing with photonic integrated circuits. To expand their applicati

External link: http://arxiv.org/abs/2408.05907

Report

Gradient Flow Decoding

Authors: Wadayama, Tadashi, Wei, Lantian

This paper presents the Gradient Flow (GF) decoding for LDPC codes. GF decoding, a continuous-time methodology based on gradient flow, employs a potential energy function associated with bipolar codewords of LDPC codes. The decoding process of the GF

External link: http://arxiv.org/abs/2408.00293

Report

A Geometry-Aware Algorithm to Learn Hierarchical Embeddings in Hyperbolic Space

Authors: Wang, Zhangyu, Xu, Lantian, Kong, Zhifeng, Wang, Weilong, Peng, Xuyu, Zheng, Enyang

Hyperbolic embeddings are a class of representation learning methods that offer competitive performances when data can be abstracted as a tree-like graph. However, in practice, learning hyperbolic embeddings of hierarchical data is difficult due to t

External link: http://arxiv.org/abs/2407.16641

Report

Few-Shot Keyword Spotting from Mixed Speech

Authors: Yuan, Junming, Shi, Ying, Li, LanTian, Wang, Dong, Hamdulla, Askar

Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spo

External link: http://arxiv.org/abs/2407.06078

Report

Serialized Output Training by Learned Dominance

Authors: Shi, Ying, Li, Lantian, Yin, Shi, Wang, Dong, Han, Jiqing

Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied o

External link: http://arxiv.org/abs/2407.03966

Report

Efficient Antagonistic k-plex Enumeration in Signed Graphs

Authors: Xu, Lantian, Li, Rong-Hua, Wen, Dong, Dai, Qiangqiang, Wang, Guoren, Qin, Lu

A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graph

External link: http://arxiv.org/abs/2406.16268

Report

CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

Authors: Chen, Chen, Liu, Zehua, Li, Xiaolou, Li, Lantian, Wang, Dong

The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR

External link: http://arxiv.org/abs/2406.10313
