Showing 1 - 10
of 2,250
results for the search: '"Lantian A"'
Manually navigating lengthy videos to seek information or answer questions can be a tedious and time-consuming task for users. We introduce StoryNavi, a novel system powered by VLLMs for generating customised video play experiences by retrieving mate
External link:
http://arxiv.org/abs/2410.03207
In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis.
External link:
http://arxiv.org/abs/2409.19575
Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have b
External link:
http://arxiv.org/abs/2408.12948
Authors:
Cheng, Yujie, Li, Xiaoting, Feng, Lantian, Li, Haochuan, Sun, Wenzhao, Song, Xinyu, Ding, Yuyang, Guo, Guangcan, Wang, Cheng, Ren, Xifeng
Periodically poled thin-film lithium niobate (TFLN) waveguides, which enable efficient quadratic nonlinear processes, serve as a crucial foundation for classical and quantum signal processing with photonic integrated circuits. To expand their applicati
External link:
http://arxiv.org/abs/2408.05907
Authors:
Wadayama, Tadashi, Wei, Lantian
This paper presents the Gradient Flow (GF) decoding for LDPC codes. GF decoding, a continuous-time methodology based on gradient flow, employs a potential energy function associated with bipolar codewords of LDPC codes. The decoding process of the GF
External link:
http://arxiv.org/abs/2408.00293
Hyperbolic embeddings are a class of representation learning methods that offer competitive performance when data can be abstracted as a tree-like graph. However, in practice, learning hyperbolic embeddings of hierarchical data is difficult due to t
External link:
http://arxiv.org/abs/2407.16641
Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spo
External link:
http://arxiv.org/abs/2407.06078
Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied o
External link:
http://arxiv.org/abs/2407.03966
A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graph
External link:
http://arxiv.org/abs/2406.16268
The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR
External link:
http://arxiv.org/abs/2406.10313