Search results

Academic article

Author: 張榮芳

Published in: Theology Annual. 2021, Vol. 42, p91-116. 26p.

Report

Large Action Models: From Inception to Implementation

Authors: Wang, Lu; Yang, Fangkai; Zhang, Chaoyun; Lu, Junting; Qian, Jiaxu; He, Shilin; Zhao, Pu; Qiao, Bo; Huang, Ray; Qin, Si; Su, Qisheng; Ye, Jiayi; Zhang, Yudi; Lou, Jian-Guang; Lin, Qingwei; Rajmohan, Saravan; Zhang, Dongmei; Zhang, Qi

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Lang

External link: http://arxiv.org/abs/2412.10047

Report

Squeezing and Entanglement Dynamics in Phase-Sensitive Non-Hermitian Systems

Authors: Huang, Ruicong; Wang, Wencong; Liang, Yuyang; Liu, Dongmei; Gu, Min

Over the past decade, parity-time (PT) symmetry and anti-PT (APT) symmetry in various physical systems have been extensively studied, leading to significant experimental and theoretical advancements. However, physical systems that simultaneously exhi

External link: http://arxiv.org/abs/2412.03926

Report

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Authors: Li, Yan; Xing, Yifei; Lan, Xiangyuan; Li, Xin; Chen, Haifeng; Jiang, Dongmei

Cross-modal alignment is crucial for multimodal representation fusion due to the inherent heterogeneity between modalities. While Transformer-based methods have shown promising results in modeling inter-modal relationships, their quadratic computatio

External link: http://arxiv.org/abs/2412.00833

Report

Large Language Model-Brained GUI Agents: A Survey

Authors: Zhang, Chaoyun; He, Shilin; Qian, Jiaxu; Li, Bowen; Li, Liqun; Qin, Si; Kang, Yu; Ma, Minghua; Liu, Guyue; Lin, Qingwei; Rajmohan, Saravan; Zhang, Dongmei; Zhang, Qi

GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. The

External link: http://arxiv.org/abs/2411.18279

Report

Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents

Authors: Chae, Joongwon; Wang, Zhenyu; Zhang, Lian; Yu, Dongmei; Qin, Peiwu

Recent advances in multimodal models have demonstrated impressive capabilities in object recognition and scene understanding. However, these models often struggle with precise spatial localization - a critical capability for real-world applications.

External link: http://arxiv.org/abs/2411.18270

Report

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Authors: Kan, Zhehan; Zhang, Ce; Liao, Zihan; Tian, Yapeng; Yang, Wenming; Xiao, Junyuan; Li, Xu; Jiang, Dongmei; Wang, Yaowei; Liao, Qingmin

Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous s

External link: http://arxiv.org/abs/2411.12713

Report

Sharingan: Extract User Action Sequence from Desktop Recordings

Authors: Chen, Yanting; Ren, Yi; Qin, Xiaoting; Zhang, Jue; Yuan, Kehong; Han, Lu; Lin, Qingwei; Zhang, Dongmei; Rajmohan, Saravan; Zhang, Qi

Video recordings of user activities, particularly desktop recordings, offer a rich source of data for understanding user behaviors and automating processes. However, despite advancements in Vision-Language Models (VLMs) and their increasing use in vi

External link: http://arxiv.org/abs/2411.08768

Report

One-Sided Device-Independent Random Number Generation Through Fiber Channels

Authors: Zhang, Jinfang; Li, Yi; Zhao, Mengyu; Han, Dongmei; Liu, Jun; Wang, Meihong; Gong, Qihuang; Xiang, Yu; He, Qiongyi; Su, Xiaolong

Randomness is an essential resource and plays important roles in various applications ranging from cryptography to simulation of complex systems. Certified randomness from quantum process is ensured to have the element of privacy but usually relies o

External link: http://arxiv.org/abs/2411.08441

Report

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Authors: Yousefi, Midia; Qian, Yao; Chen, Junkun; Wang, Gang; Liu, Yanqing; Wang, Dongmei; Wang, Xiaofei; Xue, Jian

End-to-end speech translation (ST), which translates source language speech directly into target language text, has garnered significant attention in recent years. Many ST applications require strict length control to ensure that the translation dura

External link: http://arxiv.org/abs/2411.07387
