Search results

Report

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Authors: Zhang, Jiwen; Wu, Jihao; Teng, Yihua; Liao, Minghui; Xu, Nuo; Xiao, Xiao; Wei, Zhongyu; Tang, Duyu

Large language models (LLMs) have led to a surge of autonomous GUI agents for smartphones, which complete a task triggered by natural language by predicting a sequence of API actions. Even though the task highly relies on past actions and visual o

External link: http://arxiv.org/abs/2403.02713

Report

Emage: Non-Autoregressive Text-to-Image Generation

Authors: Feng, Zhangyin; Hu, Runyi; Liu, Liangxin; Zhang, Fan; Tang, Duyu; Dai, Yong; Feng, Xiaocheng; Li, Jiwei; Qin, Bing; Shi, Shuming

Autoregressive and diffusion models drive the recent breakthroughs in text-to-image generation. Despite their huge success in generating highly realistic images, a common shortcoming of these models is their high inference latency - autoregressive mode

External link: http://arxiv.org/abs/2312.14988

Report

SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills

Authors: Feng, Zhangyin; Dai, Yong; Zhang, Fan; Tang, Duyu; Feng, Xiaocheng; Wu, Shuangzhi; Qin, Bing; Cao, Yunbo; Shi, Shuming

Traditional multitask learning methods can generally exploit common knowledge only task-wise or language-wise, and so lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, whic

External link: http://arxiv.org/abs/2306.16176

Report

Improved Visual Story Generation with Adaptive Context Modeling

Authors: Feng, Zhangyin; Ren, Yuchen; Yu, Xinmiao; Feng, Xiaocheng; Tang, Duyu; Shi, Shuming; Qin, Bing

Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach considers historically generated results as flattened mem

External link: http://arxiv.org/abs/2305.16811

Report

STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training

Authors: Zhong, Weihong; Zheng, Mao; Tang, Duyu; Luo, Xuan; Gong, Heng; Feng, Xiaocheng; Qin, Bing

Although large-scale video-language pre-training models, which usually build a global alignment between the video and the text, have achieved remarkable progress on various downstream tasks, the idea of adopting fine-grained information during the pr

External link: http://arxiv.org/abs/2302.09736

Report

Effidit: Your AI Writing Assistant

Authors: Shi, Shuming; Zhao, Enbo; Tang, Duyu; Wang, Yan; Li, Piji; Bi, Wei; Jiang, Haiyun; Huang, Guoping; Cui, Leyang; Huang, Xinting; Zhou, Cong; Dai, Yong; Ma, Dongyang

In this technical report, we introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that helps users write higher-quality text more efficiently by using artificial intelligence (AI) technologies. Previous writing

External link: http://arxiv.org/abs/2208.01815

Report

One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code

Authors: Dai, Yong; Tang, Duyu; Liu, Liangxin; Tan, Minghuan; Zhou, Cong; Wang, Jingquan; Feng, Zhangyin; Zhang, Fan; Hu, Xueyu; Shi, Shuming

People perceive the world with multiple senses (e.g., through hearing sounds, reading words and seeing objects). However, most existing AI systems only process an individual modality. This paper presents an approach that excels at handling multiple m

External link: http://arxiv.org/abs/2205.06126

Report

SkillNet-NLG: General-Purpose Natural Language Generation with a Sparsely Activated Approach

Authors: Liao, Junwei; Tang, Duyu; Zhang, Fan; Shi, Shuming

We present SkillNet-NLG, a sparsely activated approach that handles many natural language generation tasks with one model. Different from traditional dense models that always activate all the parameters, SkillNet-NLG selectively activates relevant pa

External link: http://arxiv.org/abs/2204.12184

Report

Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors

Authors: Zhou, Cong; Dai, Yong; Tang, Duyu; Zhao, Enbo; Feng, Zhangyin; Kuang, Li; Shi, Shuming

Chinese BERT models achieve remarkable progress in dealing with grammatical errors of word substitution. However, they fail to handle word insertion and deletion because BERT assumes the existence of a word at each position. To address this, we prese

External link: http://arxiv.org/abs/2204.12052

Report

MarkBERT: Marking Word Boundaries Improves Chinese BERT

Authors: Li, Linyang; Dai, Yong; Tang, Duyu; Qiu, Xipeng; Xu, Zenglin; Shi, Shuming

We present MarkBERT, a Chinese BERT model that uses word information. Existing word-based BERT models regard words as basic units; however, due to the vocabulary limit of BERT, they only cover high-frequency words and fall back to

External link: http://arxiv.org/abs/2203.06378
