Showing 1 - 10 of 3,348 results for search: '"LIU Xudong"'
Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieve image-text retrieval in the multi-lingual scenario with a single model. In recent
External link:
http://arxiv.org/abs/2406.18254
Authors:
Anastassiou, Philip, Chen, Jiawei, Chen, Jitong, Chen, Yuanzhe, Chen, Zhuo, Chen, Ziyi, Cong, Jian, Deng, Lelai, Ding, Chuang, Gao, Lu, Gong, Mingqing, Huang, Peisong, Huang, Qingqing, Huang, Zhiying, Huo, Yuanyuan, Jia, Dongya, Li, Chumin, Li, Feiya, Li, Hui, Li, Jiaxin, Li, Xiaoyang, Li, Xingxing, Liu, Lin, Liu, Shouda, Liu, Sichao, Liu, Xudong, Liu, Yuchen, Liu, Zhengxi, Lu, Lu, Pan, Junjie, Wang, Xin, Wang, Yuping, Wang, Yuxuan, Wei, Zhen, Wu, Jian, Yao, Chao, Yang, Yifeng, Yi, Yuanhao, Zhang, Junteng, Zhang, Qidi, Zhang, Shuo, Zhang, Wenjie, Zhang, Yang, Zhao, Zilin, Zhong, Dejian, Zhuang, Xiaobin
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in sp
External link:
http://arxiv.org/abs/2406.02430
Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOT
External link:
http://arxiv.org/abs/2405.18322
Object detection with event cameras benefits from the sensor's low latency and high dynamic range. However, it is costly to fully label event streams for supervised training due to their high temporal resolution. To reduce this cost, we present LEOD,
External link:
http://arxiv.org/abs/2311.17286
Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, leading to many limitations in practical applications. Recently, the emergence of In-Context Learning (ICL) capabilities i
External link:
http://arxiv.org/abs/2309.04695
Live commerce is the act of selling products online through live streaming. The customer's diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing d
External link:
http://arxiv.org/abs/2308.04912
This work presents our solutions to the Algonauts Project 2023 Challenge. The primary objective of the challenge revolves around employing computational models to anticipate brain responses captured during participants' observation of intricate natur
External link:
http://arxiv.org/abs/2308.00262
Recent advances in zero-shot and few-shot classification heavily rely on the success of pre-trained vision-language models (VLMs) such as CLIP. Due to a shortage of large-scale datasets, training such models for event camera data remains infeasible.
External link:
http://arxiv.org/abs/2306.06354
Phase modulation plays a crucial role in various terahertz applications, including biomedical imaging, high-rate communication, and radar detection. Existing terahertz phase shifters typically rely on tuning the resonant effect of metamaterial struct
External link:
http://arxiv.org/abs/2305.10632
Authors:
Liao, Tingting, Zhang, Xiaomei, Xiu, Yuliang, Yi, Hongwei, Liu, Xudong, Qi, Guo-Jun, Zhang, Yong, Wang, Xuan, Zhu, Xiangyu, Lei, Zhen
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a
External link:
http://arxiv.org/abs/2304.03903