Zobrazeno 1 - 10
of 461
pro vyhledávání: '"Xu, Yufei"'
ControlNet excels at creating content that closely matches precise contours in user-provided masks. However, when these masks contain noise, as a frequent occurrence with non-expert users, the output would include unwanted artifacts. This paper first
Externí odkaz:
http://arxiv.org/abs/2403.00467
Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior. Past works relating to animals have primarily foc
Externí odkaz:
http://arxiv.org/abs/2312.15612
Diffusion models have achieved remarkable success in generating realistic images but suffer from generating accurate human hands, such as incorrect finger counts or irregular shapes. This difficulty arises from the complex task of learning the physic
Externí odkaz:
http://arxiv.org/abs/2311.17957
Autor:
Chen, Tao, Lv, Liang, Wang, Di, Zhang, Jing, Yang, Yue, Zhao, Zeyang, Wang, Chen, Guo, Xiaowei, Chen, Hao, Wang, Qingye, Xu, Yufei, Zhang, Qiming, Du, Bo, Zhang, Liangpei, Tao, Dacheng
With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep le
Externí odkaz:
http://arxiv.org/abs/2305.01899
Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and less memory footprint. However, the design of hand-crafted windows, which is data-agnostic, constrains the
Externí odkaz:
http://arxiv.org/abs/2303.15105
In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability
Externí odkaz:
http://arxiv.org/abs/2212.04246
Autor:
Kiefer, Benjamin, Kristan, Matej, Perš, Janez, Žust, Lojze, Poiesi, Fabio, Andrade, Fabio Augusto de Alcantara, Bernardino, Alexandre, Dawkins, Matthew, Raitoharju, Jenni, Quan, Yitong, Atmaca, Adem, Höfer, Timon, Zhang, Qiming, Xu, Yufei, Zhang, Jing, Tao, Dacheng, Sommer, Lars, Spraul, Raphael, Zhao, Hangyue, Zhang, Hongpu, Zhao, Yanyun, Augustin, Jan Lukas, Jeon, Eui-ik, Lee, Impyeong, Zedda, Luca, Loddo, Andrea, Di Ruberto, Cecilia, Verma, Sagar, Gupta, Siddharth, Muralidhara, Shishir, Hegde, Niharika, Xing, Daitao, Evangeliou, Nikolaos, Tzes, Anthony, Bartl, Vojtěch, Špaňhel, Jakub, Herout, Adam, Bhowmik, Neelanjan, Breckon, Toby P., Kundargi, Shivanand, Anvekar, Tejas, Desai, Chaitra, Tabib, Ramesh Ashok, Mudengudi, Uma, Vats, Arpita, Song, Yang, Liu, Delong, Li, Yonglin, Li, Shuman, Tan, Chenhao, Lan, Long, Somers, Vladimir, De Vleeschouwer, Christophe, Alahi, Alexandre, Huang, Hsiang-Wei, Yang, Cheng-Yen, Hwang, Jenq-Neng, Kim, Pyong-Kun, Kim, Kwangju, Lee, Kyoungoh, Jiang, Shuai, Li, Haiwen, Ziqiang, Zheng, Vu, Tuan-Anh, Nguyen-Truong, Hai, Yeung, Sai-Kit, Jia, Zhuang, Yang, Sophia, Hsu, Chih-Chung, Hou, Xiu-Yu, Jhang, Yu-An, Yang, Simon, Yang, Mau-Tsuen
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritim
Externí odkaz:
http://arxiv.org/abs/2211.13508
Self-supervised pre-training vision transformer (ViT) via masked image modeling (MIM) has been proven very effective. However, customized algorithms should be carefully designed for the hierarchical ViTs, e.g., GreenMIM, instead of using the vanilla
Externí odkaz:
http://arxiv.org/abs/2211.01785
Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability. However, large-scale models in remote s
Externí odkaz:
http://arxiv.org/abs/2208.03987
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF) which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detectio
Externí odkaz:
http://arxiv.org/abs/2207.06603