Showing 1 - 10 of 417 for search: '"Du, Shaoyi"'
Existing benchmarks like NLGraph and GraphQA evaluate LLMs on graphs by focusing mainly on pairwise relationships, overlooking the high-order correlations found in real-world data. Hypergraphs, which can model complex beyond-pairwise relationships, o
External link:
http://arxiv.org/abs/2410.10083
Authors:
Feng, Yifan, Huang, Jiangang, Du, Shaoyi, Ying, Shihui, Yong, Jun-Hai, Li, Yipeng, Ding, Guiguang, Ji, Rongrong, Gao, Yue
We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that
External link:
http://arxiv.org/abs/2408.04804
Learning rotation-invariant distinctive features is a fundamental requirement for point cloud registration. Existing methods often use rotation-sensitive networks to extract features, while employing rotation augmentation to learn an approximate inva
External link:
http://arxiv.org/abs/2407.10142
Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made to human motion prediction, most approaches focus on pose-driven predictio
External link:
http://arxiv.org/abs/2405.18700
Authors:
Tian, Fengrui, Liu, Yaoyao, Kortylewski, Adam, Duan, Yueqi, Du, Shaoyi, Yuille, Alan, Wang, Angtian
3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to l
External link:
http://arxiv.org/abs/2404.05626
In this work, we pioneer Semantic Flow, a neural semantic representation of dynamic scenes from monocular videos. In contrast to previous NeRF methods that reconstruct dynamic scenes from the colors and volume densities of individual points, Semantic
External link:
http://arxiv.org/abs/2404.05163
Action recognition from video data forms a cornerstone with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information fr
External link:
http://arxiv.org/abs/2403.19316
Authors:
Xie, Weidong, Luo, Lun, Ye, Nanfei, Ren, Yi, Du, Shaoyi, Wang, Minhang, Xu, Jintao, Ai, Rui, Gu, Weihao, Chen, Xieyuanli
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieving
External link:
http://arxiv.org/abs/2403.18762
In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS as its abbreviation). The strategy sets up generation objecti
External link:
http://arxiv.org/abs/2401.02142
Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly conge
External link:
http://arxiv.org/abs/2311.04509