Showing 1 - 10 of 20 results for search: '"Pang, Ziqi"'
Author:
Pang, Ziqi, Zhang, Tianyuan, Luan, Fujun, Man, Yunze, Tan, Hao, Zhang, Kai, Freeman, William T., Wang, Yu-Xiong
We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlock…
External link:
http://arxiv.org/abs/2412.01827
Published in:
NeurIPS 2024
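The order-agnostic generation described in the abstract above can be illustrated with a toy sketch contrasting a fixed raster order with an arbitrary per-sample permutation. All function names here are hypothetical illustration, not the paper's implementation:

```python
import random

def raster_order(h: int, w: int) -> list[int]:
    """Fixed row-major token order used by conventional decoder-only AR models."""
    return [r * w + c for r in range(h) for c in range(w)]

def random_order(h: int, w: int, rng=random.Random(0)) -> list[int]:
    """An arbitrary per-image permutation of token positions, in the spirit of
    RandAR's order-agnostic decoding (hypothetical sketch only)."""
    order = raster_order(h, w)
    rng.shuffle(order)
    return order

print(raster_order(2, 3))  # [0, 1, 2, 3, 4, 5]
print(random_order(2, 3))  # some permutation of the same indices
```

A model trained this way must be told *where* the next token goes, since position is no longer implied by sequence order.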
In this paper, we approach an overlooked yet critical task, Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the…
External link:
http://arxiv.org/abs/2410.07157
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accomm…
External link:
http://arxiv.org/abs/2406.08476
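The size-restricted memory bank mentioned in the abstract above can be sketched minimally as a fixed-capacity buffer. This uses a FIFO eviction policy purely for illustration; the class and policy are hypothetical, not the paper's actual method:

```python
from collections import deque

class BoundedMemoryBank:
    """Fixed-capacity memory bank for VOS-style frame features.

    When capacity is exceeded, the oldest entry is evicted
    (FIFO is a hypothetical choice; other eviction rules exist).
    """

    def __init__(self, capacity: int):
        # deque with maxlen drops the oldest item automatically on append
        self.frames = deque(maxlen=capacity)

    def add(self, frame_feature) -> None:
        self.frames.append(frame_feature)

    def __len__(self) -> int:
        return len(self.frames)

# Usage: with capacity 3, adding 5 frames keeps only the 3 most recent.
bank = BoundedMemoryBank(capacity=3)
for t in range(5):
    bank.add(f"feat_{t}")
print(list(bank.frames))  # ['feat_2', 'feat_3', 'feat_4']
```

The point of such a bound is that memory cost stays constant regardless of video length, at the price of deciding which frames to keep.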
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet prev…
External link:
http://arxiv.org/abs/2310.12973
Trajectory forecasting is a widely studied problem for autonomous navigation. However, existing benchmarks evaluate forecasting based on independent snapshots of trajectories, which are not representative of real-world applications that operate on a…
External link:
http://arxiv.org/abs/2310.01351
While bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor, their results are often unreliable and demonstrate noticeable inconsistencies in the predicted HD-Maps from different viewp…
External link:
http://arxiv.org/abs/2305.08851
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking"…
External link:
http://arxiv.org/abs/2302.03802
Author:
Fan, Lue, Pang, Ziqi, Zhang, Tianyuan, Wang, Yu-Xiong, Zhao, Hang, Wang, Feng, Wang, Naiyan, Zhang, Zhaoxiang
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of…
External link:
http://arxiv.org/abs/2112.06375
Previous online 3D Multi-Object Tracking (3DMOT) methods terminate a tracklet when it is not associated with new detections for a few frames. But if an object just goes dark, like being temporarily occluded by other objects or simply getting out of FO…
External link:
http://arxiv.org/abs/2111.13672
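The conventional termination rule that the abstract above critiques can be sketched as a simple max-age counter. The class layout and the `max_age` parameter are hypothetical illustration, not the paper's proposed method (which instead argues against terminating such tracklets prematurely):

```python
class Tracklet:
    """Hypothetical sketch of the common life-cycle rule: a tracklet is
    declared dead once it has missed detections for `max_age`
    consecutive frames."""

    def __init__(self, track_id: int, max_age: int = 3):
        self.track_id = track_id
        self.max_age = max_age
        self.misses = 0  # consecutive frames without an associated detection

    def update(self, matched: bool) -> None:
        # A successful association resets the counter; a miss increments it.
        self.misses = 0 if matched else self.misses + 1

    @property
    def dead(self) -> bool:
        return self.misses >= self.max_age

# Usage: three consecutive misses terminate the tracklet, even if the
# object was merely occluded and would have reappeared later.
trk = Tracklet(track_id=7, max_age=3)
for matched in [True, False, False, False]:
    trk.update(matched)
print(trk.dead)  # True
```

This rule's weakness is exactly the scenario the abstract names: a temporarily occluded object loses its identity once the counter runs out.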
3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm. Despite their progress and usefulness, an in-depth analysis of their strengths and wea…
External link:
http://arxiv.org/abs/2111.09621