Showing 1 - 10 of 4,804 results for search: '"Su, Hang"'
Author:
Wang, Zhengyi; Lorraine, Jonathan; Wang, Yikai; Su, Hang; Zhu, Jun; Fidler, Sanja; Zeng, Xiaohui
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual … (see the sketch below)
External link:
http://arxiv.org/abs/2411.09595
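
A minimal sketch of the text-serialization idea behind unified text-and-3D generation: a triangle mesh written out as OBJ-style lines that a text-only LLM could, in principle, emit token by token. The exact representation and tokenization used in the paper may differ; mesh_to_text and text_to_mesh are illustrative names, not functions from the authors' code.

def mesh_to_text(vertices, faces):
    """Encode vertices (list of (x, y, z)) and faces (triples of 0-based vertex
    indices) as OBJ-style text."""
    lines = [f"v {x:.3f} {y:.3f} {z:.3f}" for x, y, z in vertices]
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]  # OBJ indices are 1-based
    return "\n".join(lines)

def text_to_mesh(text):
    """Parse OBJ-style text back into vertex and face lists."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts and parts[0] == "f":
            faces.append(tuple(int(p) - 1 for p in parts[1:4]))
    return vertices, faces

# Round-trip a tetrahedron through the text representation.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
assert text_to_mesh(mesh_to_text(verts, tris)) == (verts, tris)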
Author:
Wang, Fengxiang; Duan, Ranjie; Xiao, Peng; Jia, Xiaojun; Chen, YueFeng; Wang, Chongwen; Tao, Jialing; Su, Hang; Zhu, Jun; Xue, Hui
Large Language Models (LLMs) demonstrate outstanding performance in their reservoir of knowledge and understanding capabilities, but they have also been shown to be prone to illegal or unethical responses when subjected to jailbreak attacks. To ensure …
External link:
http://arxiv.org/abs/2411.03814
Author:
Tan, Hengkai; Xu, Xuezhou; Ying, Chengyang; Mao, Xinyi; Liu, Songming; Zhang, Xingxing; Su, Hang; Zhu, Jun
Learning a precise robotic grasping policy is crucial for embodied agents operating in complex real-world manipulation tasks. Despite significant advancements, most models still struggle with accurate spatial positioning of objects to be grasped. We …
External link:
http://arxiv.org/abs/2411.01850
Developing effective text summarizers remains a challenge due to issues like hallucinations, key information omissions, and verbosity in LLM-generated summaries. This work explores using LLM-generated feedback to improve summary quality by aligning …
External link:
http://arxiv.org/abs/2410.13116
We introduce nvTorchCam, an open-source library under the Apache 2.0 license, designed to make deep learning algorithms camera model-independent. nvTorchCam abstracts critical camera operations such as projection and unprojection, allowing developers … (see the sketch below)
External link:
http://arxiv.org/abs/2410.12074
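
"Camera model-independent" here means writing algorithms against abstract project/unproject operations instead of a specific intrinsic model. The sketch below shows that pattern in plain PyTorch; it is not nvTorchCam's actual API, and PinholeCamera and warp_to_other_view are illustrative names only.

import torch

class PinholeCamera:
    """One concrete camera model; a fisheye or orthographic camera would expose the
    same two methods and be interchangeable from the algorithm's point of view."""
    def __init__(self, fx, fy, cx, cy):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def project(self, points):            # (N, 3) camera-space points -> (N, 2) pixels
        x, y, z = points.unbind(-1)
        return torch.stack((self.fx * x / z + self.cx, self.fy * y / z + self.cy), dim=-1)

    def unproject(self, pixels, depth):   # (N, 2) pixels + (N,) depths -> (N, 3) points
        u, v = pixels.unbind(-1)
        return torch.stack(((u - self.cx) / self.fx * depth,
                            (v - self.cy) / self.fy * depth,
                            depth), dim=-1)

def warp_to_other_view(pixels, depth, cam_src, cam_dst, rel_pose):
    """Camera-agnostic warping: unproject in the source camera, apply the relative
    4x4 pose, and reproject in the destination camera."""
    points = cam_src.unproject(pixels, depth)
    points = points @ rel_pose[:3, :3].T + rel_pose[:3, 3]
    return cam_dst.project(points)

cam = PinholeCamera(500.0, 500.0, 320.0, 240.0)
px = torch.tensor([[320.0, 240.0], [400.0, 260.0]])
d = torch.tensor([2.0, 3.0])
print(warp_to_other_view(px, d, cam, cam, torch.eye(4)))  # identity pose -> same pixels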
Classifier-Free Guidance (CFG) is a critical technique for enhancing the sample quality of visual generative models. However, in autoregressive (AR) multi-modal generation, CFG introduces design inconsistencies between language and visual content, … (see the sketch below)
External link:
http://arxiv.org/abs/2410.09347
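
For context, classifier-free guidance at sampling time mixes a conditional and an unconditional (condition-dropped) prediction with a guidance scale w; for an autoregressive model this is commonly applied to the next-token logits as l_uncond + w * (l_cond - l_uncond). The sketch below shows only that standard recipe; cfg_logits and toy_model are placeholders, and the paper's own treatment of guidance for unified language-and-vision AR generation may differ.

import torch

def cfg_logits(model_fn, tokens, condition, null_condition, w=3.0):
    """Classifier-free guidance on next-token logits: two forward passes, one with
    the condition and one with a null condition, mixed with guidance scale w."""
    cond_logits = model_fn(tokens, condition)
    uncond_logits = model_fn(tokens, null_condition)
    # w = 1 recovers the purely conditional model; w > 1 sharpens adherence to the condition.
    return uncond_logits + w * (cond_logits - uncond_logits)

# Toy stand-in for a real model: next-token logits are just the conditioning vector.
vocab_size = 8
def toy_model(tokens, cond):
    return cond.clone()

cond_emb = torch.linspace(0.0, 1.0, vocab_size)
null_emb = torch.zeros(vocab_size)
guided = cfg_logits(toy_model, tokens=[], condition=cond_emb, null_condition=null_emb)
next_token = torch.multinomial(torch.softmax(guided, dim=-1), num_samples=1)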
Author:
Liu, Songming; Wu, Lingxuan; Li, Bangguo; Tan, Hengkai; Chen, Huayu; Wang, Zhengyi; Xu, Ke; Su, Hang; Zhu, Jun
Bimanual manipulation is essential in robotics, yet developing foundation models is extremely challenging due to the inherent complexity of coordinating two robot arms (leading to multi-modal action distributions) and the scarcity of training data. …
External link:
http://arxiv.org/abs/2410.07864
Existing benchmarks for summarization quality evaluation often lack diverse input scenarios, focus on narrowly defined dimensions (e.g., faithfulness), and struggle with subjective and coarse-grained annotation schemes. To address these shortcomings, …
External link:
http://arxiv.org/abs/2409.19898
Author:
Wei, Xingxing; Kang, Caixin; Dong, Yinpeng; Wang, Zhengyi; Ruan, Shouwei; Chen, Yubo; Su, Hang
Adversarial patches present significant challenges to the robustness of deep learning models, making the development of effective defenses critical for real-world applications. This paper introduces DIFFender, a novel DIFfusion-based DeFender …
External link:
http://arxiv.org/abs/2409.09406
We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses key challenges in depth estimation, including scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages focal stack data, conveniently accessible in … (see the sketch below)
External link:
http://arxiv.org/abs/2407.18443
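
On scale ambiguity: relative depth predictions are only defined up to an affine transform, and one common way to recover metric depth is to fit a global scale and shift against a few metric measurements (the kind a focal stack can supply). The sketch below shows that generic least-squares alignment; align_scale_shift illustrates the idea and is not necessarily HYBRIDDEPTH's actual procedure.

import torch

def align_scale_shift(relative_depth, metric_depth, mask):
    """Solve min over (s, t) of sum over mask of (s * relative + t - metric)^2 via
    least squares, then apply (s, t) to the whole relative depth map."""
    r = relative_depth[mask]
    m = metric_depth[mask]
    A = torch.stack((r, torch.ones_like(r)), dim=-1)        # (K, 2) design matrix
    sol = torch.linalg.lstsq(A, m.unsqueeze(-1)).solution   # (2, 1): [scale, shift]
    return sol[0, 0] * relative_depth + sol[1, 0]

# Toy check: a metric map that is an exact affine transform of the relative map is
# recovered from only a few sparse metric samples.
rel = torch.rand(4, 4, dtype=torch.float64)
met = 2.5 * rel + 0.3
sparse = torch.zeros(4, 4, dtype=torch.bool)
sparse[::2, ::2] = True
assert torch.allclose(align_scale_shift(rel, met, sparse), met, atol=1e-6)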