Search results - "An, Xiaomeng"

Report

Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models

Authors: Hu, Xiaomeng; Chen, Pin-Yu; Ho, Tsung-Yi

Large Language Models (LLMs) are increasingly being integrated into services such as ChatGPT to provide responses to user queries. To mitigate potential harm and prevent misuse, there have been concerted efforts to align the LLMs with human values an

External link: http://arxiv.org/abs/2412.18171

Report

Improved Forecasts of Global Extreme Marine Heatwaves Through a Physics-guided Data-driven Approach

Authors: Shu, Ruiqi; Wu, Hao; Gao, Yuan; Xu, Fanghua; Gou, Ruijian; Huang, Xiaomeng

The unusually warm sea surface temperature events known as marine heatwaves (MHWs) have a profound impact on marine ecosystems. Accurate prediction of extreme MHWs has significant scientific and financial worth. However, existing methods still have c

External link: http://arxiv.org/abs/2412.15532

Report

Consistency of Compositional Generalization across Multiple Levels

Authors: Li, Chuanhao; Li, Zhen; Jing, Chenchen; Fan, Xiaomeng; Ye, Wenbo; Wu, Yuwei; Jia, Yunde

Compositional generalization is the capability of a model to understand novel compositions composed of seen concepts. There are multiple levels of novel compositions including phrase-phrase level, phrase-word level, and word-word level. Existing meth

External link: http://arxiv.org/abs/2412.13636

Report

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Authors: Chu, Xiaomeng; Deng, Jiajun; You, Guoliang; Duan, Yifan; Li, Houqiang; Zhang, Yanyong

We propose Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection by the following insight. The Radar-Camera fusion in outdoor 3D scene perception is capped by the image-to-BEV transformation--if the depth of pixels

External link: http://arxiv.org/abs/2412.12725

Report

HResFormer: Hybrid Residual Transformer for Volumetric Medical Image Segmentation

Authors: Ren, Sucheng; Li, Xiaomeng

Vision Transformer shows great superiority in medical image segmentation due to its ability to learn long-range dependencies. For medical image segmentation from 3D data, such as computed tomography (CT), existing methods can be broadly classified i

External link: http://arxiv.org/abs/2412.11458

Report

MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images

Authors: Wang, Lehan; Qi, Chongchong; Ou, Chubin; An, Lin; Jin, Mei; Kong, Xiangbin; Li, Xiaomeng

Existing multi-modal learning methods on fundus and OCT images mostly require both modalities to be available and strictly paired for training and testing, which appears less practical in clinical scenarios. To expand the scope of clinical applicatio

External link: http://arxiv.org/abs/2412.09402

Report

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Document content extraction is crucial in computer vision, especially for meeting the high-quality data needs of large language models (LLMs) and retrieval-augmented generation (RAG) technologies. However, current document parsing methods suffer from

External link: http://arxiv.org/abs/2412.07626

Report

StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist

With the rapid advancements in Large Language Models (LLMs), LLM-based agents have introduced convenient and user-friendly methods for leveraging tools across various domains. In the field of astronomical observation, the construction of new telescop

External link: http://arxiv.org/abs/2412.06412

Report

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

Authors: Wang, Yibin; Tan, Zhiyu; Wang, Junyan; Yang, Xiaomeng; Jin, Cheng; Li, Hao

Recent advancements in text-to-video (T2V) generative models have shown impressive capabilities. However, these models are still inadequate in aligning synthesized videos with human preferences (e.g., accurately reflecting text descriptions), which i

External link: http://arxiv.org/abs/2412.04814