Showing 1 - 10 of 79 for search: '"Wu, Xingjiao"'
Preference-based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily from explicit
External link:
http://arxiv.org/abs/2409.07268
When dealing with the task of fine-grained scene image classification, most previous works lay much emphasis on global visual features when doing multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions a
External link:
http://arxiv.org/abs/2407.02769
Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or are based o
External link:
http://arxiv.org/abs/2405.03098
With the increasing prevalence of smartphones and websites, Image Aesthetic Assessment (IAA) has become increasingly crucial. While the significance of attributes in IAA is widely recognized, many attribute-based methods lack consideration for the se
External link:
http://arxiv.org/abs/2311.11306
Authors:
Wu, Anran, Xiao, Luwei, Wu, Xingjiao, Yang, Shuwen, Xu, Junjie, Zhuang, Zisong, Xie, Nian, Jin, Cheng, He, Liang
Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate complex r
External link:
http://arxiv.org/abs/2310.18983
Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features as input
External link:
http://arxiv.org/abs/2310.09696
Detecting stereotypes and biases in Large Language Models (LLMs) can enhance fairness and reduce adverse impacts on individuals or groups when these LLMs are applied. However, the majority of existing methods focus on measuring the model's preference
External link:
http://arxiv.org/abs/2308.10397
Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. However, directly applying the transformer structure to remove noise is challengin
External link:
http://arxiv.org/abs/2304.06346
Authors:
Li, Xin, Ma, Tao, Hou, Yuenan, Shi, Botian, Yang, Yuchen, Liu, Youquan, Wu, Xingjiao, Chen, Qin, Li, Yikang, Qiao, Yu, He, Liang
LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lack
External link:
http://arxiv.org/abs/2303.03595
Multi-modal 3D object detection has been an active research topic in autonomous driving. Nevertheless, it is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels. Recent approaches either fuse the image f
External link:
http://arxiv.org/abs/2210.09615