Showing 1 - 10 of 1,003 for search: '"Tang, XiaoYing"'
Authors:
Chen, Shiyun, Lin, Li, Cheng, Pujin, Jin, ZhiCheng, Chen, JianJian, Zhu, HaiDong, Wong, Kenneth K. Y., Tang, Xiaoying
Multimodal learning has been demonstrated to enhance performance across various clinical tasks, owing to the diverse perspectives offered by different modalities of data. However, existing multimodal segmentation methods rely on well-registered multi
External link:
http://arxiv.org/abs/2412.20418
Federated Learning (FL) has received much attention in recent years. However, although clients are not required to share their data in FL, the global model itself can implicitly remember clients' local data. Therefore, it's necessary to effectively r
External link:
http://arxiv.org/abs/2412.20200
Authors:
Pan, Bikang, Li, Qun, Tang, Xiaoying, Huang, Wei, Fang, Zhen, Liu, Feng, Wang, Jingya, Yu, Jingyi, Shi, Ye
The emergence of vision-language foundation models, such as CLIP, has revolutionized image-text representation, enabling a broad range of applications via prompt learning. Despite its promise, real-world datasets often contain noisy labels that can d
External link:
http://arxiv.org/abs/2412.01256
Quantitative analysis of animal behavior and biomechanics requires accurate animal pose and shape estimation across species, and is important for animal welfare and biological research. However, the small network capacity of previous methods and limi
External link:
http://arxiv.org/abs/2412.00837
Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices while keeping their local data private. However, FL often faces challenges due to data heterogeneity, leading to inconsistent local optima
External link:
http://arxiv.org/abs/2411.16303
The increasing concern for data privacy has driven the rapid development of federated learning (FL), a privacy-preserving collaborative paradigm. However, the statistical heterogeneity among clients in FL results in inconsistent performance of the se
External link:
http://arxiv.org/abs/2410.20141
Multimodal Large Language Models (MLLMs) demonstrate a strong understanding of the real world and can even handle complex tasks. However, they still fail on some straightforward visual question-answering (VQA) problems. This paper dives deeper into t
External link:
http://arxiv.org/abs/2410.11437
Domain Generalization (DG) aims to train models that can effectively generalize to unseen domains. However, in the context of Federated Learning (FL), where clients collaboratively train a model without directly sharing their data, most existing DG a
External link:
http://arxiv.org/abs/2410.11267
Video Temporal Grounding (VTG) is a crucial capability for video understanding models and plays a vital role in downstream tasks such as video browsing and editing. To effectively handle various tasks simultaneously and enable zero-shot prediction, t
External link:
http://arxiv.org/abs/2410.05643
In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models. Recent multimodal text-image supervised foundation models offer ne
External link:
http://arxiv.org/abs/2408.14770