Showing 1-10 of 3,500
for search: '"WANG, Xiangdong"'
Computer-aided cancer survival risk prediction plays an important role in the timely treatment of patients. This is a challenging weakly supervised ordinal regression task involving multiple clinical factors, such as pathological images
External link:
http://arxiv.org/abs/2409.02145
Recent advances have been made in audio-language joint learning, such as CLAP, which has shown considerable success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global
External link:
http://arxiv.org/abs/2408.07919
Cancer survival prediction is a challenging task that involves analyzing the tumor microenvironment within Whole Slide Images (WSIs). Previous methods cannot effectively capture the intricate interaction features among instances within the local are
External link:
http://arxiv.org/abs/2407.00664
Authors:
Tao, Rui; Huang, Yuxing; Wang, Xiangdong; Yan, Long; Zhai, Lufeng; Ouchi, Kazushige; Li, Taihao
Weakly supervised learning has emerged as a promising approach to leveraging limited labeled data in various domains by bridging the gap between fully supervised methods and unsupervised techniques. Acquiring strong annotations for detecting sound
External link:
http://arxiv.org/abs/2309.11783
Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associate audio features with human language, making it a natural zero-shot classifier to recognize unseen sound categories. To adapt CLAP to downstream tasks, prior works inevitably req
External link:
http://arxiv.org/abs/2309.08357
Learning meaningful frame-wise features on a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either maintain consistency on frame-level predictions or seek feature-level similarity among neighboring frames,
External link:
http://arxiv.org/abs/2309.08355
Authors:
Guo, Zhifang; Mao, Jianguo; Tao, Rui; Yan, Long; Ouchi, Kazushige; Liu, Hong; Wang, Xiangdong
Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllab
External link:
http://arxiv.org/abs/2308.11940
Large language models demonstrate deep comprehension and fluent generation in multi-modal settings. Although significant advances have been made in audio multi-modality, existing methods rarely leverage language models for sound event de
External link:
http://arxiv.org/abs/2308.11530