Showing 1 - 10 of 108 for search: '"Mahmud, Tanvir"'
Author:
Mahmud, Tanvir, Marculescu, Diana
Audio separation in real-world scenarios, where mixtures contain a variable number of sources, presents significant challenges due to limitations of existing models, such as over-separation, under-separation, and dependence on predefined training sou
External link:
http://arxiv.org/abs/2409.19270
Recent advances in pre-trained vision transformers have shown promise in parameter-efficient audio-visual learning without audio pre-training. However, few studies have investigated effective methods for aligning multimodal features in parameter-effi
External link:
http://arxiv.org/abs/2406.04930
Video-to-video synthesis models face significant challenges, such as ensuring consistent character generation across frames, maintaining smooth temporal transitions, and preserving quality during fast motion. The introduction of joint fully cross-fra
External link:
http://arxiv.org/abs/2406.04873
Published in:
IEEE/CVF Computer Vision and Pattern Recognition (CVPR) Conference, 2024
Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video. Existing self-supervised and weakly supervised source localization methods struggle to accurately distinguish th
External link:
http://arxiv.org/abs/2404.01751
Conditional sound separation in multi-source audio mixtures without access to single-source sound data during training is a long-standing challenge. Existing mix-and-separate based methods suffer from a significant performance drop with multi-so
External link:
http://arxiv.org/abs/2404.01740
Published in:
European Conference on Computer Vision (ECCV) 2024
As deep neural networks evolve from convolutional neural networks (ConvNets) to advanced vision transformers (ViTs), there is an increased need to eliminate redundant data for faster processing without compromising accuracy. Previous methods are ofte
External link:
http://arxiv.org/abs/2403.16020
Published in:
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
Despite significant progress in semi-supervised learning for image object detection, several key issues are yet to be addressed for video object detection: (1) Achieving good performance for supervised video object detection greatly depends on the av
External link:
http://arxiv.org/abs/2309.01391
Published in:
NeurIPS Workshop on Heavy Tails in ML, 2023
We propose an embarrassingly simple method, instance-aware repeat factor sampling (IRFS), to address the problem of imbalanced data in long-tailed object detection. Imbalanced datasets in real-world object detection often suffer from a large dispari
External link:
http://arxiv.org/abs/2305.08069
Melanoma is considered the deadliest form of skin cancer, causing around 75% of all skin cancer deaths. To diagnose melanoma, clinicians assess and compare multiple skin lesions of the same patient concurrently to gather contextual informa
External link:
http://arxiv.org/abs/2303.03672
Author:
Mahmud, Tanvir, Marculescu, Diana
An audio-visual event (AVE) is denoted by the correspondence of the visual and auditory signals in a video segment. Precise localization of the AVEs is very challenging since it demands effective multi-modal feature correspondence to ground the short
External link:
http://arxiv.org/abs/2210.05060