Showing 1 - 10 of 25
for query: '"Kim, Sungnyun"'
Author:
Kim, Sungnyun, Liao, Haofu, Appalaraju, Srikar, Tang, Peng, Tu, Zhuowen, Satzoda, Ravi Kumar, Manmatha, R., Mahadevan, Vijay, Soatto, Stefano
Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance the generalizability of small VDU models by distilling…
External link:
http://arxiv.org/abs/2410.03061
Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly. In the offline setting, learning from a static…
External link:
http://arxiv.org/abs/2408.13092
Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused…
External link:
http://arxiv.org/abs/2407.03563
Federated Learning (FL) has emerged as a pivotal framework for the development of effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-iid data distributions. A key challenge in FL is client…
External link:
http://arxiv.org/abs/2406.02355
Knowledge distillation (KD) is widely used for compressing a teacher model into a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models…
External link:
http://arxiv.org/abs/2402.03898
Despite the great performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them unfavorable to utilize. In this study, we propose to compress speech SSL models by distilling…
External link:
http://arxiv.org/abs/2312.09040
In this study, we aim to extend the capabilities of diffusion-based text-to-image (T2I) generation models by incorporating diverse modalities beyond textual description, such as sketch, box, color palette, and style embedding, within a single model.
External link:
http://arxiv.org/abs/2305.15194
Author:
Bae, Sangmin, Kim, June-Woo, Cho, Won-Yang, Baek, Hyerim, Son, Soyoun, Lee, Byungjo, Ha, Changwan, Tae, Kyongpil, Kim, Sungnyun, Yun, Se-Young
Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep…
External link:
http://arxiv.org/abs/2305.14032
Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks. However, the huge number of parameters in speech SSL models necessitates compression to a more compact model…
External link:
http://arxiv.org/abs/2305.11685
Deep learning in general domains has constantly been extended to domain-specific tasks requiring the recognition of fine-grained characteristics. However, real-world applications for fine-grained tasks suffer from two challenges: a high reliance on…
External link:
http://arxiv.org/abs/2303.11101