Showing 1 - 10 of 106 for search: '"Guha, Prithwijit"'
Author:
Aggarwal, Yogesh; Guha, Prithwijit
Face detection is frequently attempted by using heavy pre-trained backbone networks like ResNet-50/101/152 and VGG16/19. A few recent works have also proposed lightweight detectors with customized backbones, novel loss functions and efficient training
External link:
http://arxiv.org/abs/2406.19107
Author:
Singh, Pranjali; Guha, Prithwijit
Underwater image quality is affected by fluorescence, low illumination, absorption, and scattering. Recent works in underwater image enhancement have proposed different deep network architectures to handle these problems. Most of these works have pro
External link:
http://arxiv.org/abs/2406.18628
The use of complex attention modules has improved the performance of the Visual Question Answering (VQA) task. This work aims to learn an improved multi-modal representation through dense interaction of visual and textual modalities. The proposed mod
External link:
http://arxiv.org/abs/2302.14777
Published in:
In AEUE - International Journal of Electronics and Communications, April 2024, 177
Author:
Manocha, Prateek; Guha, Prithwijit
Whenever we speak, our voice is accompanied by facial movements and expressions. Several recent works have shown the synthesis of highly photo-realistic videos of talking faces, but they either require a source video to drive the target face or only
External link:
http://arxiv.org/abs/2011.01114
Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and robustness
External link:
http://arxiv.org/abs/2007.04422
This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA). The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answ
External link:
http://arxiv.org/abs/2002.06800
Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset number of
External link:
http://arxiv.org/abs/1811.01222
Published in:
In Speech Communication, July 2022, 142:34-48
Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. While such model-free methods achieve considerable performance, th
External link:
http://arxiv.org/abs/1701.02392