Showing 1 - 10 of 49 for search: '"Narang, Ankur"'
Author:
Khirwar, Madhav, Narang, Ankur
Air pollution represents a pivotal environmental challenge globally, playing a major role in climate change via greenhouse gas emissions and negatively affecting the health of billions. However, predicting the spatial and temporal patterns of pollutan
External link:
http://arxiv.org/abs/2402.07164
Author:
Khirwar, Madhav, Narang, Ankur
Greenhouse gases are pivotal drivers of climate change, necessitating precise quantification and source identification to foster mitigation strategies. We introduce GeoViT, a compact vision transformer model adept in processing satellite imagery for
External link:
http://arxiv.org/abs/2311.14301
In this paper, we present a Diffusion GAN based approach (Prosodic Diff-TTS) that takes a style description and content text as input and generates the corresponding high-fidelity speech samples within only 4 denoising steps. It l
External link:
http://arxiv.org/abs/2310.18169
Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. It is difficult to obtain a large quantity of supervised data due t
External link:
http://arxiv.org/abs/2212.11275
Author:
Kumar, Neeraj, Goel, Srishti, Narang, Ankur, Lall, Brejesh, Hasan, Mujtaba, Agarwal, Pranshu, Sarkar, Dipankar
We consider the challenging problem of audio to animated video generation. We propose a novel method OneShotAu2AV to generate an animated video of arbitrary length using an audio clip and a single unseen image of a person as an input. The proposed me
External link:
http://arxiv.org/abs/2102.09737
Audio to Video generation is an interesting problem that has numerous applications across industry verticals including film making, multi-media, marketing, education and others. High-quality video generation with expressive facial movements is a chal
External link:
http://arxiv.org/abs/2012.07842
Speech-driven facial video generation has been a complex problem due to its multi-modal aspects, namely the audio and video domains. The audio comprises many underlying features such as expression, pitch, loudness, prosody (speaking style) and facial vid
External link:
http://arxiv.org/abs/2012.07304
The style of the speech varies from person to person and every person exhibits his or her own style of speaking that is determined by the language, geography, culture and other factors. Style is best captured by prosody of a signal. High quality mult
External link:
http://arxiv.org/abs/2012.07252
Federated learning has allowed the training of statistical models over remote devices without the transfer of raw client data. In practice, training in heterogeneous and large networks introduces novel challenges in various aspects like network load,
External link:
http://arxiv.org/abs/2011.07229
The Federated Learning setting has a central server coordinating the training of a model on a network of devices. One of the challenges is variable training performance when the dataset has a class imbalance. In this paper, we address this by introdu
External link:
http://arxiv.org/abs/2011.06283