Showing 1 - 4 of 4 for search: '"Kögel, Fabian"'
State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech. For expressive speech datasets, however, we observe characteristic audio distortions. We demonstrate tha
External link:
http://arxiv.org/abs/2306.01442
Human-like attention as a supervisory signal to guide neural attention has shown significant promise but is currently limited to uni-modal integration - even for inherently multimodal tasks such as visual question answering (VQA). We present the Mult
External link:
http://arxiv.org/abs/2109.13139
We present VQA-MHUG - a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker. We use our dataset to analyze the similarity between human and
External link:
http://arxiv.org/abs/2109.13116
Author:
Lindenschmidt, Karl-Erich, Alfredsen, Knut, Carstensen, Dirk, Choryński, Adam, Gustafsson, David, Halicki, Michał, Hentschel, Bernd, Karjalainen, Niina, Kögel, Michael, Kolerski, Tomasz, Kornaś-Dynia, Marika, Kubicki, Michał, Kundzewicz, Zbigniew W., Lauschke, Cornelia, Malinger, Albert, Marszelewski, Włodzimierz, Möldner, Fabian, Näslund-Landenmark, Barbro, Niedzielski, Tomasz, Parjanne, Antti
Published in:
Water (20734441); Jan 2023, Vol. 15, Issue 1, p. 76, 23 pp.