Showing 1 - 10 of 37 results for the search: '"Duong, Ngoc Q. K."'
High-level understanding of stories in video such as movies and TV shows from raw data is extremely challenging. Modern video question answering (VideoQA) systems often use additional human-made sources like plot synopses, scripts, video descriptions
External link:
http://arxiv.org/abs/2103.14517
Author:
Phan, Huy, Nguyen, Huy Le, Chén, Oliver Y., Koch, Philipp, Duong, Ngoc Q. K., McLoughlin, Ian, Mertins, Alfred
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from n
External link:
http://arxiv.org/abs/2010.09132
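The abstract above motivates a self-attention layer to capture temporal dependencies that convolution alone may obscure. The layer's exact design is cut off here, so the following is only a generic sketch of scaled dot-product self-attention over a sequence of feature vectors. It assumes identity query/key/value projections for brevity (a real layer would learn these), and the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for q in seq:
        # Similarity of this time step to every time step in the sequence.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(frames):
        print([round(x, 3) for x in row])
```

Each output frame is a weighted average over every input frame, so a single step can draw on distant context, whereas a convolution only sees its local receptive field.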
Audio event localization and detection (SELD) have been commonly tackled using multitask models. Such a model usually consists of a multi-label event classification branch with sigmoid cross-entropy loss for event activity detection and a regression
External link:
http://arxiv.org/abs/2009.05527
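The multitask setup described above can be illustrated with a simple combined loss: binary cross-entropy on sigmoid outputs for multi-label event activity, plus a regression term for localization. The regression part of the abstract is truncated, so the mean-squared error and the weighting factor `alpha` below are assumptions. This sketches the conventional baseline the abstract describes, not the loss function the paper itself proposes.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy; each class has its own sigmoid (multi-label)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for the regression (localization) branch (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(class_logits, class_targets, loc_pred, loc_target, alpha: float = 1.0) -> float:
    # alpha is a hypothetical weight balancing detection against localization.
    return bce_with_logits(class_logits, class_targets) + alpha * mse(loc_pred, loc_target)


if __name__ == "__main__":
    # Two event classes (only the first active) and a toy 2-D localization target.
    print(multitask_loss([2.0, -1.5], [1.0, 0.0], [0.1, 0.4], [0.0, 0.5]))
```

Because each class passes through its own sigmoid, several events can be active at once, which is why this branch uses per-class binary cross-entropy rather than a softmax over classes.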
In this work, we address a novel, but potentially emerging, problem of discriminating the natural human voices and those played back by any kind of audio devices in the context of interactions with in-house voice user interface. The tackled problem m
External link:
http://arxiv.org/abs/1901.11291
Published in:
ICCV 2019
Humans share a strong tendency to memorize/forget some of the visual information they encounter. This paper focuses on providing computational models for the prediction of the intrinsic memorability of visual content. To address this new challenge, w
External link:
http://arxiv.org/abs/1812.01973
Author:
Parekh, Sanjeel, Essid, Slim, Ozerov, Alexey, Duong, Ngoc Q. K., Pérez, Patrick, Richard, Gaël
Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We
External link:
http://arxiv.org/abs/1804.07345
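Multiple instance learning treats each example as a bag of instances that is labeled only at the bag level. A minimal sketch, assuming per-instance class scores have already been produced for, e.g., audio segments and visual regions of one video (hypothetical inputs), aggregates them with max pooling. This reflects the standard MIL assumption that a bag is positive if at least one instance is; it is not the framework from the paper, whose details are truncated above.

```python
from typing import Dict, List


def bag_scores(instances: List[Dict[str, float]]) -> Dict[str, float]:
    """Max-pool per-class instance scores into bag-level scores."""
    classes = instances[0].keys()
    # A bag scores high for a class if any single instance does.
    return {c: max(inst[c] for inst in instances) for c in classes}


def bag_labels(instances: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Return the sorted class names whose bag-level score reaches the threshold."""
    scores = bag_scores(instances)
    return sorted(c for c, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    # Hypothetical scores for two audio segments and one visual region.
    video = [
        {"dog": 0.9, "guitar": 0.1},  # audio segment 1
        {"dog": 0.2, "guitar": 0.3},  # audio segment 2
        {"dog": 0.7, "guitar": 0.6},  # visual region
    ]
    print(bag_labels(video))  # ['dog', 'guitar']
```

Taking the argmax instead of the max would also point to the instance responsible for each detection, which is useful when only video-level labels are available for training.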
Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 have introduced convolutional "context encoders" (CEs) for unsupervised feature learning through image completion tasks. W
External link:
http://arxiv.org/abs/1803.10348
Author:
Duong, Ngoc Q. K., Duong, Hien-Thanh
Audio fingerprinting, also named as audio hashing, has been well-known as a powerful technique to perform audio identification and synchronization. It basically involves two major steps: fingerprint (voice pattern) design and matching search. While t
External link:
http://arxiv.org/abs/1502.06811
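The two steps named above, fingerprint design and matching search, can be illustrated with a toy scheme loosely modeled on the well-known Philips (Haitsma and Kalker) fingerprint, which derives one bit per adjacent band pair from the sign of energy differences across frequency and time. It is not the method surveyed in this paper; the input is assumed to be a precomputed matrix of per-frame band energies, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fingerprint(energies: Sequence[Sequence[float]]) -> List[int]:
    """energies[n][m]: energy of frame n in band m (assumed precomputed).
    Returns one integer sub-fingerprint per frame, starting from the second frame."""
    subs = []
    for n in range(1, len(energies)):
        bits = 0
        for m in range(len(energies[n]) - 1):
            # Sign of the band-energy difference, differenced again over time.
            diff = (energies[n][m] - energies[n][m + 1]) - (energies[n - 1][m] - energies[n - 1][m + 1])
            bits = (bits << 1) | (1 if diff > 0 else 0)
        subs.append(bits)
    return subs


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sub-fingerprints."""
    return bin(a ^ b).count("1")


def best_match(query: List[int], reference: List[int]) -> Optional[Tuple[int, int]]:
    """Slide the query over the reference; return (offset, bit errors) of the best alignment."""
    best = None
    for off in range(len(reference) - len(query) + 1):
        errors = sum(hamming(q, r) for q, r in zip(query, reference[off:off + len(query)]))
        if best is None or errors < best[1]:
            best = (off, errors)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic band energies, 40 frames x 8 bands (stand-in for a real spectrogram).
    ref_energy = [[rng.random() for _ in range(8)] for _ in range(40)]
    reference = fingerprint(ref_energy)
    query = fingerprint(ref_energy[10:20])  # excerpt starting at frame 10
    print(best_match(query, reference))
```

Matching compares fingerprints by Hamming distance rather than exact equality, so a query with a few flipped bits (from noise or compression) can still align to the right offset; practical systems replace the linear scan with a lookup table over sub-fingerprints.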
Author:
Ozerov, Alexey, Duong, Ngoc Q. K.
Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment in devices with low memory and comput
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::17f26017e1248a505189e53133694005