Showing 1 - 10 of 515 for search: '"Huang, Deng"'
Author:
Yang, Haosen, Huang, Deng, Wen, Bin, Wu, Jiannan, Yao, Hongxun, Jiang, Yi, Zhu, Xiatian, Yuan, Zehuan
Masked autoencoders (MAEs) have emerged recently as state-of-the-art self-supervised spatiotemporal representation learners. Inheriting from the image counterparts, however, existing video MAEs still focus largely on static appearance learning whilst are limited
External link:
http://arxiv.org/abs/2210.04154
Author:
Yen, Yao-Te, Zhou, Song-Lin, Huang, Deng-Ying, Tseng, Shih-Hao, Wang, Chung-Feng, Chyueh, San-Chong
Published in:
In Forensic Science International July 2024 360
Author:
Huang, Deng, Wu, Wenhao, Hu, Weiwen, Liu, Xu, He, Dongliang, Wu, Zhihua, Wu, Xiangmiao, Tan, Mingkui, Ding, Errui
We study self-supervised video representation learning, which is a challenging task due to 1) lack of labels for explicit supervision; 2) unstructured and noisy visual information. Existing methods mainly use contrastive loss with video clips as the
External link:
http://arxiv.org/abs/2106.02342
Author:
HUANG, DENG-YING, 黃鐙瑩
107
With lifestyle changes, a lack of regular exercise, unhealthy eating habits, and extreme weather are risk factors that induce cardiovascular diseases (CVD). Not only elders and disease-specific group especially
External link:
http://ndltd.ncl.edu.tw/handle/6h5zet
Author:
Chen, Peihao, Huang, Deng, He, Dongliang, Long, Xiang, Zeng, Runhao, Wen, Shilei, Tan, Mingkui, Gan, Chuang
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition. This task, however, is extremely challenging d
External link:
http://arxiv.org/abs/2011.07949
We addressed the challenging task of video question answering, which requires machines to answer questions about videos in a natural language form. Previous state-of-the-art methods attempt to apply spatio-temporal attention mechanism on video frame
External link:
http://arxiv.org/abs/2008.09105
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video to music generator:
External link:
http://arxiv.org/abs/2007.10984
We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated outside a camera cannot be in
External link:
http://arxiv.org/abs/2008.00820
Recent deep learning approaches have achieved impressive performance on visual sound separation tasks. However, these approaches are mostly built on appearance and optical-flow-like motion feature representations, which exhibit limited abilities to f
External link:
http://arxiv.org/abs/2004.09476