Showing 1 - 9 of 9 for search: '"68T45, 62H30"'
This paper proposes a hybrid fusion-based deep learning approach that uses two different modalities, audio and video, to improve human activity recognition and violence detection in public places. To take advantage of audiovisual fusion, late fusion, …
External link:
http://arxiv.org/abs/2408.02033
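The abstract above is cut off before describing the fusion scheme in detail; as a rough illustration only, here is a minimal late-fusion sketch in PyTorch, assuming score-level averaging of two independently trained unimodal classifiers. The class count, feature dimensions, weighting, and the stand-in linear backbones are hypothetical and not taken from the paper.

```python
# Hypothetical late-fusion sketch: average per-modality class scores.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Fuses independently trained audio and video classifiers at the score level."""
    def __init__(self, audio_net: nn.Module, video_net: nn.Module, audio_weight: float = 0.5):
        super().__init__()
        self.audio_net = audio_net
        self.video_net = video_net
        self.audio_weight = audio_weight

    def forward(self, audio_input, video_input):
        # Each branch produces class scores from its own modality.
        audio_scores = torch.softmax(self.audio_net(audio_input), dim=-1)
        video_scores = torch.softmax(self.video_net(video_input), dim=-1)
        # Late fusion: weighted average of the per-modality probabilities.
        return self.audio_weight * audio_scores + (1.0 - self.audio_weight) * video_scores

# Toy usage with stand-in linear "backbones" (placeholders for real audio/video encoders).
audio_net = nn.Linear(128, 2)   # e.g. audio features -> violent / non-violent scores
video_net = nn.Linear(512, 2)   # e.g. video features -> violent / non-violent scores
model = LateFusionClassifier(audio_net, video_net)
probs = model(torch.randn(4, 128), torch.randn(4, 512))  # -> (batch, num_classes)
```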
Author:
Gabdullin, Nikita
Autoencoders (AE) are a simple yet powerful class of neural networks that compress data by projecting the input into a low-dimensional latent space (LS). Whereas the LS is formed by loss-function minimization during training, its properties and top…
External link:
http://arxiv.org/abs/2402.08441
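For readers unfamiliar with the setup the abstract describes, here is a minimal autoencoder sketch: the encoder projects the input into a low-dimensional latent space, and the latent space is shaped only indirectly through the reconstruction loss. The layer sizes and MSE loss are illustrative assumptions, not details from the paper.

```python
# Minimal autoencoder sketch: compress input into a low-dimensional latent space (LS).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)          # projection into the latent space
        return self.decoder(z), z    # reconstruction and latent code

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                       # dummy batch
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)       # LS properties emerge from minimizing this loss
loss.backward()
optimizer.step()
```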
Recent advances in visual tracking are based on siamese feature extractors and template matching. For this category of trackers, the latest research focuses on better feature embeddings and similarity measures. In this work, we focus on building holistic…
External link:
http://arxiv.org/abs/1907.12920
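As a quick illustration of the siamese template-matching idea mentioned above, the sketch below computes a per-channel cross-correlation between template features and search-region features and sums it into a response map. The feature shapes are made up for the example; in a real tracker both crops would pass through a shared backbone first.

```python
# Siamese template-matching sketch: correlate template features over a search region.
import torch
import torch.nn.functional as F

def siamese_response(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """template_feat: (B, C, h, w) exemplar features; search_feat: (B, C, H, W) search features.
    Returns a (B, 1, H-h+1, W-w+1) response map whose peak indicates the target location."""
    batch, channels, h, w = template_feat.shape
    # Treat each template channel as a convolution kernel for its own search image (grouped conv).
    kernels = template_feat.reshape(batch * channels, 1, h, w)
    search = search_feat.reshape(1, batch * channels, *search_feat.shape[2:])
    response = F.conv2d(search, kernels, groups=batch * channels)
    # Sum the correlation over channels to get a single-channel score map per image.
    return response.reshape(batch, channels, *response.shape[2:]).sum(dim=1, keepdim=True)

# Toy usage: 6x6 template features matched against 22x22 search features.
resp = siamese_response(torch.randn(2, 64, 6, 6), torch.randn(2, 64, 22, 22))  # -> (2, 1, 17, 17)
```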
Author:
Abavisani, Mahdi, Patel, Vishal M.
Published in:
IEEE Signal Processing Letters, 2019
We present a transductive deep learning-based formulation for the sparse representation-based classification (SRC) method. The proposed network consists of a convolutional autoencoder along with a fully-connected layer. The role of the autoencoder ne…
External link:
http://arxiv.org/abs/1904.11093
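The record describes a convolutional autoencoder combined with a fully connected layer. The sketch below is only a rough rendering of that general pattern (a reconstruction branch plus a fully connected classification branch); the layer sizes, losses, and input shape are illustrative assumptions, not the paper's actual SRC formulation.

```python
# Hypothetical sketch: convolutional autoencoder plus a fully connected layer on the encoded features.
import torch
import torch.nn as nn

class ConvAEWithFC(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 7x7 -> 14x14
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 14x14 -> 28x28
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        feats = self.encoder(x)
        recon = self.decoder(feats)                  # reconstruction branch
        logits = self.classifier(feats.flatten(1))   # fully connected classification branch
        return recon, logits

model = ConvAEWithFC()
x = torch.rand(8, 1, 28, 28)
recon, logits = model(x)
# Training would combine a reconstruction loss with a classification loss on labeled samples.
loss = nn.functional.mse_loss(recon, x) \
     + nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
```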
Published in:
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1165-1174
We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal inform…
External link:
http://arxiv.org/abs/1812.06145
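The abstract is cut off before explaining how multimodal knowledge is transferred; as a generic stand-in (not the paper's actual loss), the sketch below shows one common way to let a unimodal 3D-CNN benefit from another modality during training: a frozen teacher network processes the second modality and the student's features are pulled toward it, so only the unimodal branch is needed at test time. All names, shapes, and weights here are assumptions.

```python
# Generic cross-modal feature-alignment sketch (illustrative, not the paper's method).
import torch
import torch.nn as nn

def make_3d_cnn(in_channels: int, feat_dim: int = 128) -> nn.Module:
    return nn.Sequential(
        nn.Conv3d(in_channels, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        nn.Linear(16, feat_dim),
    )

student = make_3d_cnn(in_channels=3)   # e.g. RGB clips (the unimodal network kept at test time)
teacher = make_3d_cnn(in_channels=1)   # e.g. depth clips, assumed pretrained and frozen
for p in teacher.parameters():
    p.requires_grad_(False)

classifier = nn.Linear(128, 25)        # 25 gesture classes, purely illustrative
rgb = torch.randn(2, 3, 16, 112, 112)  # (batch, channels, frames, height, width)
depth = torch.randn(2, 1, 16, 112, 112)
labels = torch.randint(0, 25, (2,))

student_feat = student(rgb)
with torch.no_grad():
    teacher_feat = teacher(depth)

# Supervised loss on the unimodal branch plus a feature-alignment term to the other modality.
loss = nn.functional.cross_entropy(classifier(student_feat), labels) \
     + 0.1 * nn.functional.mse_loss(student_feat, teacher_feat)
loss.backward()
```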
Author:
Abavisani, Mahdi, Patel, Vishal M.
Published in:
IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 6, pp. 1601-1614, Dec. 2018
We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: a multimodal encoder, a self-expressive layer, and a multimodal decoder. The encoder take…
External link:
http://arxiv.org/abs/1804.06498
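The three-stage pattern named in the record (multimodal encoder, self-expressive layer, multimodal decoder) can be illustrated with the simplified sketch below. It uses fully connected encoders instead of CNNs, and the losses, dimensions, and regularization weight are assumptions for illustration only; in subspace clustering, the final clusters are typically obtained by applying spectral clustering to the learned coefficient matrix.

```python
# Simplified encoder / self-expressive layer / decoder sketch for multimodal subspace clustering.
import torch
import torch.nn as nn

class SelfExpressive(nn.Module):
    """Learns a coefficient matrix C so that latent codes satisfy Z ~ C @ Z."""
    def __init__(self, num_samples: int):
        super().__init__()
        self.coef = nn.Parameter(1e-4 * torch.randn(num_samples, num_samples))

    def forward(self, z):
        return self.coef @ z

class MultimodalSubspaceClustering(nn.Module):
    def __init__(self, dims, latent_dim, num_samples):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d, latent_dim) for d in dims)
        self.decoders = nn.ModuleList(nn.Linear(latent_dim, d) for d in dims)
        self.self_expressive = SelfExpressive(num_samples)  # shared across modalities

    def forward(self, views):
        latents = [enc(v) for enc, v in zip(self.encoders, views)]
        expressed = [self.self_expressive(z) for z in latents]
        recons = [dec(zc) for dec, zc in zip(self.decoders, expressed)]
        return latents, expressed, recons

# Toy usage with two modalities and 100 samples.
model = MultimodalSubspaceClustering(dims=[64, 32], latent_dim=16, num_samples=100)
views = [torch.randn(100, 64), torch.randn(100, 32)]
latents, expressed, recons = model(views)
# Reconstruction + self-expressiveness + sparsity-style penalty on the coefficients.
loss = sum(nn.functional.mse_loss(r, v) for r, v in zip(recons, views)) \
     + sum(nn.functional.mse_loss(zc, z) for zc, z in zip(expressed, latents)) \
     + 1e-3 * model.self_expressive.coef.abs().sum()
```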
Author:
Vishal M. Patel, Mahdi Abavisani
Published in:
IEEE Signal Processing Letters, vol. 26, pp. 948-952
We present a transductive deep learning-based formulation for the sparse representation-based classification (SRC) method. The proposed network consists of a convolutional autoencoder along with a fully connected layer. The role of the autoencoder ne…
Published in:
CVPR
We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal inform…
Author:
Vishal M. Patel, Mahdi Abavisani
We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: a multimodal encoder, a self-expressive layer, and a multimodal decoder. The encoder take…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4325214e50254a2df4b5f73b5e712beb