Showing 1 - 10 of 16 for search: '"Longteng Guo"'
Published in:
Scientific Reports, Vol 14, Iss 1, Pp 1-13 (2024)
EEG-based brain-computer interfaces (BCIs) have the potential to decode visual information. Recently, artificial neural networks (ANNs) have been used to classify EEG signals evoked by visual stimuli. However, methods using ANNs to extract f
External link:
https://doaj.org/article/56af9dce8c0a42e9ac9d830ca2db3034
Published in:
Applied Sciences, Vol 9, Iss 16, p 3260 (2019)
Image captioning attempts to generate a description given an image, usually taking Convolutional Neural Network as the encoder to extract the visual features and a sequence model, among which the self-attention mechanism has achieved advanced progres
External link:
https://doaj.org/article/f3d9bebd9f14465c8bd0bcd597fce7a8
Published in:
Applied Sciences, Vol 9, Iss 14, p 2888 (2019)
In the task of image captioning, learning the attentive image regions is necessary to adaptively and precisely focus on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that can preserve t
External link:
https://doaj.org/article/b651abaedb404bc88fb7bf67aed6a38a
Published in:
Applied Sciences, Vol 8, Iss 6, p 909 (2018)
Although the policy-gradient methods for reinforcement learning have shown significant improvement in image captioning, how to achieve high performance during the reinforcement optimizing process is still not a simple task. There are at least two dif
External link:
https://doaj.org/article/a0aef56050d0464ca041692cb29ee181
Published in:
IEEE Transactions on Multimedia. 22:2149-2162
The encoder-decoder framework has been the base of popular image captioning models, which typically predicts the target sentence based on the encoded source image one word at a time in sequence. However, such a single-pass decoding framework encounte
Published in:
ACM Multimedia
The quality of video representation directly decides the performance of video related tasks, for both understanding and generation. In this paper, we propose single-modality pretrained feature fusion technique which is composed of reasonable multi-vi
Published in:
Lecture Notes in Computer Science ISBN: 9783030873578
ICIG (2)
Human pose estimation has drawn much attention recently, but it remains challenging due to the deformation of human joints, the occlusion between limbs, etc. And more discriminative feature representations will bring more accurate prediction results.
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::63e98662598aa0ccc10caebb5d61788f
https://doi.org/10.1007/978-3-030-87358-5_31
Published in:
ICME
Image captioning aims to first observe an image, most notably the involved objects that are highly context-dependent, and then depict it with a natural description. However, most of the current models solely use the isolated objects vectors as image
Published in:
CVPR
Self-attention (SA) network has shown profound value in image captioning. In this paper, we improve SA from two aspects to promote the performance of image captioning. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4bd754b3a2c50e9e4734d4f30c7176d3
http://arxiv.org/abs/2003.08897
Published in:
IJCAI
Most image captioning models are autoregressive, i.e. they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9c2e319daa5fcf5315bc287f5b47e119