Showing 1 - 10 of 105
for the search: '"Wildes, Richard P"'
Understanding what deep network models capture in their learned representations is a fundamental challenge in computer vision. We present a new methodology for understanding such vision models, the Visual Concept Connectome (VCC), which discovers human…
External link:
http://arxiv.org/abs/2404.02233
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Concerns for the privacy of individuals captured in public imagery have led to privacy-preserving action recognition. Existing approaches often suffer from issues arising through obfuscation being applied globally and from a lack of interpretability. Global…
External link:
http://arxiv.org/abs/2403.12710
Author:
Karim, Rezaul, Wildes, Richard P.
Video segmentation encompasses a wide range of problem formulations, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this…
External link:
http://arxiv.org/abs/2310.12296
Few-shot video segmentation is the task of delineating a specific novel class in a query video using few labelled support images. Typical approaches compare support and query features while limiting comparisons to a single feature layer and thereby ignore…
External link:
http://arxiv.org/abs/2307.07812
Author:
Dvornik, Nikita, Hadji, Isma, Zhang, Ran, Derpanis, Konstantinos G., Garg, Animesh, Wildes, Richard P., Jepson, Allan D.
Instructional videos are an important resource for learning procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates…
External link:
http://arxiv.org/abs/2304.13265
In this paper, we present an end-to-end trainable, unified multiscale encoder-decoder transformer focused on dense prediction tasks in video. The presented Multiscale Encoder-Decoder Video Transformer (MED-VT) uses multiscale representation throughout…
External link:
http://arxiv.org/abs/2304.05930
Author:
Kowal, Matthew, Siam, Mennatullah, Islam, Md Amirul, Bruce, Neil D. B., Wildes, Richard P., Derpanis, Konstantinos G.
There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. For example, while evidence suggests that action recognition algorithms are heavily influenced by visual appearance in single…
External link:
http://arxiv.org/abs/2211.01783
This paper investigates the modeling of automated machine description of sports video, which has seen much progress recently. Nevertheless, state-of-the-art approaches fall quite short of capturing how human experts analyze sports scenes. There are several…
External link:
http://arxiv.org/abs/2208.04897
Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available…
External link:
http://arxiv.org/abs/2207.06261
Author:
Kowal, Matthew, Siam, Mennatullah, Islam, Md Amirul, Bruce, Neil D. B., Wildes, Richard P., Derpanis, Konstantinos G.
Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is a limited understanding of what information is captured by these models in their intermediate representations…
External link:
http://arxiv.org/abs/2206.02846