Showing 1 - 10 of 134 for search: '"Alatan, A. Aydın"'
In recent years, transformer-based architectures have become the de facto standard for sequence modeling in deep learning frameworks. Inspired by the successful examples, we propose a causal visual-inertial fusion transformer (VIFT) for pose estimation in
External link:
http://arxiv.org/abs/2409.08769
Author:
Sarikamis, F. Aykut, Alatan, A. Aydin
3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems to neural implicit representations. However, current methods either lack dense depth maps to supervise the mapping process or detailed t
External link:
http://arxiv.org/abs/2408.01126
We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficult
External link:
http://arxiv.org/abs/2404.09692
A typical technique in knowledge distillation (KD) is to regularize the learning of a limited-capacity model (student) by pushing its responses to match those of a powerful model (teacher). Albeit useful especially in the penultimate layer and beyond, its acti
External link:
http://arxiv.org/abs/2309.02843
A common architectural choice for deep metric learning is a convolutional neural network followed by global average pooling (GAP). Albeit simple, GAP is a highly effective way to aggregate information. One possible explanation for the effectiveness o
External link:
http://arxiv.org/abs/2308.09228
Author:
Gurbuz, Yeti Z., Alatan, A. Aydin
Global average pooling (GAP) is a popular component in deep metric learning (DML) for aggregating features. Its effectiveness is often attributed to treating each feature vector as a distinct semantic entity and GAP as a combination of them. Albeit s
External link:
http://arxiv.org/abs/2307.07620
Utilization of event-based cameras is expected to improve the visual quality of video frame interpolation solutions. We introduce a learning-based method to exploit moving region boundaries in a video sequence to increase the overall interpolation qu
External link:
http://arxiv.org/abs/2303.02025
Convolution blocks serve as local feature extractors and are the key to the success of neural networks. To make local semantic feature embedding rather explicit, we reformulate convolution blocks as feature selection according to the best matching ke
External link:
http://arxiv.org/abs/2210.00992
Video frame interpolation (VFI) is a fundamental vision task that aims to synthesize several frames between two consecutive original video images. Most algorithms aim to accomplish VFI by using only keyframes, which is an ill-posed problem since the
External link:
http://arxiv.org/abs/2209.09359
Deep metric learning (DML) aims to minimize the empirical expected loss of the pairwise intra-/inter-class proximity violations in the embedding space. We relate DML to a feasibility problem of finite chance constraints. We show that the minimizer of proxy-ba
External link:
http://arxiv.org/abs/2209.09060