Showing 1 - 10 of 29 for search: '"Koner, Rajat"'
Vision Transformers (ViT) have emerged as the de-facto choice for numerous industry grade vision solutions. But their inference cost can be prohibitive for many settings, as they compute self-attention in each layer which suffers from quadratic compu
External link:
http://arxiv.org/abs/2407.12753
Authors:
Hannan, Tanveer, Koner, Rajat, Bernhard, Maximilian, Shit, Suprosanna, Menze, Bjoern, Tresp, Volker, Schubert, Matthias, Seidl, Thomas
Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during
External link:
http://arxiv.org/abs/2305.17096
The field of multimodal research focusing on the comprehension and creation of both images and text has witnessed significant strides. This progress is exemplified by the emergence of sophisticated models dedicated to image captioning at scale, such
External link:
http://arxiv.org/abs/2212.12249
Authors:
Koner, Rajat, Hannan, Tanveer, Shit, Suprosanna, Sharifzadeh, Sahand, Schubert, Matthias, Seidl, Thomas, Tresp, Volker
Published in:
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-2023)
Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by fu
External link:
http://arxiv.org/abs/2208.10547
Authors:
Shit, Suprosanna, Koner, Rajat, Wittmann, Bastian, Paetzold, Johannes, Ezhov, Ivan, Li, Hongwei, Pan, Jiazhen, Sharifzadeh, Sahand, Kaissis, Georgios, Tresp, Volker, Menze, Bjoern
A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally
External link:
http://arxiv.org/abs/2203.10202
Published in:
SafeAI@AAAI (2022)
It is essential for safety-critical applications of deep neural networks to determine when new inputs are significantly different from the training distribution. In this paper, we explore this out-of-distribution (OOD) detection problem for image cla
External link:
http://arxiv.org/abs/2203.08549
Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much
External link:
http://arxiv.org/abs/2202.07025
Published in:
BMVC, 2021
A serious problem in image classification is that a trained model might perform well for input data that originates from the same distribution as the data available for model training, but performs much worse for out-of-distribution (OOD) samples. In
External link:
http://arxiv.org/abs/2107.08976
Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in th
External link:
http://arxiv.org/abs/2107.06325
Identifying objects in an image and their mutual relationships as a scene graph leads to a deep understanding of image content. Despite the recent advancement in deep learning, the detection and labeling of visual object relationships remain a challe
External link:
http://arxiv.org/abs/2107.05448