Search results

Report

Number it: Temporal Grounding Videos like Flipping Manga

Authors: Wu, Yongliang; Hu, Xinting; Sun, Yuyang; Zhou, Yizhou; Zhu, Wenbo; Rao, Fengyun; Schiele, Bernt; Yang, Xu

Video Large Language Models (Vid-LLMs) have made remarkable advancements in comprehending video content for QA dialogue. However, they struggle to extend this visual understanding to tasks requiring precise temporal localization, known as Video Tempo

External link: http://arxiv.org/abs/2411.10332

Report

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

Authors: Arya, Shreyash; Rao, Sukrut; Böhle, Moritz; Schiele, Bernt

B-cos Networks have been shown to be effective for obtaining highly human interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weight. B-cos variants of convolutional networks (CNNs) and vis

External link: http://arxiv.org/abs/2411.00715

Report

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Authors: Wang, Haiyang; Fan, Yue; Naeem, Muhammad Ferjad; Xian, Yongqin; Lenssen, Jan Eric; Wang, Liwei; Tombari, Federico; Schiele, Bernt

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily f

External link: http://arxiv.org/abs/2410.23168

Report

Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs

Authors: Ling, Tianheng; Qian, Chao; Schiele, Gregor

This study addresses the deployment challenges of integer-only quantized Transformers on resource-constrained embedded FPGAs (Xilinx Spartan-7 XC7S15). We enhanced the flexibility of our VHDL template by introducing a selectable resource type for sto

External link: http://arxiv.org/abs/2410.03294

Report

Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking

Authors: Segu, Mattia; Piccinelli, Luigi; Li, Siyuan; Yang, Yung-Hsu; Schiele, Bernt; Van Gool, Luc

Multiple object tracking in complex scenarios - such as coordinated dance performances, team sports, or dynamic animal groups - presents unique challenges. In these settings, objects frequently move in coordinated patterns, occlude each other, and ex

External link: http://arxiv.org/abs/2410.01806

Report

Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs

Authors: Segu, Mattia; Piccinelli, Luigi; Li, Siyuan; Van Gool, Luc; Yu, Fisher; Schiele, Bernt

The supervision of state-of-the-art multiple object tracking (MOT) methods requires enormous annotation efforts to provide bounding boxes for all frames of all videos, and instance IDs to associate them through time. To this end, we introduce Walker,

External link: http://arxiv.org/abs/2409.17221

Report

ElasticAI: Creating and Deploying Energy-Efficient Deep Learning Accelerator for Pervasive Computing

Authors: Qian, Chao; Ling, Tianheng; Schiele, Gregor

Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field Programmable Gate

External link: http://arxiv.org/abs/2409.09044

Report

On-device AI: Quantization-aware Training of Transformers in Time-Series

Authors: Ling, Tianheng; Schiele, Gregor

Artificial Intelligence (AI) models for time-series in pervasive computing keep getting larger and more complicated. The Transformer model is by far the most compelling of these AI models. However, it is difficult to obtain the desired performance wh

External link: http://arxiv.org/abs/2408.16495

Report

Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets

Authors: Boettcher, Wolfgang; Hoyer, Lukas; Unal, Ozan; Lenssen, Jan Eric; Schiele, Bernt

In this work, we introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels. Training or fine-tuning semantic segmentation models with weak supervision has become an important topi

External link: http://arxiv.org/abs/2408.12489

Report

MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment

Authors: Das, Anurag; Hu, Xinting; Jiang, Li; Schiele, Bernt

Recent approaches have shown that large-scale vision-language models such as CLIP can improve semantic segmentation performance. These methods typically aim for pixel-level vision-language alignment, but often rely on low resolution image features fr

External link: http://arxiv.org/abs/2407.21654
