Showing 1 - 10 of 1,287 results for search: '"Patras, P."'
To address computational and memory limitations of Large Multimodal Models in the Video Question-Answering task, several recent methods extract textual representations per frame (e.g., by captioning) and feed them to a Large Language Model (LLM) that
External link:
http://arxiv.org/abs/2412.17415
Author:
Alwazzan, Omnia, Gallagher-Syed, Amaya, Millner, Thomas O., Brandner, Sebastian, Patras, Ioannis, Marino, Silvia, Slabaugh, Gregory
The integration of DNA methylation data with a Whole Slide Image (WSI) offers significant potential for enhancing the diagnostic precision of central nervous system (CNS) tumor classification in neuropathology. While existing approaches typically int
External link:
http://arxiv.org/abs/2411.17418
Vision-Language Models (VLMs) are crucial for applications requiring integrated understanding of textual and visual information. However, existing VLMs struggle with long videos due to computational inefficiency, memory limitations, and difficulties in
External link:
http://arxiv.org/abs/2411.15556
In this paper, we derive a Chen-Strichartz formula for stochastic differential equations driven by Lévy processes, that is, we derive a series expansion of the logarithm of the flow map of the stochastic differential equation in terms of commutators o
External link:
http://arxiv.org/abs/2411.06827
Despite their success and widespread adoption, the opaque nature of deep neural networks (DNNs) continues to hinder trust, especially in critical applications. Current interpretability solutions often yield inconsistent or oversimplified explanations
External link:
http://arxiv.org/abs/2410.05484
Privacy is a main concern in developing face recognition techniques. Although synthetic face images can partially mitigate potential legal risks while maintaining effective face recognition (FR) performance, FR models trained by face images syn
External link:
http://arxiv.org/abs/2409.18876
In this paper, we introduce Behavior4All, a comprehensive, open-source toolkit for in-the-wild facial behavior analysis, integrating Face Localization, Valence-Arousal Estimation, Basic Expression Recognition and Action Unit Detection, all within a s
External link:
http://arxiv.org/abs/2409.17717
Generating human portraits is a hot topic in the image generation area, e.g. mask-to-face generation and text-to-face generation. However, these unimodal generation methods lack controllability in image generation. Controllability can be enhanced by
External link:
http://arxiv.org/abs/2409.11010
Learning with Noisy Labels (LNL) poses a significant challenge for the Machine Learning community. Some of the most widely used approaches that select as clean samples for which the model itself (the in-training model) has high confidence, e.g., `sma
External link:
http://arxiv.org/abs/2408.10012
The steady improvement of Diffusion Models for visual synthesis has given rise to many new and interesting use cases of synthetic images but also has raised concerns about their potential abuse, which poses significant societal threats. To address th
External link:
http://arxiv.org/abs/2408.09153