Search results

Report

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Authors: Sung-Bin, Kim; Senocak, Arda; Ha, Hyunwoo; Oh, Tae-Hyun

How does audio describe the world around us? In this work, we propose a method for generating images of visual scenes from diverse in-the-wild sounds. This cross-modal generation task is challenging due to the significant information gap between audi

External link: http://arxiv.org/abs/2412.06209

Report

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Authors: Sung-Bin, Kim; Hyun-Bin, Oh; Lee, JungMok; Senocak, Arda; Chung, Joon Son; Oh, Tae-Hyun

Following the success of Large Language Models (LLMs), expanding their boundaries to new modalities represents a significant paradigm shift in multimodal understanding. Human perception is inherently multimodal, relying not only on text but also on a

External link: http://arxiv.org/abs/2410.18325

Report

Non-overlapping, Schwarz-type Domain Decomposition Method for Physics and Equality Constrained Artificial Neural Networks

Authors: Hu, Qifeng; Basir, Shamsulhaq; Senocak, Inanc

We present a non-overlapping, Schwarz-type domain decomposition method with a generalized interface condition, designed for physics-informed machine learning of partial differential equations (PDEs) in both forward and inverse contexts. Our approach

External link: http://arxiv.org/abs/2409.13644

Report

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Authors: Senocak, Arda; Ryu, Hyeonggon; Kim, Junsik; Oh, Tae-Hyun; Pfister, Hanspeter; Chung, Joon Son

Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interac

External link: http://arxiv.org/abs/2407.13676

Report

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Authors: Feng, Jiu; Erol, Mehmet Hamza; Chung, Joon Son; Senocak, Arda

Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs. However, this l

External link: http://arxiv.org/abs/2407.08691

Report

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Authors: Erol, Mehmet Hamza; Senocak, Arda; Feng, Jiu; Chung, Joon Son

Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-atten

External link: http://arxiv.org/abs/2406.03344

Report

Spontaneous onset of three-dimensional motion with subsequent spatial and temporal reduction in convective flow systems

Authors: Stofanak, Patrick J.; Xiao, Cheng-Nian; Senocak, Inanc

We study the spontaneous emergence of three-dimensional motion from a quiescent, pure conduction state in stably stratified, convective flow within a triangular enclosure, which eventually self-organizes into a two-dimensional steady state. This phen

External link: http://arxiv.org/abs/2312.14887

Report

Can CLIP Help Sound Source Localization?

Authors: Park, Sooyoung; Senocak, Arda; Chung, Joon Son

Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment. We extend the application of these models, specifically

External link: http://arxiv.org/abs/2311.04066

Academic article

Geometric approaches to establish the fundamentals of Lorentz spaces $\mathbb{R}_2^3$ and $\mathbb{R}_1^2$

Authors: Sevilay Çoruh Şenocak, Salim Yüce

Published in: Mathematica Bohemica, Vol 149, Iss 4, Pp 549-567 (2024)

The aim of this paper is to investigate the orthogonality of vectors to each other and the Gram-Schmidt method in the Minkowski space $\mathbb{R}_2^3$. Hyperbolic cosine formulas are given for all triangle types in the Minkowski plane $\mathbb{R}_1^2

External link: https://doaj.org/article/5233960e5ea64782bc933f54b981276e

Report

Sound Source Localization is All about Cross-Modal Alignment

Authors: Senocak, Arda; Ryu, Hyeonggon; Kim, Junsik; Oh, Tae-Hyun; Pfister, Hanspeter; Chung, Joon Son

Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior

External link: http://arxiv.org/abs/2309.10724
