Showing 1 - 10 of 1,784 for the search: '"Moubayed, A."'
Author:
Hudson, G. Thomas, Slack, Dean, Winterbottom, Thomas, Sterling, Jamie, Xiao, Chenghao, Shentu, Junjie, Moubayed, Noura Al
Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Tradition…
External link:
http://arxiv.org/abs/2411.10503
In the fast-paced and volatile financial markets, accurately predicting stock movements based on financial news is critical for investors and analysts. Traditional models often struggle to capture the intricate and dynamic relationships between news…
External link:
http://arxiv.org/abs/2409.00438
Author:
Zhao, Kun, Xiao, Chenghao, Tang, Chen, Yang, Bohao, Ye, Kai, Moubayed, Noura Al, Zhan, Liang, Lin, Chenghua
Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG w…
External link:
http://arxiv.org/abs/2406.17911
With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent in the pre-training dataset, termed subject-driven generation. Mo…
External link:
http://arxiv.org/abs/2405.17965
Author:
Winterbottom, Thomas, Hudson, G. Thomas, Kluvanec, Daniel, Slack, Dean, Sterling, Jamie, Shentu, Junjie, Xiao, Chenghao, Zhou, Zheming, Moubayed, Noura Al
Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to…
External link:
http://arxiv.org/abs/2405.17450
Author:
Stirling, Jamie, Al-Moubayed, Noura
Compositional image generation requires models to generalise well in situations where two or more input concepts do not necessarily appear together in training (compositional generalisation). Despite recent progress in compositional image generation…
External link:
http://arxiv.org/abs/2405.06535
Semantic textual similarity (STS) and information retrieval (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envisi…
External link:
http://arxiv.org/abs/2404.06347
Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges…
External link:
http://arxiv.org/abs/2403.19897
Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, faci…
External link:
http://arxiv.org/abs/2402.09966
Author:
Xiao, Chenghao, Huang, Zhuoxu, Chen, Danlu, Hudson, G Thomas, Li, Yizhi, Duan, Haoran, Lin, Chenghua, Fu, Jie, Han, Jungong, Moubayed, Noura Al
Pretrained language models have long been known to be subpar in capturing sentence- and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolve…
External link:
http://arxiv.org/abs/2402.08183