Search results

Report

DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation

Authors: Phung, Hao; Dao, Quan; Dao, Trung; Phan, Hoang; Metaxas, Dimitris; Tran, Anh

We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks,

External link: http://arxiv.org/abs/2411.04168

Report

Continuous Spatio-Temporal Memory Networks for 4D Cardiac Cine MRI Segmentation

Authors: Ye, Meng; Xin, Bingyu; Axel, Leon; Metaxas, Dimitris

Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole sequence segmentation is curre

External link: http://arxiv.org/abs/2410.23191

Report

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Authors: He, Xiaoxiao; Han, Ligong; Dao, Quan; Wen, Song; Bai, Minhao; Liu, Di; Zhang, Han; Min, Martin Renqiang; Juefei-Xu, Felix; Tan, Chaowei; Liu, Bo; Li, Kang; Li, Hongdong; Huang, Junzhou; Ahmed, Faez; Srivastava, Akash; Metaxas, Dimitris

Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to ena

External link: http://arxiv.org/abs/2410.08207

Report

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Authors: Chen, Yuxiao; Li, Kai; Bao, Wentao; Patel, Deep; Kong, Yu; Min, Martin Renqiang; Metaxas, Dimitris N.

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segmen

External link: http://arxiv.org/abs/2409.16145

Report

Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation

Authors: Zhangli, Qilong; Liu, Di; Aich, Abhishek; Metaxas, Dimitris; Schulter, Samuel

Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusiv

External link: http://arxiv.org/abs/2409.09893

Report

Visual Prompting in Multimodal Large Language Models: A Survey

Authors: Wu, Junda; Zhang, Zhehao; Xia, Yu; Li, Xintong; Xia, Zhaoyang; Chang, Aaron; Yu, Tong; Kim, Sungchul; Rossi, Ryan A.; Zhang, Ruiyi; Mitra, Subrata; Metaxas, Dimitris N.; Yao, Lina; Shang, Jingbo; McAuley, Julian

Multimodal large language models (MLLMs) equip pre-trained large-language models (LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied, visual prompting has emerged for more fine-grained and free-form visual instruc

External link: http://arxiv.org/abs/2409.15310

Report

Confining quantum field theories

Author: Metaxas, Dimitrios

It is widely believed, and axiomatically postulated in mathematical quantum field theory, that the vacuum is a unique vector state (up to a phase factor). The recent solution of the quantum Yang-Mills theory of the strong interaction revealed the pre

External link: http://arxiv.org/abs/2409.01168

Report

New Capability to Look Up an ASL Sign from a Video Example

Authors: Neidle, Carol; Opoku, Augustine; Ballard, Carey; Zhou, Yang; He, Xiaoxiao; Dimitriadis, Gregory; Metaxas, Dimitris

Looking up an unknown sign in an ASL dictionary can be difficult. Most ASL dictionaries are organized based on English glosses, despite the fact that (1) there is no convention for assigning English-based glosses to ASL signs; and (2) there is no 1-1

External link: http://arxiv.org/abs/2407.13571

Report

Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing

Authors: Metaxas, Ioannis Maniadis; Tzimiropoulos, Georgios; Patras, Ioannis

Self-supervised learning has recently emerged as the preeminent pretraining paradigm across and between modalities, with remarkable results. In the image domain specifically, group (or cluster) discrimination has been one of the most successful metho

External link: http://arxiv.org/abs/2407.11168

Report

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

Authors: Jin, Can; Peng, Hongwu; Zhao, Shiyu; Wang, Zhenting; Xu, Wujiang; Han, Ligong; Zhao, Jiahui; Zhong, Kai; Rajasekaran, Sanguthevar; Metaxas, Dimitris N.

Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. E

External link: http://arxiv.org/abs/2406.14449
