Showing 1 - 10 of 5,872 for search: '"A, Metaxas"'
We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks,
External link:
http://arxiv.org/abs/2411.04168
Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole sequence segmentation is curre
External link:
http://arxiv.org/abs/2410.23191
Author:
He, Xiaoxiao, Han, Ligong, Dao, Quan, Wen, Song, Bai, Minhao, Liu, Di, Zhang, Han, Min, Martin Renqiang, Juefei-Xu, Felix, Tan, Chaowei, Liu, Bo, Li, Kang, Li, Hongdong, Huang, Junzhou, Ahmed, Faez, Srivastava, Akash, Metaxas, Dimitris
Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to ena
External link:
http://arxiv.org/abs/2410.08207
Author:
Chen, Yuxiao, Li, Kai, Bao, Wentao, Patel, Deep, Kong, Yu, Min, Martin Renqiang, Metaxas, Dimitris N.
Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segmen
External link:
http://arxiv.org/abs/2409.16145
Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusiv
External link:
http://arxiv.org/abs/2409.09893
Author:
Wu, Junda, Zhang, Zhehao, Xia, Yu, Li, Xintong, Xia, Zhaoyang, Chang, Aaron, Yu, Tong, Kim, Sungchul, Rossi, Ryan A., Zhang, Ruiyi, Mitra, Subrata, Metaxas, Dimitris N., Yao, Lina, Shang, Jingbo, McAuley, Julian
Multimodal large language models (MLLMs) equip pre-trained large-language models (LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied, visual prompting has emerged for more fine-grained and free-form visual instruc
External link:
http://arxiv.org/abs/2409.15310
Author:
Metaxas, Dimitrios
It is widely believed, and axiomatically postulated in mathematical quantum field theory, that the vacuum is a unique vector state (up to a phase factor). The recent solution of the quantum Yang-Mills theory of the strong interaction revealed the pre
External link:
http://arxiv.org/abs/2409.01168
Author:
Neidle, Carol, Opoku, Augustine, Ballard, Carey, Zhou, Yang, He, Xiaoxiao, Dimitriadis, Gregory, Metaxas, Dimitris
Looking up an unknown sign in an ASL dictionary can be difficult. Most ASL dictionaries are organized based on English glosses, despite the fact that (1) there is no convention for assigning English-based glosses to ASL signs; and (2) there is no 1-1
External link:
http://arxiv.org/abs/2407.13571
Self-supervised learning has recently emerged as the preeminent pretraining paradigm across and between modalities, with remarkable results. In the image domain specifically, group (or cluster) discrimination has been one of the most successful metho
External link:
http://arxiv.org/abs/2407.11168
Author:
Jin, Can, Peng, Hongwu, Zhao, Shiyu, Wang, Zhenting, Xu, Wujiang, Han, Ligong, Zhao, Jiahui, Zhong, Kai, Rajasekaran, Sanguthevar, Metaxas, Dimitris N.
Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. E
External link:
http://arxiv.org/abs/2406.14449