Showing 1 - 10 of 1,626 for search: '"Metaxas P"'
Author:
Wu, Yushu, Zhang, Zhixing, Li, Yanyu, Xu, Yanwu, Kag, Anil, Sui, Yang, Coskun, Huseyin, Ma, Ke, Lebedev, Aleksei, Hu, Ju, Metaxas, Dimitris, Wang, Yanzhi, Tulyakov, Sergey, Ren, Jian
We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary
External link:
http://arxiv.org/abs/2412.10494
Author:
Ouali, Yassine, Bulat, Adrian, Xenos, Alexandros, Zaganidis, Anestis, Metaxas, Ioannis Maniadis, Martinez, Brais, Tzimiropoulos, Georgios
Contrastively-trained Vision-Language Models (VLMs) like CLIP have become the de facto approach for discriminative vision-language representation learning. However, these models have limited language understanding, often exhibiting a "bag of words" b
External link:
http://arxiv.org/abs/2412.04378
Author:
Zhao, Shiyu, Wang, Zhenting, Juefei-Xu, Felix, Xia, Xide, Liu, Miao, Wang, Xiaofang, Liang, Mingfu, Zhang, Ning, Metaxas, Dimitris N., Yu, Licheng
Prevailing Multimodal Large Language Models (MLLMs) encode the input image(s) as vision tokens and feed them into the language backbone, similar to how Large Language Models (LLMs) process the text tokens. However, the number of vision tokens increas
External link:
http://arxiv.org/abs/2412.00556
Diffusion models (DMs) excel in photorealism, image editing, and solving inverse problems, aided by classifier-free guidance and image inversion techniques. However, rectified flow models (RFMs) remain underexplored for these tasks. Existing DM-based
External link:
http://arxiv.org/abs/2412.00100
Multi-planar tagged MRI is the gold standard for regional heart wall motion evaluation. However, accurate recovery of the 3D true heart wall motion from a set of 2D apparent motion cues is challenging, due to incomplete sampling of the true motion an
External link:
http://arxiv.org/abs/2411.15233
We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks,
External link:
http://arxiv.org/abs/2411.04168
Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole sequence segmentation is curre
External link:
http://arxiv.org/abs/2410.23191
Author:
He, Xiaoxiao, Han, Ligong, Dao, Quan, Wen, Song, Bai, Minhao, Liu, Di, Zhang, Han, Min, Martin Renqiang, Juefei-Xu, Felix, Tan, Chaowei, Liu, Bo, Li, Kang, Li, Hongdong, Huang, Junzhou, Ahmed, Faez, Srivastava, Akash, Metaxas, Dimitris
Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to ena
External link:
http://arxiv.org/abs/2410.08207
Author:
Chen, Yuxiao, Li, Kai, Bao, Wentao, Patel, Deep, Kong, Yu, Min, Martin Renqiang, Metaxas, Dimitris N.
Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segmen
External link:
http://arxiv.org/abs/2409.16145
Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusiv
External link:
http://arxiv.org/abs/2409.09893