Search results

Report

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Authors: Wu, Yushu; Zhang, Zhixing; Li, Yanyu; Xu, Yanwu; Kag, Anil; Sui, Yang; Coskun, Huseyin; Ma, Ke; Lebedev, Aleksei; Hu, Ju; Metaxas, Dimitris; Wang, Yanzhi; Tulyakov, Sergey; Ren, Jian

We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary

External link: http://arxiv.org/abs/2412.10494

Report

Discriminative Fine-tuning of LVLMs

Authors: Ouali, Yassine; Bulat, Adrian; Xenos, Alexandros; Zaganidis, Anestis; Metaxas, Ioannis Maniadis; Martinez, Brais; Tzimiropoulos, Georgios

Contrastively-trained Vision-Language Models (VLMs) like CLIP have become the de facto approach for discriminative vision-language representation learning. However, these models have limited language understanding, often exhibiting a "bag of words" b

External link: http://arxiv.org/abs/2412.04378

Report

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Authors: Zhao, Shiyu; Wang, Zhenting; Juefei-Xu, Felix; Xia, Xide; Liu, Miao; Wang, Xiaofang; Liang, Mingfu; Zhang, Ning; Metaxas, Dimitris N.; Yu, Licheng

Prevailing Multimodal Large Language Models (MLLMs) encode the input image(s) as vision tokens and feed them into the language backbone, similar to how Large Language Models (LLMs) process the text tokens. However, the number of vision tokens increas

External link: http://arxiv.org/abs/2412.00556

Report

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation

Authors: Patel, Maitreya; Wen, Song; Metaxas, Dimitris N.; Yang, Yezhou

Diffusion models (DMs) excel in photorealism, image editing, and solving inverse problems, aided by classifier-free guidance and image inversion techniques. However, rectified flow models (RFMs) remain underexplored for these tasks. Existing DM-based

External link: http://arxiv.org/abs/2412.00100

Report

Learning Volumetric Neural Deformable Models to Recover 3D Regional Heart Wall Motion from Multi-Planar Tagged MRI

Authors: Ye, Meng; Xin, Bingyu; Guo, Bangwei; Axel, Leon; Metaxas, Dimitris

Multi-planar tagged MRI is the gold standard for regional heart wall motion evaluation. However, accurate recovery of the 3D true heart wall motion from a set of 2D apparent motion cues is challenging, due to incomplete sampling of the true motion an

External link: http://arxiv.org/abs/2411.15233

Report

DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation

Authors: Phung, Hao; Dao, Quan; Dao, Trung; Phan, Hoang; Metaxas, Dimitris; Tran, Anh

We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks,

External link: http://arxiv.org/abs/2411.04168

Report

Continuous Spatio-Temporal Memory Networks for 4D Cardiac Cine MRI Segmentation

Authors: Ye, Meng; Xin, Bingyu; Axel, Leon; Metaxas, Dimitris

Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole sequence segmentation is curre

External link: http://arxiv.org/abs/2410.23191

Report

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Authors: He, Xiaoxiao; Han, Ligong; Dao, Quan; Wen, Song; Bai, Minhao; Liu, Di; Zhang, Han; Min, Martin Renqiang; Juefei-Xu, Felix; Tan, Chaowei; Liu, Bo; Li, Kang; Li, Hongdong; Huang, Junzhou; Ahmed, Faez; Srivastava, Akash; Metaxas, Dimitris

Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to ena

External link: http://arxiv.org/abs/2410.08207

Report

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Authors: Chen, Yuxiao; Li, Kai; Bao, Wentao; Patel, Deep; Kong, Yu; Min, Martin Renqiang; Metaxas, Dimitris N.

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segmen

External link: http://arxiv.org/abs/2409.16145

Report

Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation

Authors: Zhangli, Qilong; Liu, Di; Aich, Abhishek; Metaxas, Dimitris; Schulter, Samuel

Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusiv

External link: http://arxiv.org/abs/2409.09893
