Search results - "Xu, MengMeng"

Dissertation/Thesis

Author: Xu, Mengmeng

The growth of digital cameras and data communication has led to an exponential increase in video production and dissemination. As a result, automatic video analysis and understanding has become a crucial research topic in the computer vision communit

External link: http://hdl.handle.net/10754/691538

Report

Move Anything with Layered Scene Diffusion

Authors: Ren, Jiawei; Xu, Mengmeng; Wu, Jui-Chieh; Liu, Ziwei; Xiang, Tao; Toisoul, Antoine

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not apply to diff

External link: http://arxiv.org/abs/2404.07178

Report

Faster Diffusion via Temporal Attention Decomposition

Authors: Liu, Haozhe; Zhang, Wentian; Xie, Jinheng; Faccio, Francesco; Xu, Mengmeng; Xiang, Tao; Shou, Mike Zheng; Perez-Rua, Juan-Manuel; Schmidhuber, Jürgen

We explore the role of attention mechanism during inference in text-conditional diffusion models. Empirical observations suggest that cross-attention outputs converge to a fixed point after several inference steps. The convergence time naturally divi

External link: http://arxiv.org/abs/2404.02747

Report

Complex Hyperbolic Geometry of Chain Links

Authors: Ma, Jiming; Xie, Baohua; Xu, Mengmeng

The complex hyperbolic triangle group $\Gamma=\Delta_{4,\infty,\infty;\infty}$ acting on the complex hyperbolic plane ${\bf H}^2_{\mathbb C}$ is generated by complex reflections $I_1$, $I_2$, $I_3$ such that the product $I_2I_3$ has order four, the p

External link: http://arxiv.org/abs/2403.01531

Report

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Authors: Simon, Christian; He, Sen; Perez-Rua, Juan-Manuel; Xu, Mengmeng; Benhalloum, Amine; Xiang, Tao

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the

External link: http://arxiv.org/abs/2312.16218

Report

GenTron: Diffusion Transformers for Image and Video Generation

Authors: Chen, Shoufa; Xu, Mengmeng; Ren, Jiawei; Cong, Yuren; He, Sen; Xie, Yanping; Sinha, Animesh; Luo, Ping; Xiang, Tao; Perez-Rua, Juan-Manuel

In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilize

External link: http://arxiv.org/abs/2312.04557

Report

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Authors: Cong, Yuren; Xu, Mengmeng; Simon, Christian; Chen, Shoufa; Ren, Jiawei; Xie, Yanping; Perez-Rua, Juan-Manuel; Rosenhahn, Bodo; Xiang, Tao; He, Sen

Text-to-video editing aims to edit the visual appearance of a source video conditional on textual prompts. A major challenge in this task is to ensure that all frames in the edited video are visually consistent. Most recent works apply advanced text-

External link: http://arxiv.org/abs/2310.05922

Academic article

UAV flight control system based on STM32F4 and FreeRTOS

Authors: ZHOU Zhiguang, XU Mengmeng, SHI Meilin, LI Qingyuan, WANG Fan

Published in: Hangkong gongcheng jinzhan, Vol 15, Iss 4, Pp 93-99 (2024)

Most flight control systems use a high-performance CPU as the processor and VxWorks as the kernel, which brings high cost, large size, and a closed-source kernel. A low-cost and reliable flight contr

External link: https://doaj.org/article/79073261f14945478f381b83b82555f8

Report

Mindstorms in Natural Language-Based Societies of Mind

Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of

External link: http://arxiv.org/abs/2305.17066

Report

Boundary-Denoising for Video Activity Localization

Authors: Xu, Mengmeng; Soldan, Mattia; Gao, Jialin; Liu, Shuming; Pérez-Rúa, Juan-Manuel; Ghanem, Bernard

Video activity localization aims at understanding the semantic content in long untrimmed videos and retrieving actions of interest. The retrieved action with its start and end locations can be used for highlight generation, temporal action detection,

External link: http://arxiv.org/abs/2304.02934
