Showing 1 - 10 of 1,272 for search: '"Xu, MengMeng"'
Author:
Xu, Mengmeng
The growth of digital cameras and data communication has led to an exponential increase in video production and dissemination. As a result, automatic video analysis and understanding has become a crucial research topic in the computer vision communit
External link:
http://hdl.handle.net/10754/691538
Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not apply to diff
External link:
http://arxiv.org/abs/2404.07178
Author:
Liu, Haozhe, Zhang, Wentian, Xie, Jinheng, Faccio, Francesco, Xu, Mengmeng, Xiang, Tao, Shou, Mike Zheng, Perez-Rua, Juan-Manuel, Schmidhuber, Jürgen
We explore the role of attention mechanism during inference in text-conditional diffusion models. Empirical observations suggest that cross-attention outputs converge to a fixed point after several inference steps. The convergence time naturally divi
External link:
http://arxiv.org/abs/2404.02747
The complex hyperbolic triangle group $\Gamma=\Delta_{4,\infty,\infty;\infty}$ acting on the complex hyperbolic plane ${\bf H}^2_{\mathbb C}$ is generated by complex reflections $I_1$, $I_2$, $I_3$ such that the product $I_2I_3$ has order four, the p
External link:
http://arxiv.org/abs/2403.01531
Author:
Simon, Christian, He, Sen, Perez-Rua, Juan-Manuel, Xu, Mengmeng, Benhalloum, Amine, Xiang, Tao
Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the
External link:
http://arxiv.org/abs/2312.16218
Author:
Chen, Shoufa, Xu, Mengmeng, Ren, Jiawei, Cong, Yuren, He, Sen, Xie, Yanping, Sinha, Animesh, Luo, Ping, Xiang, Tao, Perez-Rua, Juan-Manuel
In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilize
External link:
http://arxiv.org/abs/2312.04557
Author:
Cong, Yuren, Xu, Mengmeng, Simon, Christian, Chen, Shoufa, Ren, Jiawei, Xie, Yanping, Perez-Rua, Juan-Manuel, Rosenhahn, Bodo, Xiang, Tao, He, Sen
Text-to-video editing aims to edit the visual appearance of a source video conditional on textual prompts. A major challenge in this task is to ensure that all frames in the edited video are visually consistent. Most recent works apply advanced text-
External link:
http://arxiv.org/abs/2310.05922
Published in:
Hangkong gongcheng jinzhan, Vol 15, Iss 4, Pp 93-99 (2024)
Most flight control systems use a high-performance CPU as the processor and VxWorks as the kernel, which leads to high cost, large size, and undisclosed kernel source code. A low-cost and reliable flight contr
External link:
https://doaj.org/article/79073261f14945478f381b83b82555f8
Author:
Zhuge, Mingchen, Liu, Haozhe, Faccio, Francesco, Ashley, Dylan R., Csordás, Róbert, Gopalakrishnan, Anand, Hamdi, Abdullah, Hammoud, Hasan Abed Al Kader, Herrmann, Vincent, Irie, Kazuki, Kirsch, Louis, Li, Bing, Li, Guohao, Liu, Shuming, Mai, Jinjie, Piękos, Piotr, Ramesh, Aditya, Schlag, Imanol, Shi, Weimin, Stanić, Aleksandar, Wang, Wenyi, Wang, Yuhui, Xu, Mengmeng, Fan, Deng-Ping, Ghanem, Bernard, Schmidhuber, Jürgen
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of
External link:
http://arxiv.org/abs/2305.17066
Author:
Xu, Mengmeng, Soldan, Mattia, Gao, Jialin, Liu, Shuming, Pérez-Rúa, Juan-Manuel, Ghanem, Bernard
Video activity localization aims at understanding the semantic content in long untrimmed videos and retrieving actions of interest. The retrieved action with its start and end locations can be used for highlight generation, temporal action detection,
External link:
http://arxiv.org/abs/2304.02934