Showing 1 - 10 of 102 for search: '"Fan, Mingyuan"'
This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Generally, along with the design of the advanced Flux model (https://github.com/black-forest-labs/flux), we trans
External link:
http://arxiv.org/abs/2409.00587
With increasing concerns and regulations on data privacy, fine-tuning pretrained language models (PLMs) in federated learning (FL) has become a common paradigm for NLP tasks. Despite being extensively studied, the existing methods for this problem st
External link:
http://arxiv.org/abs/2409.00116
Text-to-image diffusion models have shown the ability to learn a diverse range of concepts. However, it is worth noting that they may also generate undesirable outputs, consequently giving rise to significant security concerns. Specifically, issues s
External link:
http://arxiv.org/abs/2408.01014
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is scalable and competitive with dense networks while exhibiting highly optimized inference. The DiT-MoE includes two simple designs: shared expert routing and exp
External link:
http://arxiv.org/abs/2407.11633
Adversarial attacks have garnered considerable attention due to their profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in t
External link:
http://arxiv.org/abs/2407.11073
This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba's sequentially stacked blocks alternate between Transformer and Mamba layers, and
External link:
http://arxiv.org/abs/2406.01159
Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. They have proven to be advantageous in mitigating the computational burdens associated with diffu
External link:
http://arxiv.org/abs/2404.13358
Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context tasks, such as high-resolution image gener
External link:
http://arxiv.org/abs/2404.04478
This paper presents a new exploration into a category of diffusion models built upon state space architecture. We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, funct
External link:
http://arxiv.org/abs/2402.05608
Authors:
Duan, Xiaoyue; Cui, Shuhao; Kang, Guoliang; Zhang, Baochang; Fei, Zhengcong; Fan, Mingyuan; Huang, Junshi
Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, som
External link:
http://arxiv.org/abs/2312.14611