Search results

Report

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Authors: Ma, Yiyang, Liu, Xingchao, Chen, Xiaokang, Liu, Wen, Wu, Chengyue, Wu, Zhiyu, Pan, Zizheng, Xie, Zhenda, Zhang, Haowei, Yu, Xingkai, Zhao, Liang, Wang, Yisong, Liu, Jiaying, Ruan, Chong

We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method…

External link: http://arxiv.org/abs/2411.07975

Report

Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition

Authors: Lin, Lilang, Wu, Lehong, Zhang, Jiahang, Liu, Jiaying

Generative models, as a powerful technique for generation, are also gradually becoming a critical tool for recognition tasks. However, in skeleton-based action recognition, the features obtained from existing pre-trained generative methods contain redundan…

External link: http://arxiv.org/abs/2410.20349

Report

MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion

Authors: Wu, Lehong, Lin, Lilang, Zhang, Jiahang, Ma, Yiyang, Liu, Jiaying

Self-supervised learning has proved effective for skeleton-based human action understanding. However, previous works either rely on contrastive learning, which suffers from false negatives, or are based on reconstruction, which learns too much unessent…

External link: http://arxiv.org/abs/2409.10473

Report

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

Authors: Gao, Xiang, Liu, Jiaying

Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, enabling impressive image generation from natural-language text prompts. However, the issue of lacking controlla…

External link: http://arxiv.org/abs/2408.00998

Report

Intelligent Artistic Typography: A Comprehensive Review of Artistic Text Design and Generation

Authors: Bai, Yuhang, Huang, Zichuan, Gao, Wenshuo, Yang, Shuai, Liu, Jiaying

Artistic text generation aims to amplify the aesthetic qualities of text while maintaining readability. It can make the text more attractive and better convey its expression, thus enjoying a wide range of application scenarios such as social media di…

External link: http://arxiv.org/abs/2407.14774

Report

Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition

Authors: Zhang, Jiahang, Lin, Lilang, Liu, Jiaying

In real-world scenarios, human actions often follow a long-tailed distribution. As a result, existing skeleton-based action recognition methods, which are mostly designed for balanced datasets, suffer a sharp performance degradation. Recen…

External link: http://arxiv.org/abs/2407.12312

Report

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Authors: Gao, Xiang, Xu, Zhengbo, Zhao, Junhan, Liu, Jiaying

Published in: Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(3), 1824-1832

Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes frequency-controlled diffusion…

External link: http://arxiv.org/abs/2407.03006

Report

Coding for Intelligence from the Perspective of Category

Authors: Yang, Wenhan, Hu, Zixuan, Lin, Lilang, Liu, Jiaying, Duan, Ling-Yu

Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, have recently become intertwined, giving rise to a series of significant progre…

External link: http://arxiv.org/abs/2407.01017

Report

Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression

Authors: Liu, Jiaying Lizzy, Wang, Yunlong, Lyu, Yao, Su, Yiheng, Niu, Shuo, Xu, Xuhai Orson, Zhang, Yan

Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by…

External link: http://arxiv.org/abs/2406.19528

Report

Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond

Authors: Zhang, Jiahang, Lin, Lilang, Yang, Shuai, Liu, Jiaying

Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for skeleton-based action understanding. Unlike image data, skeleton data possesses sparser spatial stru…

External link: http://arxiv.org/abs/2406.02978