Search results

Report

Improved Video VAE for Latent Video Diffusion Model

Authors: Wu, Pingyu; Zhu, Kai; Liu, Yu; Zhao, Liming; Zhai, Wei; Cao, Yang; Zha, Zheng-Jun

A Variational Autoencoder (VAE) aims to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most existing video VAEs inflate a pretrained image V

External link: http://arxiv.org/abs/2411.06449

Report

EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

Authors: Liao, Bohao; Zhai, Wei; Wan, Zengyu; Zhang, Tianzhu; Cao, Yang; Zha, Zheng-Jun

Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF o

External link: http://arxiv.org/abs/2410.15392

Report

Visual-Geometric Collaborative Guidance for Affordance Learning

Authors: Luo, Hongchen; Zhai, Wei; Wang, Jiao; Cao, Yang; Zha, Zheng-Jun

Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing afforda

External link: http://arxiv.org/abs/2410.11363

Report

MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Authors: Zhai, Wei; Bai, Nan; Zhao, Qing; Li, Jianqiang; Wang, Fan; Qi, Hongzhi; Jiang, Meng; Wang, Xiaoqin; Yang, Bing Xiang; Fu, Guanghui

Given the prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions. Deep learning tends to be a promising solution for analyzing mental health on social media. However, black box models

External link: http://arxiv.org/abs/2410.10323

Report

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Authors: Yang, Jian; Yin, Dacheng; Zhou, Yizhou; Rao, Fengyun; Zhai, Wei; Cao, Yang; Zha, Zheng-Jun

Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of i

External link: http://arxiv.org/abs/2410.10798

Report

VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

Authors: Deng, Huilin; Luo, Hongchen; Zhai, Wei; Cao, Yang; Kang, Yu

Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects by establishing feature mapping between textual prompts and inspection images, demonstrating excellent research value in flexible industrial manufactur

External link: http://arxiv.org/abs/2409.20146

Report

Grounding 3D Scene Affordance From Egocentric Interactions

Authors: Liu, Cuiyu; Zhai, Wei; Yang, Yuhang; Luo, Hongchen; Liang, Sen; Cao, Yang; Zha, Zheng-Jun

Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances ba

External link: http://arxiv.org/abs/2409.19650

Report

PEAR: Phrase-Based Hand-Object Interaction Anticipation

Authors: Zhang, Zichen; Luo, Hongchen; Zhai, Wei; Cao, Yang; Kang, Yu

First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete

External link: http://arxiv.org/abs/2407.21510

Report

CrysToGraph: A Comprehensive Predictive Model for Crystal Materials Properties and the Benchmark

Authors: Wang, Hongyi; Sun, Ji; Liang, Jinzhe; Zhai, Li; Tang, Zitian; Li, Zijian; Zhai, Wei; Wang, Xusheng; Gao, Weihao; Gong, Sheng

The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exot

External link: http://arxiv.org/abs/2407.16131

Report

EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

Authors: Yang, Yuhang; Zhai, Wei; Wang, Chengfeng; Yu, Chengjun; Cao, Yang; Zha, Zheng-Jun

Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For egocentric HOI, in addition to perceiving semantics, e.g., "what" interaction

External link: http://arxiv.org/abs/2405.13659
