Search results

Report

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning

Authors: Ma, Mingjie; Yu, Zhihuan; Ma, Yichao; Li, Guohui

Visual Commonsense Reasoning (VCR) is a cognitive task, challenging models to answer visual questions requiring human commonsense, and to provide rationales explaining why the answers are correct. With the emergence of Large Language Models (LLMs), it is

External link: http://arxiv.org/abs/2404.13847

Report

LMD: Faster Image Reconstruction with Latent Masking Diffusion

Authors: Ma, Zhiyuan; Yu, Zhihuan; Li, Jianjun; Zhou, Bowen

As a class of fruitful approaches, diffusion probabilistic models (DPMs) have shown excellent advantages in high-resolution image reconstruction. On the other hand, masked autoencoders (MAEs), as popular self-supervised vision learners, have demonstr

External link: http://arxiv.org/abs/2312.07971
