Showing 1 - 10 of 3,597 for search: '"Wang, Yali"'
Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, f…
External link:
http://arxiv.org/abs/2408.10605
Author:
Pei, Baoqi, Chen, Guo, Xu, Jilan, He, Yuping, Liu, Yicheng, Pan, Kanghua, Huang, Yifei, Wang, Yali, Lu, Tong, Wang, Limin, Qiao, Yu
In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulo…
External link:
http://arxiv.org/abs/2406.18070
Author:
Li, Qingyun, Chen, Zhe, Wang, Weiyun, Wang, Wenhai, Ye, Shenglong, Jin, Zhenjiang, Chen, Guanzhou, He, Yinan, Gao, Zhangwei, Cui, Erfei, Yu, Jiashuo, Tian, Hao, Zhou, Jiasheng, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Li, Zhenxiang, Chu, Pei, Wang, Yi, Dou, Min, Tian, Changyao, Zhu, Xizhou, Lu, Lewei, Chen, Yushi, He, Junjun, Tu, Zhongying, Lu, Tong, Wang, Yali, Wang, Limin, Lin, Dahua, Qiao, Yu, Shi, Botian, He, Conghui, Dai, Jifeng
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data ai…
External link:
http://arxiv.org/abs/2406.08418
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks te…
External link:
http://arxiv.org/abs/2404.16006
Author:
Huang, Yifei, Chen, Guo, Xu, Jilan, Zhang, Mingfang, Yang, Lijin, Pei, Baoqi, Zhang, Hongjie, Dong, Lu, Wang, Yali, Wang, Limin, Qiao, Yu
Being able to map the activities of others into one's own point of view is a fundamental human skill even from a very early age. Taking a step toward understanding this human ability, we introduce EgoExoLearn, a large-scale dataset that emulates th…
External link:
http://arxiv.org/abs/2403.16182
Author:
Wang, Yi, Li, Kunchang, Li, Xinhao, Yu, Jiashuo, He, Yinan, Wang, Chenting, Chen, Guo, Pei, Baoqi, Yan, Ziang, Zheng, Rongkun, Xu, Jilan, Wang, Zun, Shi, Yansong, Jiang, Tianxiang, Li, Songze, Zhang, Hongjie, Huang, Yifei, Qiao, Yu, Wang, Yali, Wang, Limin
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our core design is a progressive training approach that unifies th…
External link:
http://arxiv.org/abs/2403.15377
Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts Mamba to the video domain. The proposed VideoMamba overcomes the limitations of existing 3D convolutional neural networ…
External link:
http://arxiv.org/abs/2403.06977
Open-world video recognition is challenging since traditional networks do not generalize well to complex environment variations. Alternatively, foundation models with rich knowledge have recently shown their generalization power. However, how to ap…
External link:
http://arxiv.org/abs/2402.18951
Published in:
Journal of Medical Internet Research, Vol 22, Iss 7, p e18527 (2020)
Background: An online health community (OHC) is an interactive platform for virtual communication between patients and physicians. Patients can typically search for, seek, and share their experiences and rate physicians, who may be involved in giving ad…
External link:
https://doaj.org/article/61636ffcb8554de7b0b38c10c3519d8b
Author:
Lu, Chaochao, Qian, Chen, Zheng, Guodong, Fan, Hongxing, Gao, Hongzhi, Zhang, Jie, Shao, Jing, Deng, Jingyi, Fu, Jinlan, Huang, Kexin, Li, Kunchang, Li, Lijun, Wang, Limin, Sheng, Lu, Chen, Meiqi, Zhang, Ming, Ren, Qibing, Chen, Sirui, Gui, Tao, Ouyang, Wanli, Wang, Yali, Teng, Yan, Wang, Yaru, Wang, Yi, He, Yinan, Wang, Yingchun, Wang, Yixu, Zhang, Yongting, Qiao, Yu, Shen, Yujiong, Mou, Yurong, Chen, Yuxi, Zhang, Zaibin, Shi, Zhelun, Yin, Zhenfei, Wang, Zhipin
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal content. However, there is still a wide gap between the performance of recent MLLM-based applications and the ex…
External link:
http://arxiv.org/abs/2401.15071