Search results - "Lei, Weixian"

Report

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Authors: Lin, Kevin Qinghong, Li, Linjie, Gao, Difei, Yang, Zhengyuan, Wu, Shiwei, Bai, Zechen, Lei, Weixian, Wang, Lijuan, Shou, Mike Zheng

Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity. While most agents are language-based, relying on closed-source API with text-rich meta-information (e.g., HTML or accessibility tr

External link: http://arxiv.org/abs/2411.17465


Report

ViT-Lens: Towards Omni-modal Representations

Authors: Lei, Weixian, Ge, Yixiao, Yi, Kun, Zhang, Jianfeng, Gao, Difei, Sun, Dylan, Ge, Yuying, Shan, Ying, Shou, Mike Zheng

Aiming to advance AI agents, large foundation models significantly improve reasoning and instruction execution, yet the current focus on vision and language neglects the potential of perceiving diverse modalities in open-world environments. However,

External link: http://arxiv.org/abs/2311.16081


Report

ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights

Authors: Lei, Weixian, Ge, Yixiao, Zhang, Jianfeng, Sun, Dylan, Yi, Kun, Shan, Ying, Shou, Mike Zheng

Though the success of CLIP-based training recipes in vision-language models, their scalability to more modalities (e.g., 3D, audio, etc.) is limited to large-scale data, which is expensive or even inapplicable for rare modalities. In this paper, we p

External link: http://arxiv.org/abs/2308.10185


Report

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Authors: Wu, Jay Zhangjie, Ge, Yixiao, Wang, Xintao, Lei, Weixian, Gu, Yuchao, Shi, Yufei, Hsu, Wynne, Shan, Ying, Qie, Xiaohu, Shou, Mike Zheng

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such paradigm is computationally expensive. In this work, we propose

External link: http://arxiv.org/abs/2212.11565


Report

Learning to Learn: How to Continuously Teach Humans and Machines

Authors: Singh, Parantak, Li, You, Sikarwar, Ankur, Lei, Weixian, Gao, Daniel, Talbot, Morgan Bruce, Sun, Ying, Shou, Mike Zheng, Kreiman, Gabriel, Zhang, Mengmi

Curriculum design is a fundamental component of education. For example, when we learn mathematics at school, we build upon our knowledge of addition to learn multiplication. These and other concepts must be mastered before our first algebra lesson, w

External link: http://arxiv.org/abs/2211.15470


Report

PCCT: Progressive Class-Center Triplet Loss for Imbalanced Medical Image Classification

Authors: Chen, Kanghao, Lei, Weixian, Zhang, Rong, Zhao, Shen, Zheng, Wei-shi, Wang, Ruixuan

Imbalanced training data is a significant challenge for medical image classification. In this study, we propose a novel Progressive Class-Center Triplet (PCCT) framework to alleviate the class imbalance issue particularly for diagnosis of rare diseas

External link: http://arxiv.org/abs/2207.04793



Academic article

Learning to Learn: How to Continuously Teach Humans and Machines.

Authors: Singh P; Nanyang Technological University (NTU), Singapore.; CFAR and I2R, Agency for Science, Technology and Research, Singapore., Li Y; CFAR and I2R, Agency for Science, Technology and Research, Singapore.; University of Wisconsin-Madison, USA., Sikarwar A; Nanyang Technological University (NTU), Singapore.; CFAR and I2R, Agency for Science, Technology and Research, Singapore., Lei W; Show Lab, National University of Singapore, Singapore., Gao D; Show Lab, National University of Singapore, Singapore., Talbot MB; Boston Children's Hospital, Harvard Medical School, USA.; Harvard-MIT Health Sciences and Technology, MIT., Sun Y; CFAR and I2R, Agency for Science, Technology and Research, Singapore., Shou MZ; Show Lab, National University of Singapore, Singapore., Kreiman G; Boston Children's Hospital, Harvard Medical School, USA., Zhang M; Nanyang Technological University (NTU), Singapore.; CFAR and I2R, Agency for Science, Technology and Research, Singapore.

Published in: ... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision [IEEE Int Conf Comput Vis Workshops] 2023 Oct; Vol. 2023, pp. 11674-11685. Date of Electronic Publication: 2024 Jan 15.

