Showing 1 - 10
of 210
for search: '"Huang, Shiyuan"'
Language has been useful in extending the vision encoder to data from diverse distributions without empirical discovery in training domains. However, as the image description is mostly at a coarse-grained level and ignores visual details, the resulted…
External link:
http://arxiv.org/abs/2405.18405
Author:
Huang, Shiyuan
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challeng…
Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these models are in…
External link:
http://arxiv.org/abs/2310.11207
Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a f…
External link:
http://arxiv.org/abs/2303.15466
Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data. Existing approaches enhance few-shot generalization with the sacrifice of base-class…
External link:
http://arxiv.org/abs/2303.09674
Author:
Yang, Yuncong, Ma, Jiawei, Huang, Shiyuan, Chen, Long, Lin, Xudong, Han, Guangxing, Chang, Shih-Fu
Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of description whe…
External link:
http://arxiv.org/abs/2212.13738
In Video Question Answering (VideoQA), answering general questions about a video requires its visual information. Yet, video often contains redundant information irrelevant to the VideoQA task. For example, if the task is only to answer questions sim…
External link:
http://arxiv.org/abs/2210.08391
Author:
Lin, Xudong, Tiwari, Simran, Huang, Shiyuan, Li, Manling, Shou, Mike Zheng, Ji, Heng, Chang, Shih-Fu
Multi-channel video-language retrieval requires models to understand information from different channels (e.g. video+question, video+speech) to correctly link a video with a textual response or query. Fortunately, contrastive multimodal models are…
External link:
http://arxiv.org/abs/2206.02082
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection, which are complementary to each other by definition. Most of the previous works on multi-modal FSOD…
External link:
http://arxiv.org/abs/2204.07841
Few-shot object detection (FSOD), with the aim to detect novel objects using very few training examples, has recently attracted great research interest in the community. Metric-learning based methods have been demonstrated to be effective for this ta…
External link:
http://arxiv.org/abs/2203.15021