Search results

Report

MammothModa: Multi-Modal Large Language Model

Authors: She, Qi; Pan, Junwen; Wan, Xin; Zhang, Rui; Lu, Dawei; Huang, Kai

In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating Visual Capabil

External link: http://arxiv.org/abs/2406.18193

Report

PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs

Authors: Shen, Zhengyang; Hong, Tao; She, Qi; Ma, Jinwen; Lin, Zhouchen

Steerable models can provide very general and flexible equivariance by formulating equivariance requirements in the language of representation theory and feature fields, which has been recognized to be effective for many vision tasks. However, derivi

External link: http://arxiv.org/abs/2208.03720

Report

On Learning Contrastive Representations for Learning with Noisy Labels

Authors: Yi, Li; Liu, Sheng; She, Qi; McLeod, A. Ian; Wang, Boyu

Deep neural networks are able to memorize noisy labels easily with a softmax cross-entropy (CE) loss. Previous studies that attempted to address this issue focus on incorporating a noise-robust loss function into the CE loss. However, the memorization issue

External link: http://arxiv.org/abs/2203.01785

Report

Weakly Supervised Object Localization as Domain Adaption

Authors: Zhu, Lei; She, Qi; Chen, Qian; You, Yunfei; Wang, Boyu; Lu, Yanye

Weakly supervised object localization (WSOL) focuses on localizing objects only with the supervision of image-level classification masks. Most previous WSOL methods follow the classification activation map (CAM) that localizes objects based on the cl

External link: http://arxiv.org/abs/2203.01714

Report

Background-aware Classification Activation Map for Weakly Supervised Object Localization

Authors: Zhu, Lei; She, Qi; Chen, Qian; Meng, Xiangxi; Geng, Mufeng; Jin, Lujia; Jiang, Zhe; Qiu, Bin; You, Yunfei; Zhang, Yibao; Ren, Qiushi; Lu, Yanye

Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification masks to supervise its learning process. However, current WSOL methods suffer from excessive activat

External link: http://arxiv.org/abs/2112.14379

Report

Learning from Temporal Gradient for Semi-supervised Action Recognition

Authors: Xiao, Junfei; Jing, Longlong; Zhang, Lin; He, Ju; She, Qi; Zhou, Zongwei; Yuille, Alan; Li, Yingwei

Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch). W

External link: http://arxiv.org/abs/2111.13241

Report

TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

Authors: Wang, Zhengwei; She, Qi; Smolic, Aljosa

Most existing video action recognition models ingest raw RGB frames. However, the raw video stream requires enormous storage and contains significant temporal redundancy. Video compression (e.g., H.264, MPEG-4) reduces superfluous information by r

External link: http://arxiv.org/abs/2110.08814

Report

3rd Place Solution to Google Landmark Recognition Competition 2021

Authors: Xu, Cheng; Wang, Weimin; Liu, Shuai; Wang, Yong; Tang, Yuxiang; Bian, Tianling; Yan, Yanyu; She, Qi; Yang, Cheng

In this paper, we show our solution to the Google Landmark Recognition 2021 Competition. Firstly, embeddings of images are extracted via various architectures (i.e. CNN-, Transformer- and hybrid-based), which are optimized by ArcFace loss. Then we ap

External link: http://arxiv.org/abs/2110.02794

Report

MT-ORL: Multi-Task Occlusion Relationship Learning

Authors: Feng, Panhe; She, Qi; Zhu, Lei; Li, Jiaxin; Zhang, Lin; Feng, Zijian; Wang, Changhu; Li, Chunpeng; Kang, Xuejing; Ming, Anlong

Retrieving occlusion relations among objects in a single image is challenging due to the sparsity of boundaries in the image. We observe two key issues in existing works: firstly, the lack of an architecture which can exploit the limited amount of coupling in the

External link: http://arxiv.org/abs/2108.05722

Report

Unifying Nonlocal Blocks for Neural Networks

Authors: Zhu, Lei; She, Qi; Li, Duo; Lu, Yanye; Kang, Xuejing; Hu, Jie; Wang, Changhu

The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although having shown excellent performance, they still lack the mechanism to encode the rich, structured information among elemen

External link: http://arxiv.org/abs/2108.02451
