Showing 1 - 10 of 47 results for the search: '"Song, Sibo"'
Author:
Wan, Jianqiang, Song, Sibo, Yu, Wenwen, Liu, Yuliang, Cheng, Wenqing, Huang, Fei, Bai, Xiang, Yao, Cong, Yang, Zhibo
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-bas…
External link:
http://arxiv.org/abs/2403.19128
Author:
Yang, Zhibo, Long, Rujiao, Wang, Pengfei, Song, Sibo, Zhong, Humen, Cheng, Wenqing, Bai, Xiang, Yao, Cong
Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, …
External link:
http://arxiv.org/abs/2303.13095
Recently, vision-language joint representation learning has proven to be highly effective in various scenarios. In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-mod…
External link:
http://arxiv.org/abs/2204.13867
Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and determining the activity relevance of different temporal…
External link:
http://arxiv.org/abs/1808.07272
Deep neural networks (DNNs) are known to be vulnerable to adversarial perturbations, which poses a serious threat to DNN-based decision systems. In this paper, we propose to apply the lossy Saak transform to adversarially perturbed images as a prep…
External link:
http://arxiv.org/abs/1808.01785
Author:
Wang, Zhe, Kuan, Kingsley, Ravaut, Mathieu, Manek, Gaurav, Song, Sibo, Fang, Yuan, Kim, Seokhwan, Chen, Nancy, D'Haro, Luis Fernando, Tuan, Luu Anh, Zhu, Hongyuan, Zeng, Zeng, Cheung, Ngai Man, Piliouras, Georgios, Lin, Jie, Chandrasekhar, Vijay
The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond…
External link:
http://arxiv.org/abs/1706.05461
Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine the DNN classifier performance un…
External link:
http://arxiv.org/abs/1701.01924
With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data…
External link:
http://arxiv.org/abs/1601.06603
Academic article
This result cannot be displayed to unauthenticated users. You must sign in to view this result.