Showing 1 - 10 of 47 results for the search: '"Song, Sibo"'
Author:
Wan, Jianqiang, Song, Sibo, Yu, Wenwen, Liu, Yuliang, Cheng, Wenqing, Huang, Fei, Bai, Xiang, Yao, Cong, Yang, Zhibo
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-bas…
External link:
http://arxiv.org/abs/2403.19128
Author:
Yang, Zhibo, Long, Rujiao, Wang, Pengfei, Song, Sibo, Zhong, Humen, Cheng, Wenqing, Bai, Xiang, Yao, Cong
Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, …
External link:
http://arxiv.org/abs/2303.13095
Recently, vision-language joint representation learning has proven to be highly effective in various scenarios. In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-mod…
External link:
http://arxiv.org/abs/2204.13867
Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and determining the activity relevance of different temporal…
External link:
http://arxiv.org/abs/1808.07272
Deep neural networks (DNNs) are known to be vulnerable to adversarial perturbations, which poses a serious threat to DNN-based decision systems. In this paper, we propose to apply the lossy Saak transform to adversarially perturbed images as a prep…
External link:
http://arxiv.org/abs/1808.01785
Author:
Wang, Zhe, Kuan, Kingsley, Ravaut, Mathieu, Manek, Gaurav, Song, Sibo, Fang, Yuan, Kim, Seokhwan, Chen, Nancy, D'Haro, Luis Fernando, Tuan, Luu Anh, Zhu, Hongyuan, Zeng, Zeng, Cheung, Ngai Man, Piliouras, Georgios, Lin, Jie, Chandrasekhar, Vijay
The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond…
External link:
http://arxiv.org/abs/1706.05461
Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine the DNN classifier performance un…
External link:
http://arxiv.org/abs/1701.01924
With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data…
External link:
http://arxiv.org/abs/1601.06603
Academic article
This result cannot be displayed to unauthenticated users. You must sign in to view this result.