Showing 1 - 10 of 181 for search: '"Jiang, Xinghua"'
Author:
Liu, Chaohu, Yin, Kun, Cao, Haoyu, Jiang, Xinghua, Li, Xin, Liu, Yinsong, Jiang, Deqiang, Sun, Xing, Xu, Linli
Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document und
External link:
http://arxiv.org/abs/2404.06918
Author:
Li, Xin, Wu, Yunfei, Jiang, Xinghua, Guo, Zhihao, Gong, Mingming, Cao, Haoyu, Liu, Yinsong, Jiang, Deqiang, Sun, Xing
Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is specifi
External link:
http://arxiv.org/abs/2402.19014
Bubble bursting on water surfaces is believed to be a main mechanism for producing submicron drops, including sea spray aerosols, which play a critical role in forming clouds and transferring various biological and chemical substances from water to the a
External link:
http://arxiv.org/abs/2310.16551
We propose a method named AudioFormer, which learns audio feature representations through the acquisition of discrete acoustic codes and subsequently fine-tunes them for audio classification tasks. Initially, we introduce a novel perspective by conside
External link:
http://arxiv.org/abs/2308.07221
Collaborative security assessment of cloud-edge-device distributed systems based on order parameters
Published in:
网络与信息安全学报, Vol 10, Iss 3, Pp 38-51 (2024)
Distributed computing systems based on cloud-edge-device have been successfully serving thousands of applications and have become mainstream, characterized by a wide audience, high user experience requirements, and high security expectations. However
External link:
https://doaj.org/article/769dda6cf8ce4a3282921393c177dcbc
Scene segmentation and classification (SSC) serve as a critical step in video structuring analysis. Intuitively, joint learning of these two tasks can promote each other by sharing common information. However, scene segmentation c
External link:
http://arxiv.org/abs/2207.01241
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training
The self-supervised Masked Image Modeling (MIM) schema, following the "mask-and-reconstruct" pipeline of recovering contents from masked images, has recently captured increasing interest in the multimedia community, owing to the excellent ability of l
External link:
http://arxiv.org/abs/2204.08227
Author:
Zhang, Tao, Lu, Xiaohui, Zhang, Ruoyu, Jiang, Xinghua, Yang, Shanye, Ma, Xiewen, Gao, Qianqian, Wang, Xiaofei
Published in:
In Journal of Environmental Sciences February 2025 148:298-305
Recently, Vision Transformers (ViT), with self-attention (SA) as the de facto ingredient, have demonstrated great potential in the computer vision community. For the sake of trade-off between efficiency and performance, a group of works merely p
External link:
http://arxiv.org/abs/2111.12994
Published in:
In Chemical Engineering Journal 15 June 2024 490