Search results - "Zhang, Jianshu"

Report

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Authors: Ma, Jiefeng; Wang, Yan; Liu, Chenyu; Du, Jun; Hu, Yu; Zhang, Zhenrong; Hu, Pengfei; Wang, Qing; Zhang, Jianshu

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks

External link: http://arxiv.org/abs/2406.08757

Report

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

Authors: Pi, Renjie; Zhang, Jianshu; Zhang, Jipeng; Pan, Rui; Chen, Zhekai; Zhang, Tong

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One

External link: http://arxiv.org/abs/2406.07502

Report

CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay

Authors: Zhang, Jianshu; Fu, Yankai; Peng, Ziheng; Yao, Dongyu; He, Kun

This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continual learning (CL), which emphasizes models' capacity to preserve existing knowledge and assimilate new information. Current replay-based methods tre

External link: http://arxiv.org/abs/2402.01348

Report

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

Authors: Pi, Renjie; Han, Tianyang; Zhang, Jianshu; Xie, Yueqi; Pan, Rui; Lian, Qing; Dong, Hanze; Zhang, Jipeng; Zhang, Tong

The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compa

External link: http://arxiv.org/abs/2401.02906

Report

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Authors: Yao, Dongyu; Zhang, Jianshu; Harris, Ian G.; Carlsson, Marcel

Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against indiv

External link: http://arxiv.org/abs/2309.05274

Report

Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction

Authors: Hu, Pengfei; Ma, Jiefeng; Zhang, Zhenrong; Du, Jun; Zhang, Jianshu

Recently, handwritten Chinese character error correction has been greatly improved by employing encoder-decoder methods to decompose a Chinese character into an ideographic description sequence (IDS). However, existing methods implicitly capture and

External link: http://arxiv.org/abs/2307.16253

Report

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures

Authors: Ma, Jiefeng; Du, Jun; Hu, Pengfei; Zhang, Zhenrong; Zhang, Jianshu; Zhu, Huihui; Liu, Cong

The problem of document structure reconstruction refers to converting digital or scanned documents into corresponding semantic structures. Most existing works mainly focus on splitting the boundary of each element in a single document page, neglectin

External link: http://arxiv.org/abs/2303.13839

Report

SEMv2: Table Separation Line Detection Based on Instance Segmentation

Authors: Zhang, Zhenrong; Hu, Pengfei; Ma, Jiefeng; Du, Jun; Zhang, Jianshu; Zhu, Huihui; Yin, Baocai; Yin, Bing; Liu, Cong

Table structure recognition is an indispensable element for enabling machines to comprehend tables. Its primary purpose is to identify the internal structure of a table. Nevertheless, due to the complexity and diversity of their structure and style,

External link: http://arxiv.org/abs/2303.04384

Report

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

Authors: Hu, Pengfei; Zhang, Zhenrong; Zhang, Jianshu; Du, Jun; Wu, Jiajia

Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works often use han

External link: http://arxiv.org/abs/2212.02896

Report

Multimodal Pre-training Based on Graph Attention Network for Document Understanding

Authors: Zhang, Zhenrong; Ma, Jiefeng; Du, Jun; Wang, Licheng; Zhang, Jianshu

Document intelligence as a relatively new research topic supports many business applications. Its main task is to automatically read, understand, and analyze documents. However, due to the diversity of formats (invoices, reports, forms, etc.) and lay

External link: http://arxiv.org/abs/2203.13530
