Showing 1 - 10 of 27,719 results for search: '"Document Classification"'
Long document classification presents challenges in capturing both local and global dependencies due to their extensive content and complex structure. Existing methods often struggle with token limits and fail to adequately model hierarchical relatio
External link:
http://arxiv.org/abs/2410.02930
Author:
Xia, Bolun "Namir", Gupta, Aparna, Zaki, Mohammed J.
The advent of large language models (LLMs) has initiated much research into their various financial applications. However, in applying LLMs on long documents, semantic relations are not explicitly incorporated, and a full or arbitrarily sparse attent
External link:
http://arxiv.org/abs/2410.02024
Author:
Hossain, Elias, Nuzhat, Tasfia, Masum, Shamsul, Rahimi, Shahram, Mittal, Sudip, Golilarz, Noorbakhsh Amiri
Accurate classification of cancer-related medical abstracts is crucial for healthcare management and research. However, obtaining large, labeled datasets in the medical domain is challenging due to privacy concerns and the complexity of clinical data
External link:
http://arxiv.org/abs/2410.15198
Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. The majority of existing OOD detection methods predom
External link:
http://arxiv.org/abs/2408.11237
Large semantic knowledge bases are grounded in factual knowledge. However, recent approaches to dense text representations (i.e. embeddings) do not efficiently exploit these resources. Dense and robust representations of documents are essential for e
External link:
http://arxiv.org/abs/2408.09794
Long Document Classification (LDC) has gained significant attention recently. However, multi-modal data in long documents such as texts and images are not being effectively utilized. Prior studies in this area have attempted to integrate texts and im
External link:
http://arxiv.org/abs/2407.10105
Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse att
External link:
http://arxiv.org/abs/2405.07052
Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all toke
External link:
http://arxiv.org/abs/2406.01283
The International Classification of Diseases (ICD) is an authoritative medical classification system of different diseases and conditions for clinical and management purposes. ICD indexing assigns a subset of ICD codes to a medical record. Since huma
External link:
http://arxiv.org/abs/2405.19084