Showing 1 - 10 of 32 for search: '"GRALIńSKI, FILIP"'
Though large language models (LLMs) have demonstrated exceptional performance across numerous problems, their application to predictive tasks in relational databases remains largely unexplored. In this work, we address the notion that LLMs cannot yie
External link:
http://arxiv.org/abs/2411.11829
We present a new method to detect anomalies in texts (in general: in sequences of any data), using language models, in a totally unsupervised manner. The method considers probabilities (likelihoods) generated by a language model, but instead of focus
External link:
http://arxiv.org/abs/2409.03046
Authors:
Dzienisiewicz, Daniel, Graliński, Filip, Jabłoński, Piotr, Kubis, Marek, Skórzewski, Paweł, Wierzchoń, Piotr
This paper presents the POLygraph dataset, a unique resource for fake news detection in Polish. The dataset, created by an interdisciplinary team, is composed of two parts: the "fake-or-not" dataset with 11,360 pairs of news articles (identified by t
External link:
http://arxiv.org/abs/2407.01393
This paper discusses two approaches to the diachronic normalization of Polish texts: a rule-based solution that relies on a set of handcrafted patterns, and a neural normalization model based on the text-to-text transfer transformer architecture. The
External link:
http://arxiv.org/abs/2402.01300
In recent years, the field of document understanding has progressed a lot. A significant part of this progress has been possible thanks to the use of language models pretrained on large amounts of documents. However, pretraining corpora used in the d
External link:
http://arxiv.org/abs/2304.14953
Authors:
Stanisławek, Tomasz, Graliński, Filip, Wróblewska, Anna, Lipiński, Dawid, Kaliska, Agnieszka, Rosalska, Paulina, Topolski, Bartosz, Biecek, Przemysław
Published in:
International Conference on Document Analysis and Recognition ICDAR 2021
The relevance of the Key Information Extraction (KIE) task is increasingly important in natural language processing problems. But there are still only a few well-defined problems that serve as benchmarks for solutions in this area. To bridge this gap
External link:
http://arxiv.org/abs/2105.05796
This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introdu
External link:
http://arxiv.org/abs/2011.03228
The paper presents a novel method of finding a fragment in a long temporal sequence similar to the set of shorter sequences. We are the first to propose an algorithm for such a search that does not rely on computing the average sequence from query ex
External link:
http://arxiv.org/abs/2010.14464
We propose a differentiable successive halving method of relaxing the top-k operator, rendering gradient-based optimization possible. The need to perform softmax iteratively on the entire vector of scores is avoided by using a tournament-style select
External link:
http://arxiv.org/abs/2010.15552
In this paper, we investigate the Dual-source Transformer architecture on the WikiReading information extraction and machine reading comprehension dataset. The proposed model outperforms the current state-of-the-art by a large margin. Next, we introd
External link:
http://arxiv.org/abs/2006.08281