Showing 1 - 10
of 3,565
for the query: '"Masked language modeling"'
In this paper, we present a system that generates synthetic free-text medical records, such as discharge summaries, admission notes and doctor correspondences, using Masked Language Modeling (MLM). Our system is designed to preserve the critical info…
External link:
http://arxiv.org/abs/2409.09831
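The MLM objective this entry builds on can be illustrated with BERT's standard token-corruption recipe (a minimal toy sketch in plain Python, not the paper's actual system; the token list, vocabulary, and 80/10/10 split follow the original BERT setup):

```python
import random

def mlm_mask(tokens, vocab, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masking: each token is selected with probability
    mask_rate; of the selected positions, 80% become [MASK], 10% a
    random vocabulary token, and 10% stay unchanged. Returns
    (corrupted tokens, labels), where labels hold the original token
    at selected positions and None elsewhere."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)          # model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # random replacement
            else:
                corrupted.append(tok)   # kept as-is, still predicted
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels
```

A model trained on such (corrupted, labels) pairs learns to reconstruct masked tokens from bidirectional context, which is what makes MLM usable for fill-in-style text generation.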
Guitar tablatures enrich the structure of traditional music notation by assigning each note to a string and fret of a guitar in a particular tuning, indicating precisely where to play the note on the instrument. The problem of generating tablature fr…
External link:
http://arxiv.org/abs/2408.05024
Author:
Li, Yuchen, Kirchmeyer, Alexandre, Mehta, Aashay, Qin, Yilong, Dadachev, Boris, Papineni, Kishore, Kumar, Sanjiv, Risteski, Andrej
Autoregressive language models are the currently dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale, for example inherently sequential and unidirectional generation. While alternate class…
External link:
http://arxiv.org/abs/2407.21046
Author:
An, Seunghwan, Woo, Gyeongdong, Lim, Jaesung, Kim, ChangHyun, Hong, Sungchul, Jeon, Jong-June
In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Since the MLu performance depends on accurately approximating the conditional distributions, we focus on d…
External link:
http://arxiv.org/abs/2405.20602
Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting such peptid…
External link:
http://arxiv.org/abs/2402.15181
Author:
Liang, Wen, Liang, Youzhi
BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks. Yet, the majority of researchers have mainly concentrated on enhancement…
External link:
http://arxiv.org/abs/2401.15861
While (large) language models have significantly improved over the last years, they still struggle to sensibly process long sequences found, e.g., in books, due to the quadratic scaling of the underlying attention mechanism. To address this, we propo…
External link:
http://arxiv.org/abs/2402.17682
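The quadratic scaling this snippet refers to comes from computing one score per (query, key) pair in dot-product attention. A naive pure-Python sketch makes the n × n cost explicit (the vectors here are invented for illustration; real implementations vectorize this):

```python
import math

def attention_weights(queries, keys):
    """Naive scaled dot-product attention weights. Computing
    len(queries) * len(keys) scores is what makes self-attention
    quadratic when queries and keys come from the same sequence."""
    d = len(queries[0])
    scores = [[sum(q[i] * k[i] for i in range(d)) / math.sqrt(d)
               for k in keys]
              for q in queries]
    # row-wise softmax so each query's weights sum to 1
    weights = []
    for row in scores:
        m = max(row)                       # subtract max for stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```

For a sequence of length n this performs n² score computations and stores an n × n matrix, which is exactly the bottleneck long-sequence methods like the one in this entry try to avoid.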
We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inf…
External link:
http://arxiv.org/abs/2401.01099
Published in:
Communications in Computer and Information Science, vol. 1983, 450-463, Springer, 2023
Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we propose a nove…
External link:
http://arxiv.org/abs/2401.01830
Published in:
IEEE Access, Vol 12, Pp 14248-14259 (2024)
In this study, a steganography method based on the BERT transformer model is proposed for hiding text data in cover text. The aim is to hide information by replacing specific words within the text using BERT’s masked language modeling (MLM) feature. In…
External link:
https://doaj.org/article/bf6dd50624b44765a1d534ddb0526317
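The word-replacement scheme this entry describes can be sketched as a toy: at each replaceable slot, a ranked list of candidate words encodes one hidden bit via the index of the word chosen. In the paper the candidates would come from BERT's masked-token predictions; here the candidate lists, slot positions, and sentences are hard-coded purely for illustration:

```python
# slot index in the cover sentence -> two plausible fillers, ranked.
# A real system would rank BERT's top MLM predictions for each slot.
CANDIDATES = {
    1: ["big", "large"],
    4: ["quickly", "rapidly"],
}

def encode(cover_words, bits):
    """Embed one bit per slot by picking the candidate at that index."""
    out, it = list(cover_words), iter(bits)
    for slot, cands in sorted(CANDIDATES.items()):
        out[slot] = cands[next(it)]
    return out

def decode(stego_words):
    """Recover the bits by looking up each slot word's candidate index."""
    return [CANDIDATES[slot].index(stego_words[slot])
            for slot in sorted(CANDIDATES)]
```

With two candidates per slot each replacement carries one bit; larger MLM candidate lists would carry log2(k) bits per slot, at the cost of less natural substitutions.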