Showing 1 - 10 of 17,689 results for search: '"Pinter, A."'
Using language models as a remote service entails sending private information to an untrusted provider. In addition, potential eavesdroppers can intercept the messages, thereby exposing the information. In this work, we explore the prospects of avoid
External link:
http://arxiv.org/abs/2407.01334
Authors:
Szalontai, Balázs, Szalay, Gergő, Márton, Tamás, Sike, Anna, Pintér, Balázs, Gregorics, Tibor
Recently, there has been increasing activity in using deep learning for software engineering, including tasks like code generation and summarization. In particular, the most recent coding Large Language Models seem to perform well on these problems.
External link:
http://arxiv.org/abs/2405.19032
Authors:
Gudarzi, Mohsen Moazzami, Slizovskiy, Sergey, Mao, Boyang, Tóvári, Endre, Pinter, Gergo, Sanderson, David, Asaad, Maryana, Xiang, Ying, Wang, Zhiyuan, Guo, Jianqiang, Spencer, Ben F., Geim, Alexandra A., Fal'ko, Vladimir I., Kretinin, Andrey V.
Understanding and controlling the electrical properties of solution-processed 2D materials is key to further printed electronics progress. Here we demonstrate that the thermolysis of the aromatic intercalants utilized in nanosheet exfoliation for gra
External link:
http://arxiv.org/abs/2404.17738
Authors:
Batsuren, Khuyagbaatar, Vylomova, Ekaterina, Dankers, Verna, Delgerbaatar, Tsetsuukhei, Uzan, Omri, Pinter, Yuval, Bella, Gábor
The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been p
External link:
http://arxiv.org/abs/2404.13292
We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords. The technique is available in popular tokenization libraries but has not been subjec
External link:
http://arxiv.org/abs/2404.00397
Authors:
Naselli, Gabriele, Frank, György, Varjas, Dániel, Fulga, Ion Cosma, Pintér, Gergő, Pályi, András, Könye, Viktor
Changes in the number of Weyl nodes in Weyl semimetals occur through merging processes, usually involving a pair of oppositely charged nodes. More complicated processes involving multiple Weyl nodes are also possible, but they typically require fine
External link:
http://arxiv.org/abs/2403.08518
Authors:
Cherf, Carinne, Pinter, Yuval
Neural machine translation (NMT) has progressed rapidly in the past few years, promising improvements and quality translations for different languages. Evaluation of this task is crucial to determine the quality of the translation. Overall, insuffici
External link:
http://arxiv.org/abs/2403.03521
While subword tokenizers such as BPE and WordPiece are typically used to build vocabularies for NLP models, the method of decoding text into a sequence of tokens from these vocabularies is often left unspecified, or ill-suited to the method in which
External link:
http://arxiv.org/abs/2403.01289
Authors:
Schmidt, Craig W., Reddy, Varshini, Zhang, Haoran, Alameddine, Alec, Uzan, Omri, Pinter, Yuval, Tanner, Chris
Tokenization is a foundational step in Natural Language Processing (NLP) tasks, bridging raw text and language models. Existing tokenization approaches like Byte-Pair Encoding (BPE) originate from the field of data compression, and it has been sugges
External link:
http://arxiv.org/abs/2402.18376
Authors:
Schneider, Nadav, Hasabnis, Niranjan, Vo, Vy A., Kadosh, Tal, Krien, Neva, Capotă, Mihai, Tamir, Guy, Willke, Ted, Ahmed, Nesreen, Pinter, Yuval, Mattson, Timothy, Oren, Gal
The imperative need to scale computation across numerous nodes highlights the significance of efficient parallel computing, particularly in the realm of Message Passing Interface (MPI) integration. The challenging parallel programming task of generat
External link:
http://arxiv.org/abs/2402.09126