Showing 1 - 10
of 23,522
for search: '"YANAI, A."'
Author:
He, Yan, Drozd, Vasyl, Ekawa, Hiroyuki, Escrig, Samuel, Gao, Yiming, Kasagi, Ayumi, Liu, Enqiang, Muneem, Abdul, Nakagawa, Manami, Nakazawa, Kazuma, Rappold, Christophe, Saito, Nami, Saito, Takehiko R., Sugimoto, Shohei, Taki, Masato, Tanaka, Yoshiki K., Wang, He, Yanai, Ayari, Yoshida, Junya, Zhang, Hongfei
A novel method was developed to detect double-$\Lambda$ hypernuclear events in nuclear emulsions using machine learning techniques. The object detection model, Mask R-CNN, was trained using images generated by Monte Carlo simulations, image processing…
External link:
http://arxiv.org/abs/2409.01657
Author:
Sainz, Oscar, García-Ferrero, Iker, Jacovi, Alon, Campos, Jon Ander, Elazar, Yanai, Agirre, Eneko, Goldberg, Yoav, Chen, Wei-Lin, Chim, Jenny, Choshen, Leshem, D'Amico-Wong, Luca, Dell, Melissa, Fan, Run-Ze, Golchin, Shahriar, Li, Yucheng, Liu, Pengfei, Pahwa, Bhavish, Prabhu, Ameya, Sharma, Suryansh, Silcock, Emily, Solonko, Kateryna, Stap, David, Surdeanu, Mihai, Tseng, Yu-Min, Udandarao, Vishaal, Wang, Zengzhi, Xu, Ruijie, Yang, Jinglin
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora…
External link:
http://arxiv.org/abs/2407.21530
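The simplest form of the contamination this workshop studies can be checked mechanically. The sketch below flags evaluation examples that occur verbatim inside a pretraining corpus; it is a minimal illustration of the idea, not a method from the workshop, and all names in it are made up.

```python
def find_contaminated(eval_examples, pretraining_text):
    """Return the evaluation examples that occur verbatim in the
    pretraining corpus -- the simplest form of data contamination.

    A real pipeline would normalize whitespace and use n-gram or
    fuzzy matching; exact substring search is only a lower bound.
    """
    return [ex for ex in eval_examples if ex in pretraining_text]


corpus = "the quick brown fox jumps over the lazy dog"
evals = ["brown fox jumps", "purple elephant"]
print(find_contaminated(evals, corpus))  # -> ['brown fox jumps']
```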
Author:
Sharma, Rashi, Okada, Hiroyuki, Oba, Tatsumi, Subramanian, Karthikk, Yanai, Naoto, Pranata, Sugiri
The Industrial Control System (ICS) environment encompasses a wide range of intricate communication protocols, posing substantial challenges for Security Operations Center (SOC) analysts tasked with monitoring, interpreting, and addressing network…
External link:
http://arxiv.org/abs/2407.15428
Author:
Antoniades, Antonis, Wang, Xinyi, Elazar, Yanai, Amayuelas, Alfonso, Albalak, Alon, Zhang, Kexun, Wang, William Yang
Despite the proven utility of large language models (LLMs) in real-world applications, there remains a lack of understanding regarding how they leverage their large-scale pretraining text corpora to achieve such capabilities. In this work, we investigate…
External link:
http://arxiv.org/abs/2407.14985
Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic…
External link:
http://arxiv.org/abs/2407.00211
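The word-level baseline this abstract contrasts itself with can be written in a few lines. The sketch below measures what fraction of a text's n-grams repeat an earlier one; it is an illustrative stand-in, not the paper's syntactic method, though the same count applied to sequences of syntactic tags instead of words points in the paper's direction.

```python
from collections import Counter


def ngram_repetition(tokens, n):
    """Fraction of n-grams in `tokens` that repeat an earlier n-gram.

    This is the word-level measure; replacing tokens with syntactic
    tags before counting would measure repetition of syntactic
    patterns rather than surface strings.
    """
    grams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(grams)


tokens = "the cat sat on the cat sat".split()
print(ngram_repetition(tokens, 2))  # 2 of the 6 bigrams are repeats
```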
How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to complete…
External link:
http://arxiv.org/abs/2406.13069
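The novelty question the abstract poses can be made concrete with a toy computation. The sketch below reports the fraction of a generated text's $n$-grams that never occur in a (tiny, in-memory) training corpus; the paper works at full-corpus scale with an indexed data structure, so this is only an illustration of the quantity being measured.

```python
def ngram_novelty(generated, corpus, n):
    """Fraction of n-grams in `generated` that never occur in `corpus`.

    Toy version: the corpus n-grams fit in a Python set.  Scaling this
    to a real pretraining corpus requires a dedicated index.
    """
    gen = [tuple(generated[i : i + n]) for i in range(len(generated) - n + 1)]
    if not gen:
        return 0.0
    seen = {tuple(corpus[i : i + n]) for i in range(len(corpus) - n + 1)}
    return sum(g not in seen for g in gen) / len(gen)


corpus = "a b c d e".split()
generated = "a b c x".split()
# Bigrams (a,b) and (b,c) appear in the corpus; (c,x) is novel.
print(ngram_novelty(generated, corpus, 2))
```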
Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications…
External link:
http://arxiv.org/abs/2406.00787
Author:
Albalak, Alon, Elazar, Yanai, Xie, Sang Michael, Longpre, Shayne, Lambert, Nathan, Wang, Xinyi, Muennighoff, Niklas, Hou, Bairu, Pan, Liangming, Jeong, Haewon, Raffel, Colin, Chang, Shiyu, Hashimoto, Tatsunori, Wang, William Yang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality…
External link:
http://arxiv.org/abs/2402.16827
Author:
Lyu, Qing, Shridhar, Kumar, Malaviya, Chaitanya, Zhang, Li, Elazar, Yanai, Tandon, Niket, Apidianaki, Marianna, Sachan, Mrinmaya, Callison-Burch, Chris
Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated and elude conventional calibration techniques due to their proprietary nature…
External link:
http://arxiv.org/abs/2402.13904
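The calibration problem this abstract raises is usually quantified with Expected Calibration Error (ECE): bin predictions by confidence and average the gap between a bin's mean confidence and its accuracy. The sketch below computes ECE from scratch; it is the standard metric, not the paper's own method, and the example data is made up.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the
    |mean confidence - accuracy| gap, weighted by bin size.
    A perfectly calibrated predictor scores 0."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 -> last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(o for _, o in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


# Two predictions at 90% confidence, only one correct:
# the model is overconfident by 0.9 - 0.5 = 0.4.
print(expected_calibration_error([0.9, 0.9], [1, 0]))
```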
Author:
Groeneveld, Dirk, Beltagy, Iz, Walsh, Pete, Bhagia, Akshita, Kinney, Rodney, Tafjord, Oyvind, Jha, Ananya Harsh, Ivison, Hamish, Magnusson, Ian, Wang, Yizhong, Arora, Shane, Atkinson, David, Authur, Russell, Chandu, Khyathi Raghavi, Cohan, Arman, Dumas, Jennifer, Elazar, Yanai, Gu, Yuling, Hessel, Jack, Khot, Tushar, Merrill, William, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Pyatkin, Valentina, Ravichander, Abhilasha, Schwenk, Dustin, Shah, Saurabh, Smith, Will, Strubell, Emma, Subramani, Nishant, Wortsman, Mitchell, Dasigi, Pradeep, Lambert, Nathan, Richardson, Kyle, Zettlemoyer, Luke, Dodge, Jesse, Lo, Kyle, Soldaini, Luca, Smith, Noah A., Hajishirzi, Hannaneh
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details…
External link:
http://arxiv.org/abs/2402.00838