Showing 1 - 10 of 281 results for the search: '"Rumshisky, A."'
The ability of large language models (LLMs) to "learn in context" based on the provided prompt has led to an explosive growth in their use, culminating in the proliferation of AI assistants such as ChatGPT, Claude, and Bard. These AI assistants…
External link:
http://arxiv.org/abs/2404.02054
Large language models can solve new tasks without task-specific fine-tuning. This ability, also known as in-context learning (ICL), is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study…
External link:
http://arxiv.org/abs/2404.02204
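As a minimal illustration of the in-context learning setup this abstract refers to, the sketch below assembles a few-shot prompt for a toy sentiment task; the task, labels, and examples are illustrative placeholders, not taken from the paper.

# Minimal sketch of in-context learning: the task is specified entirely in the
# prompt via demonstrations, with no parameter updates to the model.
demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]
query = "The plot dragged, but the acting saved it."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# The assembled prompt would be sent to a frozen LLM; the continuation it
# generates ("positive" / "negative") serves as the ICL prediction.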
Author:
Qiang, Yao, Nandi, Subhrangshu, Mehrabi, Ninareh, Steeg, Greg Ver, Kumar, Anoop, Rumshisky, Anna, Galstyan, Aram
Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification…
External link:
http://arxiv.org/abs/2402.15833
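A hedged sketch of how a sequence-labeling task such as slot filling can be cast as prompted generation for an LLM; the prompt format and slot inventory below are generic stand-ins, not the scheme used in the paper.

# Sketch of casting slot filling (a sequence-labeling task) as generation:
# the LLM is asked to emit one tag per word of the utterance.
utterance = "book a flight from boston to denver on friday"

prompt = (
    "Label each word of the utterance with a slot tag "
    "(O, B-from_city, B-to_city, B-date, B-genre, B-room).\n\n"
    "Utterance: play jazz music in the kitchen\n"
    "Tags: O B-genre O O O B-room\n\n"
    f"Utterance: {utterance}\n"
    "Tags:"
)
print(prompt)

# A frozen LLM would complete the final 'Tags:' line; the generated tags are
# aligned word-by-word with the input and scored against a reference such as:
reference = "O O O O B-from_city O B-to_city O B-date"
print("Reference tags:", reference)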
While recent advances have boosted LM proficiency in linguistic benchmarks, LMs consistently struggle to reason correctly on complex tasks like mathematics. We turn to Reinforcement Learning from Human Feedback (RLHF) as a method with which to shape…
External link:
http://arxiv.org/abs/2311.05821
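For context, RLHF pipelines typically start from a reward model trained on preference pairs. The PyTorch sketch below shows that pairwise (Bradley-Terry) objective, with random embeddings standing in for an actual LM encoder; it is a generic illustration, not the paper's training setup.

import torch
import torch.nn as nn

# Pairwise (Bradley-Terry) reward-model objective: preferred responses should
# score higher than rejected ones. Random embeddings stand in for an encoder.
hidden = 16
reward_head = nn.Linear(hidden, 1)

chosen_emb = torch.randn(8, hidden)    # embeddings of preferred responses
rejected_emb = torch.randn(8, hidden)  # embeddings of rejected responses

r_chosen = reward_head(chosen_emb).squeeze(-1)
r_rejected = reward_head(rejected_emb).squeeze(-1)

# Maximize the score margin between preferred and dispreferred responses.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(f"pairwise preference loss: {loss.item():.4f}")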
Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially. In this paper…
External link:
http://arxiv.org/abs/2307.05695
Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages, however training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from…
External link:
http://arxiv.org/abs/2306.08756
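One generic form of such a recipe is warm-starting the encoder of a seq2seq model from an encoder-only checkpoint; the sketch below copies matching parameters with strict=False so the decoder keeps its random initialization. The layer sizes are toy values, and this is an assumption about the general approach rather than the paper's exact method.

import torch.nn as nn

# Warm-start a seq2seq model's encoder from an encoder-only checkpoint so that
# only the decoder (and cross-attention) trains from scratch. Toy dimensions.
d_model, nhead, layers = 64, 4, 2

encoder_only = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), layers
)
# ... pretend `encoder_only` was pretrained, e.g. with a masked-LM objective ...

seq2seq = nn.Transformer(
    d_model=d_model, nhead=nhead,
    num_encoder_layers=layers, num_decoder_layers=layers,
    batch_first=True,
)

# Copy every matching encoder parameter; unmatched keys are reported, and the
# decoder keeps its random initialization.
missing, unexpected = seq2seq.encoder.load_state_dict(
    encoder_only.state_dict(), strict=False
)
print("keys not found in the encoder-only checkpoint:", missing)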
In recent years, language models have drastically grown in size, and the abilities of these models have been shown to improve with scale. The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question…
External link:
http://arxiv.org/abs/2305.17266
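Scaling-law studies of this kind typically fit a power law relating model size to loss. The sketch below fits L(N) = a * N**(-alpha) + c to synthetic points with scipy, purely to illustrate the methodology; none of the numbers come from the paper.

import numpy as np
from scipy.optimize import curve_fit

# Fit a power law L(N) = a * N**(-alpha) + c to (parameter count, loss) points.
def power_law(n_params, a, alpha, c):
    return a * n_params ** (-alpha) + c

# Synthetic data generated from known coefficients plus a little noise.
true_a, true_alpha, true_c = 50.0, 0.15, 2.5
n_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
rng = np.random.default_rng(0)
val_loss = power_law(n_params, true_a, true_alpha, true_c) + rng.normal(0.0, 0.02, n_params.size)

(a, alpha, c), _ = curve_fit(power_law, n_params, val_loss, p0=[30.0, 0.1, 2.0])
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss c = {c:.2f}")
print(f"extrapolated loss at 1B params: {power_law(1e9, a, alpha, c):.2f}")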
Published in:
2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
Scaling up weakly-supervised datasets has shown to be highly effective in the image-text domain and has contributed to most of the recent state-of-the-art computer vision and multimodal neural networks. However, existing large-scale video-text datasets…
External link:
http://arxiv.org/abs/2304.02080
Language model probing is often used to test specific capabilities of models. However, conclusions from such studies may be limited when the probing benchmarks are small and lack statistical power. In this work, we introduce new, larger datasets for…
External link:
http://arxiv.org/abs/2303.16445
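The statistical-power concern can be made concrete with a small simulation: for a hypothetical 5-point accuracy gap between two models, the sketch below estimates how often a two-proportion z-test would detect it at different benchmark sizes. The effect size and test are illustrative choices, not the paper's analysis.

import numpy as np

# Simulation: how often does a two-proportion z-test detect a real 5-point
# accuracy gap between two models, as a function of benchmark size?
rng = np.random.default_rng(0)

def detection_power(n_items, acc_a=0.80, acc_b=0.75, trials=2000):
    hits = 0
    for _ in range(trials):
        a = rng.binomial(n_items, acc_a)          # model A correct counts
        b = rng.binomial(n_items, acc_b)          # model B correct counts
        p_pool = (a + b) / (2 * n_items)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n_items)
        # One-sided z-test at the 5% level (critical z ~= 1.645).
        if se > 0 and (a - b) / n_items / se > 1.645:
            hits += 1
    return hits / trials

for n in (100, 500, 2000):
    print(f"benchmark size {n:>5}: estimated power = {detection_power(n):.2f}")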
This paper presents a systematic overview of parameter-efficient fine-tuning methods, covering over 50 papers published between early 2019 and mid-2024. These methods aim to address the challenges of fine-tuning large language models by training only…
External link:
http://arxiv.org/abs/2303.15647
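As an example of the family of methods such a survey covers, the sketch below implements a minimal LoRA-style adapter in PyTorch: the pretrained linear weight is frozen and only a low-rank update is trained. The rank, scaling, and initialization follow common defaults and are purely illustrative.

import torch
import torch.nn as nn

# Minimal LoRA-style adapter: freeze the pretrained weight W and train only a
# low-rank correction B @ A added to its output.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the A and B factors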