Showing 1 - 10
of 2,052
for search: '"An, Sukyung"'
Creating high-quality, large-scale datasets for large language models (LLMs) often relies on resource-intensive, GPU-accelerated models for quality filtering, making the process time-consuming and costly. This dependence on GPUs limits accessibility…
External link:
http://arxiv.org/abs/2411.11289
The Open Ko-LLM Leaderboard has been instrumental in benchmarking Korean Large Language Models (LLMs), yet it has certain limitations. Notably, the disconnect between quantitative improvements on the overly academic leaderboard benchmarks and the qua…
External link:
http://arxiv.org/abs/2410.12445
The rapid advancement of large language models (LLMs) has highlighted the need for robust evaluation frameworks that assess their core capabilities, such as reasoning, knowledge, and commonsense, leading to the inception of certain widely-used benchmarks…
External link:
http://arxiv.org/abs/2410.04795
Author:
Park, Chanjun, Ha, Hyunsoo, Kim, Jihoo, Kim, Yungi, Kim, Dahyun, Lee, Sukyung, Yang, Seonghoon
In this paper, we propose the 1 Trillion Token Platform (1TT Platform), a novel framework designed to facilitate efficient data sharing with a transparent and equitable profit-sharing mechanism. The platform fosters collaboration between data contributors…
External link:
http://arxiv.org/abs/2409.20149
With the increasing demand for substantial amounts of high-quality data to train large language models (LLMs), efficiently filtering large web corpora has become a critical challenge. For this purpose, KenLM, a lightweight n-gram-based language model…
External link:
http://arxiv.org/abs/2409.09613
Author:
Park, Chanjun, Kim, Hyeonwoo, Kim, Dahyun, Cho, Seonghwan, Kim, Sanghoon, Lee, Sukyung, Kim, Yungi, Lee, Hwalsuk
This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation…
External link:
http://arxiv.org/abs/2405.20574
To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Easy addition of custom pr…
External link:
http://arxiv.org/abs/2403.19340
Author:
Kim, Dahyun, Park, Chanjun, Kim, Sanghoon, Lee, Wonsung, Song, Wonho, Kim, Yunsu, Kim, Hyeonwoo, Kim, Yungi, Lee, Hyeonju, Kim, Jihoo, Ahn, Changbae, Yang, Seonghoon, Lee, Sukyung, Park, Hyunbyung, Gim, Gyoungjin, Cha, Mikyoung, Lee, Hwalsuk, Kim, Sunghun
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method f…
External link:
http://arxiv.org/abs/2312.15166
Author:
Ilkoo Ahn, Younghwa Baek, Bok-Nam Seo, Su Eun Lim, Kyoungsik Jung, Ho Seok Kim, Jeongkyun Kim, Sukyung Lee, Siwoo Lee
Published in:
Scientific Reports, Vol 14, Iss 1, Pp 1-12 (2024)
Abstract Biological age is an indicator of whether an individual is experiencing rapid, slowing, or normal aging. Perceived age is highly correlated with biological age, which reflects health appraisal and is often used as a clinical marker of aging…
External link:
https://doaj.org/article/cbb93aa2991347b38263939b60eadd09
Author:
Hanieh Tajdozian, Hoonhee Seo, Yoonkyoung Jeong, Fatemeh Ghorbanian, Chae-eun Park, Faezeh Sarafraz, Md Abdur Rahim, Youngkyoung Lee, Sukyung Kim, Saebim Lee, Jung-Hyun Ju, Chul-Ho Kim, Ho-Yeon Song
Published in:
Annals of Microbiology, Vol 74, Iss 1, Pp 1-20 (2024)
Abstract Background Antimicrobial resistance is considered one of the greatest threats to human health, according to the World Health Organization (WHO). Gram-negative bacteria, especially carbapenem-resistant Enterobacteriaceae (CRE), have become a…
External link:
https://doaj.org/article/e982c0f66eda48f191e7fc8586ee2dc7