Best Practices for Text Annotation with Large Language Models

Author:	Petter Törnberg
Language:	English
Publication year:	2024
Subject:	text labeling; classification; data annotation; large language models; text-as-data; Social Sciences; Sociology (General); HM401-1281
Source:	Sociologica, Vol 18, Iss 2, Pp 67-85 (2024)
Document type:	article
ISSN:	1971-8853
DOI:	10.6092/issn.1971-8853/19461
Description:	Large Language Models (LLMs) have ushered in a new era of text annotation, as their ease-of-use, high accuracy, and relatively low costs have meant that their use has exploded in recent months. However, the rapid growth of the field has meant that LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has led to concerns about the quality and validity of research. Researchers have warned that the ostensible simplicity of LLMs can be misleading, as they are prone to bias, misunderstandings, and unreliable results. Recognizing the transformative potential of LLMs, this essay proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use. These guidelines span critical areas such as model selection, prompt engineering, structured prompting, prompt stability analysis, rigorous model validation, and the consideration of ethical and legal implications. The essay emphasizes the need for a structured, directed, and formalized approach to using LLMs, aiming to ensure the integrity and robustness of text annotation practices, and advocates for a nuanced and critical engagement with LLMs in social scientific research.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/3ec44d709af14213aa1128732441e612