Abstract: |
This study evaluates the effectiveness of Large Language Models (LLMs) for automated legal outcome extraction in a zero-shot setting. Two open-source LLMs were used, Meta-Llama3 (70B), considered state-of-the-art among open models, and the less performant Mixtral (8x7B), and the accuracy of the extracted data was compared. These models were selected because they run on consumer-grade hardware, ensuring the reproducibility of our research. The experiment used a dataset of 400 manually annotated decisions from French Courts of Appeal, spanning four categories: psychiatric commitment, undocumented immigrant detention, wrongful termination damages, and workplace harassment damages. For each decision, we extracted two critical data points: the trial court outcome and the appellate court ruling. Results demonstrate that Llama3 achieves exceptional accuracy (one error in 100 documents) when provided with domain-specific prompts in JSON format. Prompt engineering can thus yield highly accurate legal data extraction without requiring expensive model fine-tuning or access to proprietary state-of-the-art models. This research contributes to the growing body of evidence supporting LLMs as reliable tools for legal information extraction and offers practical insights to help researchers craft effective instructions for their specific needs.
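
As a rough illustration of the extraction setup described above, the sketch below shows a zero-shot, domain-specific prompt that asks a local model for the two data points as a JSON object. The prompt wording, field names, and the query_model hook are illustrative assumptions; the authors' actual prompts are not reproduced in this abstract.

    import json

    # Hypothetical prompt; the paper's actual domain-specific prompts are
    # not given in the abstract. It instructs the model to answer with a
    # JSON object holding the two data points extracted per decision.
    PROMPT_TEMPLATE = """You are analyzing a decision from a French Court of Appeal.
    Return ONLY a JSON object with exactly these keys:
      "trial_court_outcome": the outcome ruled by the trial court,
      "appellate_court_ruling": the ruling of the Court of Appeal.

    Decision text:
    {decision_text}
    """

    def extract_outcomes(decision_text, query_model):
        """Zero-shot extraction of the two data points.

        query_model is a placeholder for any local LLM call, e.g. a
        Llama3-70B or Mixtral-8x7B runtime on consumer-grade hardware.
        """
        raw = query_model(PROMPT_TEMPLATE.format(decision_text=decision_text))
        return json.loads(raw)  # raises ValueError if the model strays from JSON

Requesting strictly JSON-shaped answers makes the output machine-checkable: a response that fails to parse can be counted as an extraction error rather than silently accepted.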