Text2Cypher: Bridging Natural Language and Graph Databases

Authors:	Ozsoy, Makbule Gulcin, Messallem, Leila, Besga, Jon, Minneci, Gianandrea
Year of publication:	2024
Subject:	Computer Science - Machine Learning
Document type:	Working Paper
Description:	Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When knowledge graphs are stored in a graph database, the Cypher query language enables efficient modeling and querying of them. However, using Cypher requires specialized knowledge, which can be a barrier for non-expert users. Our work, Text2Cypher, aims to bridge this gap by translating natural language questions into Cypher queries, extending the utility of knowledge graphs to users without technical expertise. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the scarcity of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how we combined, cleaned, and organized several publicly available datasets into a total of 44,387 instances, enabling effective fine-tuning and evaluation. Models fine-tuned on this dataset showed significant performance gains, improving Google-BLEU and Exact Match scores over baseline models, which highlights the importance of high-quality datasets and fine-tuning for Text2Cypher performance.
Database:	arXiv
External link:	http://arxiv.org/abs/2412.10064
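The description reports results in terms of Exact Match and Google-BLEU. The record does not say how the authors tokenize queries or which implementation they use, so the following is only a minimal sketch under stated assumptions. It uses whitespace tokenization and the sentence-level Google-BLEU (GLEU) definition: the minimum of clipped n-gram precision and recall, pooled over n = 1..4. The Tom Hanks question and its Cypher query are hypothetical illustrations, not instances from the dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the whitespace-normalized strings are identical, else 0.0 (assumed normalization)."""
    return float(" ".join(prediction.split()) == " ".join(reference.split()))


def google_bleu(prediction: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level Google-BLEU: min(precision, recall) over pooled 1..max_n-gram counts.

    Assumption: naive whitespace tokenization; real evaluators may tokenize differently.
    """
    hyp, ref = prediction.split(), reference.split()
    matches = hyp_total = ref_total = 0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped overlap: Counter '&' keeps the minimum count of each shared n-gram.
        matches += sum((hyp_counts & ref_counts).values())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    if hyp_total == 0 or ref_total == 0:
        return 0.0
    return min(matches / hyp_total, matches / ref_total)


# Hypothetical Text2Cypher pair for "Which movies did Tom Hanks act in?"
reference = "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title"
prediction = "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m"

print(f"Exact Match: {exact_match(prediction, reference):.1f}")   # Exact Match: 0.0
print(f"Google-BLEU: {google_bleu(prediction, reference):.3f}")   # Google-BLEU: 0.818
```

The two metrics behave differently on near-misses. Exact Match gives no credit to a prediction that returns `m` instead of `m.title`. Google-BLEU still rewards the 18 of 22 shared n-grams. Neither metric checks whether the two queries return the same results when executed against the database.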