Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

Authors:	Georg Buchgeher, Hannes Thaller, Lukas Linsbauer, Rudolf Ramler, Christian Salomon, Michael Pfeiffer, Claus Klammer
Publication year:	2018
Subject:	Structure (mathematical logic); Dependency (UML); Source code; Source lines of code; Graph database; Computer science; Knowledge extraction; Use case; Software system; Software engineering; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications; 020207 software engineering
Source:	Lecture Notes in Business Information Processing; ISBN: 9783030057664; SWQD
DOI:	10.1007/978-3-030-05767-1_9
Description:	Source code and related artifacts of software systems encode valuable expert knowledge accumulated over many person-years of development. Analyzing software systems and extracting this knowledge requires processing the source code and reconstructing structure and dependency information. In analysis projects over recent years, we have created tools and services using graph databases for representing and analyzing source code and other software engineering artifacts as well as their dependencies. Graph databases such as Neo4j are optimized for storing, traversing, and manipulating data in the form of nodes and relationships. They are scalable, extendable, and can quickly be adapted for different application scenarios. In this paper, we share our insights and experience from five different cases where graph databases have been used as a common solution concept for analyzing source code and related artifacts. They cover a broad spectrum of use cases from industry and research, ranging from lightweight dependency analysis to analyzing the architecture of a large-scale software system with 44 million lines of code. We discuss the benefits and drawbacks of using graph databases in the reported cases. The benefits are related to representing dependencies between source code elements and other artifacts, the support for rapid prototyping of analysis solutions, and the power and flexibility of the graph query language. The drawbacks concern the generic frontends of graph databases and the lack of support for time series data. A summary of application scenarios for using graph databases concludes the paper.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::f052df201c57aef3f2a79cdb09442824 https://doi.org/10.1007/978-3-030-05767-1_9