Incremental Map-Reduce on Repository History
Author: | Johannes Härtel, Ralf Lämmel |
---|---|
Year of publication: | 2020 |
Subject: |
Functional programming; Computer science; business.industry; Search engine indexing; 020207 software engineering; 02 engineering and technology; 020204 information systems; Map reduce; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Redundancy (engineering); Homomorphism; Software engineering; business; Mining software repositories |
Source: | SANER |
Description: | Work on Mining Software Repositories (MSR) typically involves processing abstractions of resources on individual revisions. Processing abstractions of resource changes, in turn, often requires working with all revisions of the repository history to guarantee a high resolution of the measured changes. Abstractions of resources and abstractions of resource changes are often so closely related that they can be used interchangeably in the processing. In practice, approaches that process abstractions over high revision counts face a scalability challenge. In this work, we address this challenge by incrementalizing the processing of repository resources and the corresponding abstractions. Our work is inspired by incrementalization theory, including insights on Abelian groups, group homomorphisms, and indexing. We provide a map-reduce interface that enables calls to foreign functionality and offers convenient operations for processing abstractions, such as mapping, filtering, group-wise aggregation, and joining. Apache Spark is used for distribution. We compare the scalability of our approach with available MSR approaches, namely LISA, which reduces redundancy, and DJ-Rex, which migrates an analysis to a distributed map-reduce framework. |
Database: | OpenAIRE |
External link: |
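
The abstract's reference to Abelian groups and group homomorphisms can be illustrated with a small sketch. It is not the authors' implementation and does not use Apache Spark; all names (`add`, `neg`, `line_bag`, `token_counts`) and the two sample revisions are hypothetical. The assumption is that a revision is abstracted as a bag of lines with integer multiplicities (an Abelian group under pointwise addition) and that the analysis, here a token count, is a homomorphism on such bags, so the result for a new revision can be derived from the previous result plus the analysis of the change alone.

```python
from collections import defaultdict

def add(a, b):
    """Group operation: pointwise sum of integer-valued maps; zero entries are dropped."""
    out = defaultdict(int)
    for m in (a, b):
        for key, n in m.items():
            out[key] += n
    return {key: n for key, n in out.items() if n != 0}

def neg(a):
    """Group inverse: negate every multiplicity."""
    return {key: -n for key, n in a.items()}

def line_bag(text):
    """Abstraction of one resource: a bag (multiset) of its lines."""
    bag = defaultdict(int)
    for line in text.splitlines():
        bag[line] += 1
    return dict(bag)

def token_counts(bag):
    """Analysis as a homomorphism: token_counts(add(a, b)) == add(token_counts(a), token_counts(b))."""
    out = defaultdict(int)
    for line, n in bag.items():
        for token in line.split():
            out[token] += n  # weight each token by the multiplicity of its line
    return {token: n for token, n in out.items() if n != 0}

# Two hypothetical revisions of the same file.
rev1 = "import os\nprint(os.name)\n"
rev2 = "import os\nimport sys\nprint(sys.argv)\n"

# Result for the first revision, computed once in full.
full_rev1 = token_counts(line_bag(rev1))

# The change between revisions is itself a group element (negative counts mark removed lines).
delta = add(line_bag(rev2), neg(line_bag(rev1)))

# Incremental step: only the change is analysed, then merged into the old result.
incremental = add(full_rev1, token_counts(delta))

# Reference result: full recomputation on the second revision.
full_rev2 = token_counts(line_bag(rev2))
```

Under these assumptions `incremental` and `full_rev2` agree, although the unchanged line `import os` is never reprocessed. Because the operation is commutative and associative, partial results can in principle be combined in any order, which is what makes this style of aggregation a natural fit for a distributed map-reduce framework.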