What is the Message About? Automatic Multi-label Classification of Open Source Repository Messages into Content Types

Authors: Yannis Korkontzelos, Luis Adrián Cabrera-Diego, Daniel Campbell
Year of publication: 2020
Subject:
Source: Advances in Intelligent Systems and Computing ISBN: 9783030442880
AICV
DOI: 10.1007/978-3-030-44289-7_49
Description: Users of Open Source Software (OSS) projects discuss a diverse range of topics online. The content of a post often corresponds to one or more context-sensitive content types, e.g. a suggestion for a solution, a request for further clarification, or an indication that a proposed solution did not work. The detection of content types can provide several benefits for software developers. For instance, content types can be used as indicators that summarise the content of the messages. These indicators can be exploited as part of a developer-centric knowledge mining platform, allowing developers and project managers to create action alerts concerning new bugs found outside of a bug tracker, or they can be combined with other metrics to assess the quality of an OSS project. We present a multi-label classifier able to classify messages exchanged on OSS communication channels, together with detailed evaluation results. We experimented with two state-of-the-art multi-label classification approaches, HOMER (Hierarchy Of Multilabel classifiER) and RAkEL (RAndom k-labELsets), as these met the technical requirements of the CROSSMINER project. We also used a manually annotated, threaded corpus of posts from newsgroup discussions, bug tracking systems and forums related to Eclipse projects. The results are promising and indicate the potential to attract novel and deeper research for this task.
Database: OpenAIRE