Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?
Authors: | Keheliya Gallaba, Shane McIntosh, Oliver E. Clark, Durham Abric, Matthew Caminiti |
---|---|
Year of publication: | 2019 |
Subject: |
Questions and answers; Information retrieval; Computer science; Software development; Software engineering; Engineering and technology; Software; Electrical engineering, electronic engineering, information engineering; Stack Overflow; Artificial intelligence & image processing; Degree of similarity; Heuristics; Reputation |
Source: | MSR |
Description: | Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposedly overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed. In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity- and reputation-based heuristics. Second, we compare the content of duplicates in terms of both their questions and their answers to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive answers that are dissimilar to those of their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community. |
Database: | OpenAIRE |
External link: |