Applying a dynamic threshold to improve cluster detection of LSI
Authors: | Steven Klusener, Pieter van der Spek |
---|---|
Contributors: | Software and Sustainability (S2), Network Institute, Information Systems, Information Management & Software Engineering |
Language: | English |
Year of publication: | 2011 |
Source: | Science of Computer Programming, 76(12), 1261-1274. Elsevier. van der Spek, P. N. & Klusener, A. S. 2011, 'Applying a dynamic threshold to improve cluster detection of LSI', Science of Computer Programming, vol. 76, no. 12, pp. 1261-1274. https://doi.org/10.1016/j.scico.2010.12.004 |
ISSN: | 0167-6423 |
DOI: | 10.1016/j.scico.2010.12.004 |
Description: | Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently it has been shown that it is also useful for identifying concerns in source code. The tree-cutting strategy plays an important role in obtaining the clusters, which identify the concerns. In this contribution the authors compare two tree-cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies have been performed on the source code of Philips Healthcare to compare the results of both approaches. While some of the settings are particular to the Philips case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, is an improvement over the fixed height threshold in the detection of clusters representing relevant concerns. This makes the approach as a whole more usable in practice. © 2010 Elsevier B.V. All rights reserved. |
Database: | OpenAIRE |
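The abstract contrasts a fixed height threshold with the Dynamic Hybrid cut for cutting the dendrogram built over LSI document vectors. As a minimal, hedged illustration of the baseline only, the Python sketch below clusters a few made-up 2-D vectors (standing in for documents already projected into a reduced LSI space) with naive average-linkage agglomeration and applies one global height threshold. The vectors, the use of cosine distance and average linkage, and the names `cosine_distance`, `cluster_distance` and `fixed_height_cut` are assumptions for illustration, not details taken from the paper. The Dynamic Hybrid cut itself is not reproduced here.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors (assumed LSI document vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_distance(a, b, d):
    """Average-linkage distance: mean pairwise distance between members of clusters a and b."""
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))


def fixed_height_cut(vectors, height):
    """Merge clusters bottom-up while the closest pair is within `height`.

    Average linkage has monotone merge heights, so stopping here yields the same
    partition as building the full dendrogram and cutting it at `height`.
    """
    n = len(vectors)
    d = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Closest pair of current clusters under average linkage.
        dist, ia, ib = min(
            (cluster_distance(a, b, d), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
        )
        if dist > height:
            break  # every remaining merge would happen above the global threshold
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical reduced-space vectors for five documents (illustrative only).
DOCS = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.8],
    [0.7, 0.7],
]

if __name__ == "__main__":
    for h in (0.1, 0.3, 1.0):
        print(h, fixed_height_cut(DOCS, h))
    # 0.1 [[0, 1], [2, 3], [4]]
    # 0.3 [[0, 1], [2, 3, 4]]
    # 1.0 [[0, 1, 2, 3, 4]]
```

Raising the single global `height` from 0.1 to 0.3 decides whether document 4 joins the `[2, 3]` group, and that one value applies to every branch of the tree alike. A dynamic cut such as the Dynamic Hybrid cut compared in the paper is designed to adapt the cut to the shape of individual branches instead.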