Logical Analysis of Built-in DBSCAN Functions in Popular Data Science Programming Languages
Author: | Md Amiruzzaman, Rashik Rahman, Md. Rajibul Islam, Rizal Mohd Nor |
---|---|
Year of publication: | 2022 |
Subject: | |
Source: | MIST International Journal of Science and Technology, 10:25-32 |
ISSN: | 2707-7365, 2224-2007 |
DOI: | 10.47981/j.mijst.10(01)2022.349(25-32) |
Description: | DBSCAN is a density-based clustering algorithm that is widely used to find relationships and patterns in geographical data. Because it is so widely applied, several programming languages used in data science include DBSCAN as a built-in function, and researchers and data scientists rely on these built-in functions to cluster and analyze their study data. All implementations require the user to supply a radius distance (ε) and the minimum number of samples required within that radius (min_sample). Consequently, all built-in DBSCAN functions are generally assumed to produce the same results. However, the built-in DBSCAN function in Python yields results that differ from those of the other programming languages analyzed in this study. We propose a systematic method for assessing the results of built-in DBSCAN functions and for identifying inconsistencies in their output. The study reveals several such differences and advises caution when relying on built-in functionality. |
Database: | OpenAIRE |
External link: | |
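The description notes that every implementation takes the same two inputs, ε and min_sample, yet the outputs can still differ. The sketch below is a minimal from-scratch DBSCAN in plain Python, not the scikit-learn, R, or other built-in code compared in the study. It illustrates one plausible source of such discrepancies, offered as an assumption rather than as the authors' finding: whether the query point counts toward its own neighborhood size. scikit-learn's documentation states that its `min_samples` includes the point itself. The `count_self` flag, the helper names, and the sample points are hypothetical.

```python
"""Minimal DBSCAN sketch (hypothetical; not scikit-learn's or R's implementation)."""
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Return indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int,
           count_self: bool = True) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    count_self is a hypothetical switch: True counts the query point itself
    toward min_samples, False counts only the other points in its neighborhood.
    """

    def is_core(neighbors: List[int]) -> bool:
        size = len(neighbors) if count_self else len(neighbors) - 1
        return size >= min_samples

    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if not is_core(neighbors):
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(j for j in neighbors if j != i)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if is_core(j_neighbors):
                queue.extend(j_neighbors)  # only core points grow the cluster
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0), (20.0, 20.0)]
    print(dbscan(pts, eps=1.0, min_samples=2))                    # [0, 0, 0, 1, 1, -1]
    print(dbscan(pts, eps=1.0, min_samples=2, count_self=False))  # [0, 0, 0, -1, -1, -1]
```

With identical ε and min_samples, the pair of points near (10, 0) forms a cluster under one counting convention and becomes noise under the other. Counting without the point itself is equivalent to counting with it and raising min_samples by one, so two implementations that read the parameter differently effectively apply different density thresholds.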