Popis: |
A limitation of many clustering algorithms is the requirement to tune adjustable parameters for each application or even for each dataset. Some techniques require an \emph{a priori} estimate of the number of clusters while density-based techniques usually require a scale parameter. Other parametric methods, such as mixture modeling, make assumptions about the underlying cluster distributions. Here we introduce a non-parametric clustering method that does not involve tunable parameters and only assumes that clusters are unimodal, in the sense that they have a single point of maximal density when projected onto any line, and that clusters are separated from one another by a separating hyperplane of relatively lower density. The technique uses a non-parametric variant of Hartigan's dip statistic using isotonic regression as the kernel operation repeated at every iteration. We compare the method against k-means++, DBSCAN, and Gaussian mixture methods and show in simulations that it performs better than these standard methods in many situations. The algorithm is suited for low-dimensional datasets with a large number of observations, and was motivated by the problem of "spike sorting" in neural electrical recordings. Source code is freely available. |