Down-sampled and Under-sampled Data sets in Feature Selective Validation (FSV)

Author: Lixin Wang, Gang Zhang, Alistair Duffy, Karol Aniserowicz, Hugh Sasse, Antonio Orlandi, Danilo Di Febo
Language: English
Year of publication: 2014
Description: The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link. Feature Selective Validation (FSV) is a heuristic method for quantifying the (dis)similarity of two data sets. The computational burden of obtaining the FSV values may be unnecessarily high if data sets with large numbers of points are used. While this may not be an important issue per se, it matters for future developments in FSV such as real-time processing or multi-dimensional FSV. Coupled with the issue of data set size is the issue of data sets having 'missing' values. This may arise from a practical difficulty, or from noise or other confounding factors making some data points unreliable. These issues relate to the question "what is the effect on FSV quantification of reducing or removing data points from a comparison, i.e. down-sampling or under-sampling the data?" This paper applies three point-reduction strategies to known data sets. It demonstrates, through a representative sample of 16 pairs of data sets, that FSV is robust to such changes provided a minimum data set size of approximately 200 points is maintained. It is also robust to up to approximately 10% 'missing' data, provided the missing points do not form a continuous region.
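The two reduction strategies named in the question above can be sketched briefly. The following Python fragment is an illustrative assumption rather than the authors' code: the function names downsample and undersample are hypothetical, while the 200-point target and 10% missing fraction echo the thresholds reported in the abstract.

import numpy as np

def downsample(x, y, target_points=200):
    # Keep a uniform subset of points (every k-th sample); the abstract
    # suggests FSV remains robust while roughly 200+ points are retained.
    step = max(1, len(x) // target_points)
    return x[::step], y[::step]

def undersample(x, y, missing_fraction=0.10, rng=None):
    # Randomly drop a fraction of points (up to ~10% per the abstract),
    # scattered across the sweep rather than forming one continuous gap.
    rng = np.random.default_rng() if rng is None else rng
    n_keep = int(round(len(x) * (1.0 - missing_fraction)))
    keep = np.sort(rng.choice(len(x), size=n_keep, replace=False))
    return x[keep], y[keep]

# Synthetic 1000-point trace standing in for a measured or simulated sweep.
f = np.linspace(0.0, 1.0, 1000)
s = np.sin(2.0 * np.pi * 5.0 * f) * np.exp(-f)
f_ds, s_ds = downsample(f, s)    # 200 uniformly spaced points
f_us, s_us = undersample(f, s)   # ~900 points with scattered gaps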
Database: OpenAIRE