Similarity Search for an Extreme Application: Experience and Implementation
Author: | Pavel Zezula, Vladimir Mic, Aleš Křenek, Tomáš Raček |
---|---|
Year of publication: | 2021 |
Subject: | Minimisation (psychology), Computer science, Orders of magnitude (acceleration), Nearest neighbor search, 02 engineering and technology, Variance (accounting), computer.software_genre, Search engine, Metric space, Similarity (network science), 020204 information systems, 0202 electrical engineering electronic engineering information engineering, 020201 artificial intelligence & image processing, Data mining, computer, Curse of dimensionality |
Source: | Similarity Search and Applications (SISAP), ISBN: 9783030896560 |
DOI: | 10.1007/978-3-030-89657-7_20 |
Description: | Contemporary challenges for efficient similarity search include complex similarity functions, the curse of dimensionality, and the large size of the descriptive features of data objects. This article reports our experience with a database of protein chains that form an (almost) metric space and exhibit the following extreme properties: evaluating the pairwise similarity of two protein chains can take as long as tens of minutes, and the evaluation time varies by six orders of magnitude. Minimising the number of similarity comparisons is therefore crucial, so we propose a generic three-stage search engine to achieve this. We improve the median search time 73-fold compared with the search engine currently used in practice for the protein database. |
Database: | OpenAIRE |
External link: |
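The abstract argues that, because the protein chains form an (almost) metric space and each comparison is very expensive, the key is to avoid similarity evaluations. The record does not describe the authors' three-stage engine, so the following is only a generic sketch of one standard way metric properties save comparisons: pivot-based filtering with the triangle inequality. All names (`build_pivot_table`, `range_search`, the toy 1-D data) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_pivot_table(objects: Sequence[T], pivots: Sequence[T],
                      dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute d(o, p) for every object o and pivot p (done once, offline)."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_search(query: T, objects: Sequence[T], pivots: Sequence[T],
                 table: List[List[float]], dist: Callable[[T, T], float],
                 radius: float) -> Tuple[List[int], int]:
    """Return indices of objects within `radius` of `query`, plus the number
    of dist() calls made at query time.

    For a metric dist, the triangle inequality gives the lower bound
    |d(q, p) - d(o, p)| <= d(q, o). Any object whose bound exceeds `radius`
    is discarded without evaluating the expensive dist(q, o).
    """
    q_to_p = [dist(query, p) for p in pivots]
    calls = len(pivots)  # query-to-pivot distances are paid once per query
    hits: List[int] = []
    for i, o in enumerate(objects):
        lower = max((abs(qp - op) for qp, op in zip(q_to_p, table[i])),
                    default=0.0)
        if lower > radius:
            continue  # pruned: no similarity comparison needed
        calls += 1
        if dist(query, o) <= radius:
            hits.append(i)
    return hits, calls


if __name__ == "__main__":
    # Toy example: numbers on a line with absolute difference as the metric.
    objects = [0, 1, 2, 10, 11, 12, 20, 21, 22]
    pivots = [0, 20]
    dist = lambda a, b: abs(a - b)
    table = build_pivot_table(objects, pivots, dist)
    hits, calls = range_search(11, objects, pivots, table, dist, radius=1.5)
    print(hits, calls)  # → [3, 4, 5] 5  (vs. 9 calls for a linear scan)
```

The pruning is only exact when the distance strictly satisfies the triangle inequality; for an "almost" metric such as the one described above, a filter of this kind may occasionally discard a true answer, which is one reason a practical engine may need additional stages.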