Detecting duplicate biological entities using Shortest Path Edit Distance
Authors: | James Geller, Min Song, Alex Rudniy |
---|---|
Year of publication: | 2010 |
Subject: | Markov random field; Theoretical computer science; Histogram matching; Computational Biology; Library and Information Sciences; General Biochemistry, Genetics and Molecular Biology; Pattern Recognition, Automated; Histogram; Shortest path problem; Graph (abstract data type); Data Mining; Edit distance; Computational problem; tf–idf; Algorithms; MathematicsofComputing_DISCRETEMATHEMATICS; Information Systems; Mathematics |
Source: | International Journal of Data Mining and Bioinformatics, 4(4) |
ISSN: | 1748-5673 |
Description: | Duplicate entity detection in biological data is an important research task. In this paper, we propose a novel, context-sensitive Shortest Path Edit Distance (SPED) that extends and supplements our previous work on Markov Random Field-based Edit Distance (MRFED). SPED transforms the edit distance computational problem into the calculation of the shortest path between two selected vertices of a graph. We produce several variants of SPED by applying Levenshtein, arithmetic mean, histogram difference, and TF-IDF techniques to solve subtasks. We compare the performance of SPED with that of other well-known distance algorithms for biological entity matching. The experimental results show that SPED produces competitive outcomes. |
Database: | OpenAIRE |
External link: |
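
The description states that SPED recasts edit distance as a shortest-path calculation between two vertices of a graph. The record gives no details of the SPED graph or its context-sensitive edge weights, so the sketch below is not the authors' method. It only illustrates the general idea with the classic unit-cost (Levenshtein) edit graph, searched with Dijkstra's algorithm. The function name `edit_distance_shortest_path` and the example strings are assumptions made for illustration.

```python
import heapq


def edit_distance_shortest_path(a: str, b: str) -> int:
    """Levenshtein distance computed as a shortest path in the edit graph.

    Vertex (i, j) means: the first i characters of `a` have been turned
    into the first j characters of `b`. Unit edge costs are an assumption;
    a context-sensitive method would assign different weights.
    """
    start, goal = (0, 0), (len(a), len(b))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            return d
        if d > dist[(i, j)]:
            continue  # stale queue entry, a shorter path was already found
        edges = []
        if i < len(a):
            edges.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            edges.append((i, j + 1, 1))  # insert b[j]
        if i < len(a) and j < len(b):
            # diagonal edge: free on a match, cost 1 on a substitution
            edges.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        for ni, nj, w in edges:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, (ni, nj)))
    raise AssertionError("goal vertex is always reachable in an edit graph")


if __name__ == "__main__":
    # Hypothetical entity names that differ by one substitution.
    print(edit_distance_shortest_path("BRCA1", "BRCA2"))
```

Under this formulation, replacing the unit costs with weights derived from other measures (for example, term statistics) changes the distance without changing the search itself, which is presumably what makes a graph view attractive for building variants.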