Detecting duplicate biological entities using Shortest Path Edit Distance

Authors: James Geller, Min Song, Alex Rudniy
Year of publication: 2010
Subject:
Source: International Journal of Data Mining and Bioinformatics, 4(4)
ISSN: 1748-5673
Description: Detecting duplicate entities in biological data is an important research task. In this paper, we propose a novel, context-sensitive Shortest Path Edit Distance (SPED), which extends and complements our previous work on the Markov Random Field-based Edit Distance (MRFED). SPED recasts the computation of edit distance as the problem of finding the shortest path between two selected vertices of a graph. We derive several variants of SPED by applying Levenshtein distance, the arithmetic mean, histogram difference and TFIDF techniques to solve its subtasks. We compare the performance of SPED with that of other well-known distance algorithms for biological entity matching. The experimental results show that SPED is competitive with these algorithms.
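The abstract's central idea, treating edit distance as a shortest path between two vertices of a graph, can be illustrated with the classic alignment-grid construction. The sketch below is a generic illustration under stated assumptions, not the authors' SPED: vertex (i, j) means the first i characters of one string have been aligned with the first j characters of the other, and Dijkstra's algorithm finds the cheapest path from (0, 0) to (len(a), len(b)). The parameters `sub_cost` and `indel_cost` are hypothetical hooks standing in for the context-sensitive costs SPED derives from its subtasks, whose exact form is not given here. With the default unit costs, the result equals the ordinary Levenshtein distance. Dijkstra's algorithm requires nonnegative costs.

```python
import heapq
from typing import Callable, Dict, Tuple


def shortest_path_edit_distance(
    a: str,
    b: str,
    # Hypothetical cost hooks (assumption): unit costs reproduce Levenshtein.
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
    indel_cost: float = 1.0,
) -> float:
    """Edit distance computed as a shortest path on an alignment grid.

    Vertex (i, j) means a[:i] has been aligned with b[:j]. The distance is
    the cost of the cheapest path from (0, 0) to (len(a), len(b)).
    All edge costs must be nonnegative for Dijkstra's algorithm to be valid.
    """
    m, n = len(a), len(b)
    target = (m, n)
    best: Dict[Tuple[int, int], float] = {(0, 0): 0.0}
    heap = [(0.0, 0, 0)]  # (distance so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == target:
            return d  # first time the target is popped, d is minimal
        if d > best.get((i, j), float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        edges = []
        if i < m:
            edges.append((i + 1, j, indel_cost))  # delete a[i]
        if j < n:
            edges.append((i, j + 1, indel_cost))  # insert b[j]
        if i < m and j < n:
            edges.append((i + 1, j + 1, sub_cost(a[i], b[j])))  # match or substitute
        for ni, nj, w in edges:
            nd = d + w
            if nd < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return best[target]  # unreachable in practice: the grid is always connected


print(shortest_path_edit_distance("kitten", "sitting"))  # → 3.0
```

Swapping in a different `sub_cost` (for example, one that ignores letter case in gene symbols) changes the edge weights without changing the graph, which is the kind of flexibility a shortest-path formulation offers over a fixed dynamic-programming recurrence.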
Database: OpenAIRE