Research on string similarity algorithm based on Levenshtein Distance

Authors: Yan Hu, Guangrong Bian, Shengnan Zhang
Year of publication: 2017
Subject:
Source: 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC).
DOI: 10.1109/iaeac.2017.8054419
Description: String similarity has a wide range of applications, and the algorithm based on Levenshtein Distance (LD) is a classic approach, but it still falls short in universal applicability and in the accuracy of its results. The similarity algorithm based on Levenshtein Distance is improved by combining it with the Longest Common Subsequence (LCS) and the Longest Common Substring (LCCS). The string similarity results of the improved algorithm are more distinct, reasonable, and accurate, and the algorithm has better universal applicability. In addition, the data structures used to compute the LD and the LCS during the similarity calculation have been optimized, reducing the order of magnitude of the algorithm's space complexity. The experimental results are analyzed in detail, which demonstrates the feasibility and correctness of the approach.
Database: OpenAIRE
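
Note: The description names three building blocks (LD, LCS, and the longest common substring) and a space optimization, but this record does not give the paper's combined similarity formula or its data structure. The following is therefore only a minimal sketch of the standard textbook versions, assuming that a two-row rolling table is an acceptable stand-in for the space optimization. All function names are hypothetical, and `ld_similarity` is the common baseline normalization 1 - LD / max(|a|, |b|), not the authors' improved measure.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using two rows instead of a full table."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as the shorter string
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, two rows of memory."""
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous common substring, two rows of memory."""
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            # A run of matches ending here extends the run ending diagonally before it.
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def ld_similarity(a: str, b: str) -> float:
    """Baseline similarity: 1 - LD / max(len). Not the paper's improved formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Each function keeps only the previous and current rows, so memory grows with the length of the shorter string rather than with the product of both lengths. How the paper weighs the three quantities against each other would have to be taken from the paper itself.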