Impact of Hash Value Length on Document Comparison System’s Performance

Authors: Juričić, Vedran; Soleša, Dragan; Dunđer, Ivan
Contributors: Damir Boras, Nives Mikelić Preradović, Francisco Moya, Mohamed Roushdy, Abdel-Badeeh M. Salem
Language: English
Year of publication: 2013
Subject:
Description: This paper analyses how a document comparison system changes when the length of the hash values of documents’ n-grams is varied, that is, when the number of bits used to store each hash value changes. A hash-based document comparison system was developed and used to perform the analyses. The authors analysed the dependencies between hash value length and disk space requirements, comparison time, and F-measure in order to find the optimum length: a balance between the best performance and the lowest space and time requirements. Because these dependencies were regular, the authors attempted to approximate the measured values with exponential and trigonometric functions.
Database: OpenAIRE
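
The idea summarized in the description, comparing documents through hashes of their n-grams while varying the number of bits kept per hash, can be illustrated with a minimal sketch. This is not the authors' system: the word-level n-grams, the SHA-1 hash, the top-bit truncation, the Jaccard similarity, and all function names (`word_ngrams`, `truncated_hash`, `fingerprint`, `jaccard`) are assumptions chosen only for illustration.

```python
import hashlib


def word_ngrams(text, n=3):
    """Split text into overlapping word n-grams (assumed tokenization)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def truncated_hash(gram, bits):
    """Hash an n-gram and keep only the top `bits` bits (1 <= bits <= 64).

    SHA-1 is an assumption; the record does not name the hash function.
    """
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    digest = hashlib.sha1(gram.encode("utf-8")).digest()
    full = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
    return full >> (64 - bits)


def fingerprint(text, n, bits):
    """Set of truncated n-gram hashes representing one document."""
    return {truncated_hash(g, bits) for g in word_ngrams(text, n)}


def jaccard(a, b):
    """Jaccard similarity of two fingerprints (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox sleeps under the lazy dog"
    for bits in (4, 8, 16, 32):
        fa = fingerprint(doc_a, 3, bits)
        fb = fingerprint(doc_b, 3, bits)
        # Fewer bits need less storage per hash but make collisions more likely.
        print(bits, len(fa), len(fb), round(jaccard(fa, fb), 3))
```

Shorter hashes reduce the storage needed per n-gram, but distinct n-grams are more likely to collide, which can inflate the measured similarity. That is the kind of trade-off between space, time, and F-measure the description refers to.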