GigaHash
Autor: | Denis X. Charles, Kumar Chellapilla, Anton Mityagin |
---|---|
Rok vydání: | 2007 |
Předmět: |
Primary clustering
Theoretical computer science Computer science Universal hashing Dynamic perfect hashing Hash function Hash buster Byte 2-choice hashing Linear hashing Rolling hash Hash table K-independent hashing Hash tree SHA-2 Cryptographic hash function Hash chain Consistent hashing Perfect hash function Double hashing |
Zdroj: | WWW |
Popis: | A minimal perfect function maps a static set of n keys on to the range of integers {0,1,2,...,n - 1}. We present a scalable high performance algorithm based on random graphs for constructing minimal perfect hash functions (MPHFs). For a set of n keys, our algorithm outputs a description of h in expected time O(n). The evaluation of h(x) requires three memory accesses for any key x and the description of h takes up 0.89n bytes (7.13n bits). This is the best (most space efficient) known result to date. Using a simple heuristic and Huffman coding, the space requirement is further reduced to 0.79n bytes (6.86n bits). We present a high performance architecture that is easy to parallelize and scales well to very large data sets encountered in internet search applications. Experimental results on a one billion URL dataset obtained from Live Search crawl data, show that the proposed algorithm (a)finds an MPHF for one billion URLs in less than 4 minutes, and (b) requires only 6.86 bits/key for the description of h. |
Databáze: | OpenAIRE |
Externí odkaz: |