Description: |
This paper demonstrates a fast Okapi BM25 term weighting method on GPUs for information retrieval, combining a GPU-based dictionary built on a succinct data structure with data-parallel primitives. The main difficulty in handling documents on GPUs is processing variable-length strings, such as the documents themselves and the words they contain. Because of the SIMD nature of the GPU architecture, processing variable-size data leaves many cores idle, i.e., it causes load imbalance among threads. Our term weighting method is carefully composed of efficient data-parallel primitives to avoid this load imbalance. Additionally, we implemented a high-performance compressed dictionary on GPUs. With this dictionary, words are converted into IDs so that costly string comparisons can be avoided. Our experimental results show that the proposed term weighting method on GPUs runs up to 5x faster than a MapReduce-based implementation on multi-core CPUs. |
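
The pipeline described in the abstract can be illustrated with a minimal CPU-side sketch in Python. It is not the paper's GPU implementation. It only mirrors the same steps: words are mapped to integer IDs, documents are flattened into fixed-size (document ID, term ID) records, and term frequencies are obtained with a sort followed by a reduce-by-key, which are the kind of data-parallel primitives that map well to GPUs. The function names, the plain hash map standing in for the compressed dictionary, the parameter defaults `k1=1.2` and `b=0.75`, and the particular IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import groupby


def build_dictionary(docs):
    """Assign each distinct word an integer ID.

    Hypothetical stand-in for the compressed GPU dictionary: a plain
    hash map, so later steps compare integers instead of strings.
    """
    word_to_id = {}
    for doc in docs:
        for word in doc:
            word_to_id.setdefault(word, len(word_to_id))
    return word_to_id


def bm25_weights(docs, k1=1.2, b=0.75):
    """Compute a BM25 weight for every (document, term) pair.

    docs is a list of tokenized documents (lists of words).
    Assumed IDF variant: ln((N - df + 0.5) / (df + 0.5) + 1).
    """
    word_to_id = build_dictionary(docs)

    # Flatten variable-length documents into fixed-size records so that
    # every element costs the same amount of work.
    pairs = [(d, word_to_id[w]) for d, doc in enumerate(docs) for w in doc]

    # Sort, then reduce by key: equal (doc, term) keys become adjacent
    # and are counted, giving the term frequency of each pair.
    pairs.sort()
    tf = [(key, sum(1 for _ in group)) for key, group in groupby(pairs)]

    n_docs = len(docs)
    doc_len = [len(doc) for doc in docs]
    avg_len = sum(doc_len) / n_docs

    # Each (doc, term) key occurs once in tf, so counting term IDs
    # yields the document frequency.
    df = Counter(term for (_, term), _ in tf)

    weights = {}
    for (d, t), f in tf:
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        norm = f + k1 * (1.0 - b + b * doc_len[d] / avg_len)
        weights[(d, t)] = idf * f * (k1 + 1.0) / norm
    return word_to_id, weights


if __name__ == "__main__":
    sample = [["gpu", "bm25", "gpu"], ["cpu", "bm25"]]
    ids, w = bm25_weights(sample)
    for (doc_id, term_id), value in sorted(w.items()):
        print(doc_id, term_id, round(value, 4))
```

Flattening to (document ID, term ID) records is the design choice that matters here: once the data is a flat array of equal-size keys, sorting and segmented reduction do uniform work per element regardless of how long any individual document or word is.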