Popis: |
Big Data-A is an acceleration framework that optimizes Big Data with plug-in components for fast data movement, overcoming the existing limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and without disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. Our experimental results show that Big Data-A significantly speeds up data movement in MapReduce and doubles the throughput of Big Data. Big Data-A also significantly reduces the disk accesses caused by intermediate data. In this paper, we propose APSO, a distributed frequent subgraph mining method over MapReduce. Given a graph database and a minimum support threshold, APSO generates the complete set of frequent subgraphs. To overcome the dependency among the states of the mining process, APSO runs iteratively: the output of the reducers in iteration i−1 is used as the input of the mappers in iteration i. The mappers of iteration i generate candidate subgraphs of size i (measured in number of edges) and compute the local support of each candidate pattern. The reducers of iteration i then find the true frequent subgraphs of size i by aggregating the local supports. They also write to disk the data that are processed in subsequent iterations. |
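
The iterative map and reduce steps described for APSO can be illustrated with a minimal single-process sketch. It is not the authors' implementation. It assumes that vertex labels are unique within each graph, so a pattern can be stored as a set of edges and containment becomes a subset test; real frequent subgraph mining needs isomorphism checks instead. The names `edge`, `touches`, `mapper`, `reducer`, and `mine` are hypothetical, and the hand-off between iterations is kept in memory rather than written to disk.

```python
from collections import defaultdict
from itertools import chain

# Assumption: vertex labels are unique within a graph, so a graph and a
# pattern are both frozensets of edges, and "pattern occurs in graph" is a
# subset test. This sidesteps subgraph isomorphism for illustration only.


def edge(u, v):
    """Return an undirected edge as a sorted tuple of vertex labels."""
    return tuple(sorted((u, v)))


def touches(pattern, e):
    """True if edge e shares a vertex with some edge of the pattern."""
    return any(e[0] in p or e[1] in p for p in pattern)


def mapper(partition, frequent_prev, size):
    """Emit (candidate of `size` edges, local support) for one partition."""
    local = defaultdict(int)
    for graph in partition:
        if size == 1:
            candidates = {frozenset([e]) for e in graph}
        else:
            # Extend each frequent pattern of size-1 edges found in this
            # graph by one adjacent edge, keeping the candidate connected.
            candidates = {
                p | {e}
                for p in frequent_prev if p <= graph
                for e in graph - p if touches(p, e)
            }
        for c in candidates:
            local[c] += 1  # each graph supports a candidate at most once
    return local.items()


def reducer(emitted, min_support):
    """Sum local supports and keep patterns that reach the threshold."""
    total = defaultdict(int)
    for pattern, count in emitted:
        total[pattern] += count
    return {p: s for p, s in total.items() if s >= min_support}


def mine(partitions, min_support):
    """Run iterations until no pattern of the current size is frequent."""
    result, frequent_prev, size = {}, None, 1
    while True:
        emitted = chain.from_iterable(
            mapper(part, frequent_prev, size) for part in partitions
        )
        frequent = reducer(emitted, min_support)
        if not frequent:
            return result
        result.update(frequent)
        frequent_prev = set(frequent)  # input to the next iteration's mappers
        size += 1
```

In this sketch, `mine(partitions, min_support)` takes a list of partitions, each a list of graphs built as `frozenset` objects of `edge(...)` tuples, and returns a dictionary from each frequent pattern to its global support.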