Popis: |
Financial institutions struggle to manage and analyze large datasets, especially when those datasets are accessed and shared among subsidiaries in different locations. In the past, such organizations often enforced centralized control over their data as a way to maintain data integrity and consistency. Under this arrangement, a central processing unit, often a single machine, could exhaust all of its resources or be unable to process the data at all. To address these problems, distributed computing techniques are being introduced to improve the efficiency of data processing. In this project, we present a framework that distributes the processing workload of a data mining tool over a cloud software stack built from open source tools and platforms. Finally, we discuss these open source tools and platforms along with our preliminary investigation results. |