Divide and recombine (D&R) data science projects for deep analysis of big data and high computational complexity
Author: | Matthew C. Bowers, John Gerth, William S. Cleveland, Yuying Song, Ashrith Barthur, Wen-wen Tung |
---|---|
Publication year: | 2018 |
Subject: |
Statistics and Probability
Complex data type; Focus (computing); Service (systems architecture); 010504 meteorology & atmospheric sciences; Computational complexity theory; Computer science; business.industry; Big data; 01 natural sciences; Data science; 010104 statistics & probability; Computational Theory and Mathematics; Categorization; Blacklisting; Granularity; 0101 mathematics; business; 0105 earth and related environmental sciences |
Source: | Japanese Journal of Statistics and Data Science. 1:139-156 |
ISSN: | 2520-8764 2520-8756 |
Description: | The focus of data science is data analysis. This article begins with a categorization of the data science technical areas that play a direct role in data analysis. Next, big data are addressed, which create computational challenges due to the data size, as does the computational complexity of many analytic methods. Divide and recombine (D&R) [...] deep analysis of the data, which means analysis of the detailed data at their finest granularity; easy programming of analyses; and high computational performance. To succeed, D&R requires research in all of the technical areas of data science. Network cybersecurity and climate science are two subject-matter areas with big, complex data benefiting from D&R. We illustrate this by discussing two datasets, one from each area. The first is the measurements of 13 variables for each of 10,615,054,608 queries to the Spamhaus IP address blacklisting service. The second has 50,632 3-hourly satellite rainfall estimates at 576,000 locations. |
Database: | OpenAIRE |
External link: |