Towards Exploratory Data Analysis for Pharo

Authors:	Oleksandr Zaytsev, Nick Papoulias, Serge Stinckwich
Year of publication:	2017
Subject:	Exploratory data analysis; Information retrieval; Descriptive statistics; Polymath; Computer science; Pharo; Data structure; Language Integrated Query; Smalltalk; Associative property
Source:	IWST
DOI:	10.1145/3139903.3139918
Description:	Data analysis and visualization techniques (such as split-apply-combine) make extensive use of associative tabular data structures that are cumbersome to use with common aggregation APIs (for arrays, lists, or dictionaries). In these cases, a fluent API for querying associative tabular data (like the ones provided by Pandas, Mathematica, or LINQ) is more appropriate for interactive exploration environments. In Smalltalk, despite the fact that many important analysis tools are already present (e.g., in the PolyMath library), we are still missing this essential part of the data science toolkit. These specialized data structures for tabular datasets can provide us with a simple and powerful API for summarizing, cleaning, and manipulating a wealth of data sources that are currently cumbersome to use. In this paper we introduce the DataFrame and DataSeries collections, which are specifically designed for working with structured data. We demonstrate how these tools can be used for descriptive statistics and Exploratory Data Analysis (EDA), the critical first step of data analysis, which allows us to get a summary of a dataset, detect mistakes, determine relations, and select the appropriate model for further confirmatory analysis. We then detail the implementation trade-offs that we currently face in our implementation for Pharo and discuss future perspectives.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::845e86f0e237ab677f7e6c7929bc97de https://doi.org/10.1145/3139903.3139918