Abstract:
The available memory on a desktop or laptop computer can easily be exceeded by a large data set or data stream when performing analytics or training machine learning algorithms. In this paper we propose simple methods to recalculate and update the values of descriptive statistics (averages, variance, and coefficients of skewness and kurtosis) after rescaling, adding, or excluding some observations. The coefficient of determination from regression analysis is also recalculated after deleting each sample observation. The objective is to avoid reprocessing all the observations (especially in a big data set or stream) when these statistics are recalculated: only each statistic's previous value, updated with the changed observations, is needed, which saves computation time. Applications include GDP, the stock prices of the 'Magnificent Seven', and the price-to-earnings ratio. [ABSTRACT FROM AUTHOR]
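The abstract does not give the updating formulas, so the following is only a hedged illustration of the general idea, not the authors' method. It keeps the count, the mean, and the central-moment sums M2, M3, M4 (where M_k is the sum of (x - mean)^k) and updates them in constant time per added or excluded observation, using the well-known single-observation recurrences of Welford and Pébay and their algebraic inverse. Rescaling every observation as y = a*x + b maps the mean to a*mean + b and multiplies each M_k by a^k. The class name `RunningMoments`, its method names, and the use of population-moment definitions for skewness and (non-excess) kurtosis are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Hypothetical helper: running count, mean and central-moment sums.

    m2, m3, m4 hold M_k = sum((x - mean)**k) over the current observations.
    Earlier observations are never revisited; only the previous values
    of these quantities and the changed observation are used.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0
    m3: float = 0.0
    m4: float = 0.0

    def add(self, x: float) -> None:
        """Include one observation (Welford/Pebay single-value update)."""
        n_new = self.n + 1
        delta = x - self.mean
        dn = delta / n_new
        dn2 = dn * dn
        term1 = delta * dn * self.n
        # Update M4, then M3, then M2: each formula needs the OLD lower moments.
        self.m4 += (term1 * dn2 * (n_new * n_new - 3 * n_new + 3)
                    + 6 * dn2 * self.m2 - 4 * dn * self.m3)
        self.m3 += term1 * dn * (n_new - 2) - 3 * dn * self.m2
        self.m2 += term1
        self.mean += dn
        self.n = n_new

    def remove(self, x: float) -> None:
        """Exclude a previously added observation x (inverse of add)."""
        if self.n == 0:
            raise ValueError("no observations to remove")
        if self.n == 1:
            self.n, self.mean, self.m2, self.m3, self.m4 = 0, 0.0, 0.0, 0.0, 0.0
            return
        n_old = self.n - 1
        mean_old = (self.n * self.mean - x) / n_old
        delta = x - mean_old
        dn = delta / self.n
        dn2 = dn * dn
        term1 = delta * dn * n_old
        # Undo in reverse order: M2 first, then M3, then M4,
        # because each inverse step needs the already-restored lower moments.
        m2_old = self.m2 - term1
        m3_old = self.m3 - term1 * dn * (self.n - 2) + 3 * dn * m2_old
        m4_old = (self.m4 - term1 * dn2 * (self.n * self.n - 3 * self.n + 3)
                  - 6 * dn2 * m2_old + 4 * dn * m3_old)
        self.n, self.mean = n_old, mean_old
        self.m2, self.m3, self.m4 = m2_old, m3_old, m4_old

    def rescale(self, a: float, b: float = 0.0) -> None:
        """Apply y = a*x + b to every observation without revisiting them."""
        self.mean = a * self.mean + b
        self.m2 *= a ** 2
        self.m3 *= a ** 3
        self.m4 *= a ** 4

    def variance(self, ddof: int = 1) -> float:
        # ddof=1 gives the sample variance, ddof=0 the population variance.
        return self.m2 / (self.n - ddof)

    def skewness(self) -> float:
        # Moment coefficient g1 = sqrt(n) * M3 / M2**1.5 (population form).
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self) -> float:
        # Moment coefficient g2 = n * M4 / M2**2 (subtract 3 for excess kurtosis).
        return self.n * self.m4 / self.m2 ** 2


if __name__ == "__main__":
    rm = RunningMoments()
    for value in [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]:
        rm.add(value)
    rm.remove(9.0)  # statistics now describe [2, 4, 4, 5, 7]
    print(rm.n, round(rm.mean, 6), round(rm.variance(), 6))  # prints: 5 4.4 3.3
```

Because `remove` inverts the `add` recurrence, it is only meaningful for a value that was actually included, expressed in the current (possibly rescaled) units. A shift b leaves the central sums unchanged, so skewness and kurtosis are invariant under rescaling with a > 0.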