On the selection of optimal subdata for big data regression based on leverage scores

Author: Chasiotis, Vasilis; Karlis, Dimitris
Year of publication: 2023
Document type: Working Paper
Description: The demand for computational resources in the modeling process grows with the size of the dataset, since traditional approaches to regression involve inverting huge data matrices. The main problem lies in the large data size, so a standard approach is subsampling, which aims to obtain the most informative portion of the big data. In this paper, we explore an existing approach based on leverage scores that was proposed for subdata selection in linear model discrimination. Our objective is to propose this approach for selecting the most informative data points to estimate the unknown parameters of both the first-order linear model and a model with interactions. Through simulation experiments and a real data application, we conclude that the approach based on leverage scores improves on existing approaches.
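To make the idea concrete, here is a minimal Python/NumPy sketch of leverage-score-based subdata selection. It assumes that the leverage of row i is the i-th diagonal entry of the hat matrix H = X (X^T X)^{-1} X^T computed on the full design matrix, and that the subdata are simply the k rows with the largest leverage. The paper's actual selection rule, which adapts a method from linear model discrimination, may differ. The helper names (leverage_scores, add_interactions, select_by_leverage, fit_ols) and the simulated data are hypothetical and for illustration only.

```python
import numpy as np


def leverage_scores(X: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T, computed via a thin QR."""
    # With X = QR (Q has orthonormal columns), H = Q Q^T, so h_ii = ||Q[i, :]||^2.
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.einsum("ij,ij->i", Q, Q)


def add_interactions(Z: np.ndarray) -> np.ndarray:
    """Design matrix with intercept, main effects, and all pairwise products of Z's columns."""
    n, d = Z.shape
    cols = [np.ones(n)] + [Z[:, j] for j in range(d)]
    cols += [Z[:, i] * Z[:, j] for i in range(d) for j in range(i + 1, d)]
    return np.column_stack(cols)


def select_by_leverage(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows with the largest leverage scores (assumed, simplified rule)."""
    h = leverage_scores(X)
    return np.argsort(h)[::-1][:k]


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients, solved without explicitly inverting X^T X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 100_000, 3, 1_000
    Z = rng.standard_normal((n, d))
    X = add_interactions(Z)  # p = 1 + d + d(d-1)/2 = 7 columns for d = 3
    beta_true = rng.uniform(-1, 1, X.shape[1])
    y = X @ beta_true + rng.standard_normal(n)

    # Fit the interaction model on the k highest-leverage rows only.
    idx = select_by_leverage(X, k)
    beta_sub = fit_ols(X[idx], y[idx])
    print("estimation error on subdata:", np.round(beta_sub - beta_true, 3))
```

Computing H through a thin QR decomposition avoids forming and inverting X^T X explicitly. Exact leverage scores still cost on the order of n p^2 operations, the same order as a full least-squares fit, so this sketch illustrates the selection criterion rather than a computational saving.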
Comment: arXiv admin note: text overlap with arXiv:2305.00218
Database: arXiv