Projection-Uniform Subsampling Methods for Big Data

Authors: Yuxin Sun, Wenjun Liu, Ye Tian
Language: English
Year of publication: 2024
Subject:
Source: Mathematics, Vol 12, Iss 19, p 2985 (2024)
Document type: article
ISSN: 2227-7390
DOI: 10.3390/math12192985
Description: Ideas from experimental design have been widely used in subsampling algorithms to extract a small portion of big data that carries useful information for statistical modeling. Most existing subsampling algorithms of this kind are model-based and designed to satisfy an optimality criterion for the assumed model. However, data-generating models are frequently unknown or complicated, so model-free subsampling algorithms are needed to obtain samples that are robust to model misspecification and model complexity. This paper introduces two novel algorithms: the Projection-Uniform Subsampling algorithm and an extension of it. Both algorithms aim to extract a subset of samples from big data that is space-filling in low-dimensional projections. We show that subdata obtained from our algorithms achieve superior performance under the uniform projection criterion and the centered L2-discrepancy. Our algorithms are compared with model-based and model-free methods through two simulation studies and two real-world case studies. We demonstrate the robustness of our proposed algorithms in building statistical models in scenarios involving model misspecification and model complexity.
Database: Directory of Open Access Journals