Výsledky vyhledávání - "Saulpic, David"

Report

Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means

Autor: la Tour, Max Dupré, Saulpic, David

Clustering is one of the staples of data analysis and unsupervised learning. As such, clustering algorithms are often used on massive data sets, and they need to be extremely fast. We focus on the Euclidean $k$-median and $k$-means problems, two of t

Externí odkaz: http://arxiv.org/abs/2407.11217

Zobrazit plný text záznamu

Report

Fully Dynamic k-Means Coreset in Near-Optimal Update Time

Autor: la Tour, Max Dupré, Henzinger, Monika, Saulpic, David

We study in this paper the problem of maintaining a solution to $k$-median and $k$-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that allows easy

Externí odkaz: http://arxiv.org/abs/2406.19926

Zobrazit plný text záznamu

Report

Making Old Things New: A Unified Algorithm for Differentially Private Clustering

Autor: la Tour, Max Dupré, Henzinger, Monika, Saulpic, David

As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied for the local

Externí odkaz: http://arxiv.org/abs/2406.11649

Zobrazit plný text záznamu

Report

Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

Autor: Bansal, Nikhil, Cohen-Addad, Vincent, Prabhu, Milind, Saulpic, David, Schwiegelshohn, Chris

Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $\Omega$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to

Externí odkaz: http://arxiv.org/abs/2405.01339

Zobrazit plný text záznamu

Report

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

Autor: Draganov, Andrew, Saulpic, David, Schwiegelshohn, Chris

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress th

Externí odkaz: http://arxiv.org/abs/2404.01936

Zobrazit plný text záznamu

Report

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

Autor: Axiotis, Kyriakos, Cohen-Addad, Vincent, Henzinger, Monika, Jerome, Sammy, Mirrokni, Vahab, Saulpic, David, Woodruff, David, Wunder, Michael

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and sensitivity s

Externí odkaz: http://arxiv.org/abs/2402.17327

Zobrazit plný text záznamu

Report

Experimental Evaluation of Fully Dynamic k-Means via Coresets

Autor: Henzinger, Monika, Saulpic, David, Sidl, Leonhard

For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problems consists of finding $k$ centers such that the sum of distances squared from each data point to its closest center is minimized. Coresets are one the main tools developed recently

Externí odkaz: http://arxiv.org/abs/2310.18034

Zobrazit plný text záznamu

Report

Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

Autor: Cohen-Addad, Vincent, Saulpic, David, Schwiegelshohn, Chris

In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there are no kno

Externí odkaz: http://arxiv.org/abs/2310.04076

Zobrazit plný text záznamu

Report

Differential Privacy for Clustering Under Continual Observation

Autor: la Tour, Max Dupré, Henzinger, Monika, Saulpic, David

We consider the problem of clustering privately a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under con

Externí odkaz: http://arxiv.org/abs/2307.03430

Zobrazit plný text záznamu

Report

Improved Coresets for Euclidean $k$-Means

Autor: Cohen-Addad, Vincent, Larsen, Kasper Green, Saulpic, David, Schwiegelshohn, Chris, Sheikh-Omar, Omar Ali

Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest cent

Externí odkaz: http://arxiv.org/abs/2211.08184

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání