Fitting Metrics and Ultrametrics with Minimum Disagreements

Autor: Vincent Cohen-Addad, Chenglin Fan, Euiwoong Lee, Arnaud De Mesmay
Rok vydání: 2022
Předmět:
DOI: 10.48550/arxiv.2208.13920
Popis: Given $x \in (\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the $\ell_0$ distance between $x$ and the metric cone; i.e., modify the minimum number of entries of $x$ to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an $O(\log n)$-approximation algorithm for MVD, exponentially improving the previous best approximation ratio of $O(OPT^{1/3})$ of Fan et al. [ SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of ULTRAMETRIC VIOLATION DISTANCE (UMVD), where the goal is to compute the $\ell_0$ distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The UMVD can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad et al. [FOCS, 2021] from $\ell_1$ norm to $\ell_0$ norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new $O(1)$-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an $O(\log n \log \log n)$ approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of $x$ forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
Comment: To appear at FOCS 2022 (Full version)
Databáze: OpenAIRE