Partitioned Persistent Homology

Autor: Malott, Nicholas O.
Jazyk: angličtina
Rok vydání: 2020
Předmět:
Druh dokumentu: Text
Popis: In an era of increasing data size and complexity novel algorithms to mine embedded relationships within big data are being studiedthroughout high performance computing. One such approach, Topological Data Analysis (TDA), studies homologies in apoint cloud. Identified homologies represent topological features --- loops, holes, and voids --- independent of continuousdeformation and resilient to noise. These topological features identify a generalized structure of input data. TDA has shownpromise in several applications including studying various networks, digital images and movies, protein analysis and classification, and genomic sequences. TDA provides an alternativeapproach to classification for general data mining applications.The principle tool of TDA that examines the topological structures in a point cloud is Persistent Homology. Persistenthomology identifies the underlying homologies of the point cloud as the connectedness of the point cloud increases, or persists.The output of persistent homology can be used to classify embedded structures in high dimensional spaces,enable automated classification from connected component analysis, identify abnormalities in data streams, classify structure of protein chains, along with many other scienceand biological applications. Although the uses are vast, persistent homology suffers from exponential growth in both run-time andmemory complexity leading to limitations for evaluating large datasets.Persistent homology cannot be directly applied to large point clouds; modern tools are limited to the analysis of a few thousandpoints in R3. Optimization of data structures and enhancements to the algorithm have led to significant performanceimprovements, but the approaches still suffer in relation to the size of the input point cloud. One approach studied over recentyears is to reduce the input point cloud, either in size or dimensions. These reductions gain obvious improvement, enabling computation on data sets 3-4 magnitudes larger than theexact approach. Salient topological features have been shown to be preserved with the reduced point cloud and can reasonablyapproximate the large topological features of the original point cloud. However, the loss of smaller features in approximateapproaches are still a significant problem for some application areas.A partitioned technique to compute persistent homology that pairs the approximation of the salient topological features with thereconstruction of the smaller features embedded into the partitions is a natural extension to reduction of the point cloud.Several other studies implement this approach by using partitioning algorithms to approximate thelarge topological features while simultaneously reconstructing the smaller topological features from the individual partitions.This technique, developed and formalized in this thesis, is called Partitioned Persistent Homology (PPH).This thesis outlines the theory and approach of Partitioned Persistent Homology coupled with limitations of thereconstructed results. The technique is detailed and integrated into the Lightweight Homology Framework (LHF) for demonstration.Current optimization algorithms are implemented into the framework and detailed to provide a current state of tools for persistenthomology and usage in the LHF application. Analysis of the approach against theory, experimental results, and proofs provides anencompassing overview of Partitioned Persistent Homology. Detail of the considerations for parameter selection frame the use ofPartitioned Persistent Homology for big data research.
Databáze: Networked Digital Library of Theses & Dissertations