Data Profiling with Metanome

Authors: Moritz Finke, Jakob Zwiener, Tanja Bergmann, Thorsten Papenbrock, Felix Naumann
Year of publication: 2015
Subject:
Source: Proceedings of the VLDB Endowment. 8:1860-1863
ISSN: 2150-8097
DOI: 10.14778/2824032.2824086
Description: Data profiling is the discipline of discovering metadata about given datasets. This metadata serves a variety of use cases, such as data integration, data cleansing, and query optimization. Because data profiling is important in practice, many tools have emerged that support data scientists and IT professionals in this task. These tools provide good support for profiling statistics that are easy to compute, but they usually lack automatic and efficient discovery of complex statistics, such as inclusion dependencies, unique column combinations, or functional dependencies. We present Metanome, an extensible profiling platform that incorporates many state-of-the-art profiling algorithms. While Metanome can calculate simple profiling statistics on relational data, its focus lies on the automatic discovery of complex metadata. Metanome's goals are to provide novel profiling algorithms from research, to perform comparative evaluations, and to support developers in building and testing new algorithms. In addition, Metanome can rank profiling results according to various metrics and visualize the resulting metadata sets, which are at times large.
Database: OpenAIRE