COUNTATA
Authors: | Yuval Moskovitch, H. V. Jagadish |
---|---|
Year of publication: | 2020 |
Subject: | Estimation; Profiling (computer programming); Computer science; General Engineering; Pattern recognition; 02 engineering and technology; Function (mathematics); Data set; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence |
Source: | Proceedings of the VLDB Endowment, 13, pp. 2829-2832 |
ISSN: | 2150-8097 |
DOI: | 10.14778/3415478.3415486 |
Abstract: | Information about the counts of attribute-value combinations is central to profiling a data set. Such counts may reveal bias and can help determine fitness for use. While counts of individual attribute values may be stored in some data set profiles, there are too many combinations of attributes for it to be practical to store a count for each one. To this end, we present the notion of storing a "label" of limited size that can be used to obtain good estimates for these counts. A label contains the counts of selected patterns (combinations of attribute values) in the data. We define an estimation function that uses this label to estimate the count of every pattern. Intuitively, there is a trade-off between the label size and its estimation error. We propose a demonstration of Countata, a system that allows the user to examine this trade-off as well as the label's count information. We will demonstrate the usefulness of Countata using real-life data and illustrate the effectiveness of our estimation paradigm. |
Database: | OpenAIRE |
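The abstract does not spell out Countata's label format or estimation function. What follows is a minimal Python sketch of the general idea, under stated assumptions. The sketch assumes a label that stores (a) the count of every individual attribute value and (b) exact counts of all patterns over one chosen subset of attributes. It then estimates any other pattern by treating the attributes outside that subset as independent. This layout, the independence estimator, and the names `build_label`, `label_size`, `estimate`, and `true_count` are hypothetical illustrations of the label-size versus estimation-error trade-off. They are not the system's actual implementation.

```python
from collections import Counter

# Hypothetical label layout (an assumption, not necessarily Countata's):
# the count of every single attribute value, plus exact counts of every
# pattern over one chosen subset of attributes (label_attrs).


def build_label(rows, label_attrs):
    value_counts = Counter()
    pattern_counts = Counter()
    for row in rows:
        for attr, val in row.items():
            value_counts[(attr, val)] += 1
        pattern_counts[tuple(row[a] for a in label_attrs)] += 1
    return {
        "n": len(rows),
        "attrs": tuple(label_attrs),
        "values": value_counts,
        "patterns": pattern_counts,
    }


def label_size(label):
    # Number of stored counts: the "size" side of the trade-off.
    return len(label["values"]) + len(label["patterns"])


def estimate(label, pattern):
    """Estimate how many rows match `pattern` (a dict attr -> value)."""
    n = label["n"]
    if n == 0:
        return 0.0
    attrs = label["attrs"]
    covered = {a: v for a, v in pattern.items() if a in attrs}
    # Exact count for the part of the pattern covered by the label:
    # sum the stored pattern counts that agree on every covered attribute.
    base = sum(
        c
        for key, c in label["patterns"].items()
        if all(key[attrs.index(a)] == v for a, v in covered.items())
    )
    # Uncovered attributes: assume independence and scale by value frequency.
    est = float(base)
    for a, v in pattern.items():
        if a not in attrs:
            est *= label["values"][(a, v)] / n
    return est


def true_count(rows, pattern):
    # Ground truth, used only to measure estimation error.
    return sum(all(r.get(a) == v for a, v in pattern.items()) for r in rows)


if __name__ == "__main__":
    rows = [
        {"gender": "F", "race": "A", "age": "young"},
        {"gender": "F", "race": "B", "age": "old"},
        {"gender": "M", "race": "A", "age": "young"},
        {"gender": "M", "race": "A", "age": "old"},
    ]
    query = {"race": "A", "age": "old"}
    small = build_label(rows, ["gender", "race"])
    full = build_label(rows, ["gender", "race", "age"])
    print(label_size(small), estimate(small, query))  # 9 1.5
    print(label_size(full), estimate(full, query))    # 10 1.0
    print(true_count(rows, query))                    # 1
```

In this toy data, growing the label's attribute subset from `("gender", "race")` to all three attributes raises the number of stored counts from 9 to 10. It also removes the independence assumption for `age`, so the estimate for `{race: A, age: old}` goes from 1.5 to the exact count of 1. On real data with many attributes, the number of stored pattern counts grows quickly with the subset, which is the trade-off the abstract describes.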