CloudMatcher

Authors: Govind, Yash; Paulson, Erik; Nagarajan, Palaniappan; C., Paul Suganthan G.; Doan, AnHai; Park, Youngchoon; Fung, Glenn M.; Conathan, Devin; Carter, Marshall; Sun, Mingju
Source: Proceedings of the VLDB Endowment; August 2018, Vol. 11, Issue 12, pp. 2042-2045 (4 pp.)
Abstract: As data science applications proliferate, more and more lay users must perform data integration (DI) tasks that used to be done by sophisticated CS developers. It is therefore increasingly critical to develop hands-off DI services that lay users can use to perform such tasks without asking developers for help. We propose to demonstrate such a service. Specifically, we will demonstrate CloudMatcher, a hands-off cloud/crowd service for entity matching (EM). To match two tables with CloudMatcher, a lay user only needs to upload them to CloudMatcher's Web page and then iteratively label a set of tuple pairs as match/no-match. Alternatively, the user can enlist a crowd of workers to label the pairs. In either case, the lay user can perform EM end-to-end without involving any developers. CloudMatcher has been used in several domain science projects at UW-Madison and at several organizations, and is scheduled to be deployed at a large company in Summer 2018. In the demonstration we will show how easily lay users can perform EM (either via interactive labeling or crowdsourcing), how they can create and experiment with a range of EM workflows, and how CloudMatcher scales to many concurrent users and large datasets.
Database: Supplemental Index
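The abstract's workflow (upload two tables, then iteratively label tuple pairs as match/no-match) can be illustrated with a toy active-learning loop. This is a minimal sketch under stated assumptions, not CloudMatcher's implementation: the abstract does not name its learner, so this example assumes a single Jaccard token-similarity score on one attribute, a threshold classifier, and uncertainty sampling. The names `interactive_match`, `jaccard`, `fit_threshold`, and `oracle` are hypothetical, and `oracle` merely simulates the lay user or crowd worker who answers each question.

```python
from typing import Callable

# A candidate pair is (row index in table A, row index in table B).
Pair = tuple[int, int]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lowercase whitespace tokens (an assumed feature)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def fit_threshold(labeled: dict[Pair, bool], score: dict[Pair, float]) -> float:
    """Place the decision boundary midway between the lowest-scoring labeled
    match and the highest-scoring labeled non-match (assumed 1.0 / 0.0 if none)."""
    matches = [score[p] for p, y in labeled.items() if y]
    non_matches = [score[p] for p, y in labeled.items() if not y]
    upper = min(matches) if matches else 1.0
    lower = max(non_matches) if non_matches else 0.0
    return (upper + lower) / 2


def interactive_match(
    table_a: list[dict],
    table_b: list[dict],
    attr: str,
    label: Callable[[dict, dict], bool],
    budget: int,
) -> tuple[set[Pair], float]:
    """Hypothetical active-learning loop: asks `label` about at most `budget` pairs."""
    # Score every pair in the cross product (fine for toy tables; real systems block first).
    score = {
        (i, j): jaccard(a[attr], b[attr])
        for i, a in enumerate(table_a)
        for j, b in enumerate(table_b)
    }
    labeled: dict[Pair, bool] = {}
    threshold = 0.5
    for _ in range(budget):
        unlabeled = [p for p in score if p not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: query the pair closest to the current boundary.
        pair = min(unlabeled, key=lambda p: abs(score[p] - threshold))
        labeled[pair] = label(table_a[pair[0]], table_b[pair[1]])
        threshold = fit_threshold(labeled, score)
    # Predict with the threshold, but let explicit labels override the model.
    predicted = {p for p, s in score.items() if s >= threshold and labeled.get(p, True)}
    predicted |= {p for p, y in labeled.items() if y}
    return predicted, threshold


# Toy usage: `oracle` simulates the labeler by comparing a hidden "id" field.
table_a = [
    {"id": 1, "name": "Apple iPhone 8 64GB"},
    {"id": 2, "name": "Samsung Galaxy S9"},
    {"id": 3, "name": "Google Pixel 2"},
]
table_b = [
    {"id": 1, "name": "iPhone 8 64GB Apple"},
    {"id": 2, "name": "Galaxy S9 Samsung black"},
    {"id": 3, "name": "Pixel 2 XL Google"},
    {"id": 4, "name": "Apple iPhone X 256GB"},
]


def oracle(a: dict, b: dict) -> bool:
    return a["id"] == b["id"]


matches, threshold = interactive_match(table_a, table_b, "name", oracle, budget=3)
print(sorted(matches))  # [(0, 0), (1, 1), (2, 2)]
```

Uncertainty sampling is used here because it spends the labeler's effort only on pairs the current model is least sure about, which is one common way to keep the number of questions small; CloudMatcher's actual learner, features, and question-selection strategy may differ.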