Showing 1 - 7 of 7 for search: '"David Menestrina"'
Author:
John Cieslewicz, Chad Whipkey, David Menestrina, Himani Apte, Traian Stancescu, Jeff Shute, Kyle Littlefield, Radek Vingralek, Eric Rollins, Stephan Ellner, Ben Handy, Mircea Oancea, Bart Samwel, Ian Rae
Published in:
Proceedings of the VLDB Endowment. 6:1068-1079
F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional
Published in:
Proceedings of the VLDB Endowment. 3:208-219
Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used for evaluating ER results. However, ER measures tend to be chosen in
Author:
Hector Garcia-Molina, David Menestrina, Qi Su, Omar Benjelloun, Steven Euijong Whang, Jennifer Widom
Published in:
The VLDB Journal. 18:255-276
We consider the entity resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. We formalize the generic ER problem, treating the
Published in:
ICDE
Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity
Author:
Georgia Koutrika, Hector Garcia-Molina, Steven Euijong Whang, David Menestrina, Martin Theobald
Published in:
SIGMOD Conference
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large dataset
Author:
Tait Eliott Larson, David Menestrina, Hector Garcia-Molina, Omar Benjelloun, Heng Gong, Hideki Kawai, S. Thavisomboon
Published in:
ICDCS
Entity resolution (ER) matches and merges records that refer to the same real-world entities, and is typically a compute-intensive process due to complex matching functions and high data volumes. We present a family of algorithms, D-Swoosh, for distr
Published in:
VLDB Journal: International Journal on Very Large Data Bases; Jan 2009, Vol. 18, Issue 1, pp. 255-276, 22 p.