Showing 1 - 10 of 38 for search: '"Nargesian, Fatemeh"'
Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the mat
External link:
http://arxiv.org/abs/2404.07354
Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topi
External link:
http://arxiv.org/abs/2307.02726
We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the ove
External link:
http://arxiv.org/abs/2304.10572
Data scientists often draw on multiple relational data sources for analysis. A standard assumption in learning and approximate query answering is that the data is a uniform and independent sample of the underlying distribution. To avoid the cost of j
External link:
http://arxiv.org/abs/2303.00940
The large size and fast growth of data repositories, such as data lakes, have spurred the need for data discovery to help analysts find related data. The problem has become challenging as (i) a user typically does not know what datasets exist in an en
External link:
http://arxiv.org/abs/2301.04901
A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied on climate data to study the dynamics of a climate network. The core task and first step to enable interactiv
External link:
http://arxiv.org/abs/2203.16457
Analyzing patterns in a sequence of events has applications in text analysis, computer programming, and genomics research. In this paper, we consider the all-window-length analysis model which analyzes a sequence of events with respect to windows of
External link:
http://arxiv.org/abs/2011.14460
We introduce Kensho, a tool for generating mapping rules between two Knowledge Bases (KBs). To create the mapping rules, Kensho starts with a set of correspondences and enriches them with additional semantic information automatically identified from
External link:
http://arxiv.org/abs/2008.01208
We consider the problem of creating a navigation structure that allows a user to most effectively navigate a data lake. We define an organization as a graph that contains nodes representing sets of attributes within a data lake and edges indicating s
External link:
http://arxiv.org/abs/1812.07024
Published in:
VLDB Journal: International Journal on Very Large Data Bases; Sep 2024, Vol. 33, Issue 5, pp. 1283-1306, 24 pp.