Showing 1 - 10 of 181
for search: '"Dennis Fetterly"'
Published in:
KDD
The convergence behavior of many distributed machine learning (ML) algorithms can be sensitive to the number of machines being used or to changes in the computing environment. As a result, scaling to a large number of machines can be challenging. In
Published in:
SOSP
Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes,
Published in:
WSDM
Query rewriting algorithms can be used as a form of query expansion, by combining the user's original query with automatically generated rewrites. Rewriting algorithms bring linguistic datasets to bear without the need for iterative relevance feedbac
Published in:
Information Retrieval Technology ISBN: 9783642450679
AIRS
In this paper, we investigate near-duplicate detection, particularly looking at the detection of evolving news stories. These stories often consist primarily of syndicated information, with local replacement of headlines, captions, and the addition o
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::e62b5f1e1d7180c55ba88147c95cd452
https://doi.org/10.1007/978-3-642-45068-6_18
Published in:
WSDM
Many phenomena and artifacts such as road networks, social networks and the web can be modeled as large graphs and analyzed using graph algorithms. However, given the size of the underlying graphs, efficient implementation of basic operations such as
The main motivation behind the development of DryadLINQ was to make it easier for non-specialists to write general purpose, scalable programs that can operate on very large input datasets. In order to appeal to non-specialists we designed the program
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::adf2e7c83e19c56ca7106c3ad8127392
https://doi.org/10.1017/cbo9781139042918.004
Published in:
Lecture Notes in Computer Science ISBN: 9783642201608
ECIR
We present a study of the contributions of three classes of ranking signals: BM25F, a retrieval function that is based on words in the content of web pages and the anchors that link to them; SALSA, a link-based feature that takes all or part of the r
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::fc1d84e2d49a06afd47f7d464ca72809
https://doi.org/10.1007/978-3-642-20161-5_49
Authors:
Frank McSherry, Dennis Fetterly
Published in:
SIGIR
Due to the explosive growth of the web that has occurred throughout its history, many researchers working on web corpora have begun to move toward distributed, data parallel computing. The size of the ClueWeb09 [2] corpus, at approximately one billio
Published in:
SIGIR
Crawl selection policy has a direct influence on Web search effectiveness, because a useful page that is not selected for crawling will also be absent from search results. Yet there has been little or no work on measuring this effect. We introduce an
Published in:
Lecture Notes in Computer Science ISBN: 9783642009570
ECIR
Previous scalability experiments found that early precision improves as collection size increases. However, that was under the assumption that a collection's documents are all sampled with uniform probability from the same population. We contrast thi
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::01121647ea19c89b60199ab5d5422646
https://doi.org/10.1007/978-3-642-00958-7_35