Showing 1 - 6 of 6 for search: '"Hsin-Tsang Lee"'
Published in:
ACM Transactions on the Web. 12:1-29
With the proliferation of web spam and infinite autogenerated web content, large-scale web crawlers require low-complexity ranking methods to effectively budget their limited resources and allocate bandwidth to reputable sites. In this work, we assum
Published in:
INFOCOM
Exponential growth of the web continues to present challenges to the design and scalability of web crawlers. Our previous work on a high-performance platform called IRLbot [28] led to the development of new algorithms for realtime URL manipulation, d
Published in:
INFOCOM
With the proliferation of web spam and questionable content with virtually infinite auto-generated structure, large-scale web crawlers now require low-complexity ranking methods to effectively budget their limited resources and allocate the majority
Published in:
WWW
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, BF
Published in:
INFOCOM, 2011 Proceedings IEEE; 2011, p811-819, 9p
Published in:
ACM Transactions on the Web; Jun 2009, Vol. 3 Issue 3, p8-1-8-34, 34p, 5 Diagrams, 9 Charts, 2 Graphs