Showing 1 - 10 of 35 results for search: "Eugene J. Shekita"
Published in:
KDD
The convergence behavior of many distributed machine learning (ML) algorithms can be sensitive to the number of machines being used or to changes in the computing environment. As a result, scaling to a large number of machines can be challenging. In
Authors:
Mohamed Y. Eltabakh, Kevin Scott Beyer, Rainer Gemulla, Vuk Ercegovac, Carl-Christian Kanne, Andrey Balmin, Fatma Ozcan, Eugene J. Shekita
Published in:
Scopus-Elsevier
This paper describes Jaql, a declarative scripting language for analyzing large semistructured datasets in parallel using Hadoop's MapReduce framework. Jaql is currently used in IBM's InfoSphere BigInsights [5] and Cognos Consumer Insight [9] product
Published in:
Proceedings of the VLDB Endowment. 4:419-429
Users of MapReduce often run into performance problems when they scale up their workloads. Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs. However, translating
Published in:
IBM Systems Journal. 41:616-641
XML (Extensible Markup Language) has emerged as the standard data-exchange format for Internet-based business applications. These applications introduce a new set of data management requirements involving XML. However, for the foreseeable future, a s
Authors:
Rimon Barr, Michael J. Carey, Bruce G. Lindsay, Berthold Reinwald, Jayavel Shanmugasundaram, Eugene J. Shekita, Hamid Pirahesh
Published in:
VLDB
XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potent
Authors:
Jayavel Shanmugasundaram, Rajasekar Krishnamurthy, Eugene J. Shekita, Igor Tatarinov, Jeffrey F. Naughton, Efstratios Viglas, Jerry Kiernan
Published in:
ACM SIGMOD Record. 30:20-26
There has been recent interest in using relational database systems to store and query XML documents. Each of the techniques proposed in this context works by (a) creating tables for the purpose of storing XML documents (also called relational schema
Published in:
SIGMOD Conference
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has ne
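The abstract above describes histograms as compact summaries that let a database estimate query result sizes. As a minimal sketch, assuming an equi-width histogram and values spread uniformly inside each bucket (neither choice is taken from the paper), a range-predicate estimate can be computed as follows; the class and method names are hypothetical.

```python
class EquiWidthHistogram:
    """Hypothetical equi-width histogram over a numeric column.

    Assumes values are spread uniformly inside each bucket, the usual
    simplification when estimating range-predicate result sizes.
    """

    def __init__(self, values, num_buckets):
        self.lo = min(values)
        self.hi = max(values)
        if self.hi == self.lo:
            raise ValueError("column must contain at least two distinct values")
        self.num_buckets = num_buckets
        self.width = (self.hi - self.lo) / num_buckets
        self.counts = [0] * num_buckets
        for v in values:
            self.counts[self._bucket(v)] += 1

    def _bucket(self, v):
        # Clamp so the column maximum lands in the last bucket.
        i = int((v - self.lo) / self.width)
        return min(i, self.num_buckets - 1)

    def estimate_range(self, a, b):
        """Estimate how many rows satisfy a <= value <= b."""
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            # Fraction of this bucket covered by [a, b], assuming uniformity.
            overlap = min(b, b_hi) - max(a, b_lo)
            if overlap > 0:
                total += count * overlap / self.width
        return total
```

For example, `EquiWidthHistogram(list(range(100)), 10).estimate_range(0, 49.5)` yields roughly 50. Other histogram families, such as equi-depth histograms, instead place bucket boundaries so that each bucket holds about the same number of rows.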
Published in:
EDBT
MapReduce has emerged as a promising architecture for large scale data analytics on commodity clusters. The rapid adoption of Hive, a SQL-like data processing language on Hadoop (an open source implementation of MapReduce), shows the increasing impor
Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose eith
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0db3ca23737edbc3b7f8763c9b485dcc
http://arxiv.org/abs/1103.2408
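The Spinnaker abstract mentions key-based range partitioning with 3-way replication. Below is a minimal sketch of how a client might route a key to its range and to that range's three replicas. The class name, the split-key representation, and the placement of replicas on consecutive nodes are illustrative assumptions, not details taken from the paper.

```python
from bisect import bisect_right

REPLICATION_FACTOR = 3  # "3-way replication" from the abstract


class RangePartitionMap:
    """Hypothetical routing table for key-based range partitioning.

    split_keys[i] is the first key of range i + 1, so range 0 covers all
    keys below split_keys[0]. Replicas for range r are assumed to live on
    nodes r, r+1, r+2 (mod the number of nodes); this placement is an
    illustrative assumption.
    """

    def __init__(self, split_keys, nodes, replication=REPLICATION_FACTOR):
        if len(nodes) < replication:
            raise ValueError("need at least as many nodes as replicas")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.nodes = list(nodes)
        self.replication = replication

    def range_for(self, key):
        # Number of split keys <= key is the index of the owning range.
        return bisect_right(self.split_keys, key)

    def replicas_for(self, key):
        # Consecutive nodes (wrapping around) hold the copies of a range.
        r = self.range_for(key)
        n = len(self.nodes)
        return [self.nodes[(r + j) % n] for j in range(self.replication)]
```

With split keys `["g", "n", "t"]` and nodes `["A", "B", "C", "D"]`, the key `"kiwi"` falls in range 1 and maps to replicas B, C and D. Binary search over the sorted split keys keeps routing at O(log R) for R ranges. The sketch deliberately omits the transactional get-put API and the replication protocol that keeps the copies consistent.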
Published in:
SIGMOD Conference
The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As