Big Data technologies

Author: Dundović, Ivan
Contributors: Pintar, Damir
Language: Croatian
Year of publication: 2015
Subject:
Description: Big Data refers to data sets whose size exceeds the ability of commonly used computing tools to capture, manage, and process the data in a reasonable amount of time. Big Data is commonly defined along three dimensions: volume, velocity, and variety. Relational databases have proven inadequate for processing such large data sets. Non-relational databases, on the other hand, are more scalable, show better performance, and solve many problems that relational databases were not designed to solve, such as managing large, often unstructured, data. Many techniques, technologies, and tools have been developed to manage Big Data, one of them being Apache Hadoop. Apache Hadoop, or simply Hadoop, is a framework for the distributed storage and processing of large data sets on clusters of commodity hardware, ranging from a single machine to several thousand. The Hadoop Distributed File System (HDFS), the YARN resource management platform, and the MapReduce programming model, all covered in detail in the paper, form the foundation of Hadoop. In the fifth chapter, using the Hadoop framework and the MapReduce programming model on a small computer cluster, a MapReduce program is implemented that sorts integers read from an input file.
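The thesis code itself is not part of this record; as a rough illustration only, a minimal sketch of such an integer sort using the standard Hadoop Java MapReduce API might look as follows. The class and method names (IntSort, SortMapper, SortReducer) are illustrative, not taken from the thesis. The sketch leans on the framework's shuffle phase, which sorts mapper output keys, and uses a single reducer so the output is totally ordered.

// Minimal sketch (not the thesis author's code) of a Hadoop MapReduce
// job that sorts integers read from a text input file. Each mapper emits
// the parsed integer as the key; the shuffle sorts keys, and one reducer
// writes them back out in ascending order.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IntSort {

  // Mapper: parse each input line as an integer and emit it as the key.
  public static class SortMapper
      extends Mapper<LongWritable, Text, IntWritable, NullWritable> {
    private final IntWritable number = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString().trim();
      if (line.isEmpty()) return;            // skip blank lines
      number.set(Integer.parseInt(line));
      context.write(number, NullWritable.get());
    }
  }

  // Reducer: keys arrive already sorted by the shuffle; write each
  // occurrence so duplicate values are preserved in the output.
  public static class SortReducer
      extends Reducer<IntWritable, NullWritable, IntWritable, NullWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<NullWritable> values,
        Context context) throws IOException, InterruptedException {
      for (NullWritable v : values) {
        context.write(key, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "integer sort");
    job.setJarByClass(IntSort.class);
    job.setMapperClass(SortMapper.class);
    job.setReducerClass(SortReducer.class);
    job.setNumReduceTasks(1);                // one reducer => total order
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A single reducer keeps the example simple but serializes the sort; for larger inputs, Hadoop's TotalOrderPartitioner can spread the work across multiple reducers while keeping a global order across the output files.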
Database: OpenAIRE