Efficient handling of heterogeneous file formats in HDFS

Authors: Suhas D. Raut, More Vaishali Prashant
Publication year: 2015
Subject:
Source: 2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT).
DOI: 10.1109/icecct.2015.7226034
Description: The amount of data in our industry and the world is exploding. Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. In an organization, multiple types of documents are collected from different sources: documents that need to be accessible immediately, documents that need to be accessible within a few seconds or minutes, and documents that are accessed infrequently. While these types of documents play different roles within an organization, each is valuable. These different types of documents require different kinds of storage solutions. To handle such heterogeneous file formats, we use Hadoop. In Hadoop, storage of different documents is provided by HDFS (Hadoop Distributed File System). In educational organizations, document categorization is also one of the most important tasks. The need to keep documents available and to assign each document a category motivated this project.
Database: OpenAIRE