
Big Data: Hadoop and MapReduce?

Updated: Dec 26, 2019

Big data, Hadoop, and MapReduce are often used interchangeably to describe the current trend in data.





What is Hadoop? The simple answer: Hadoop lets you store files bigger than what can fit on any single node or server. So you can store very, very large files, and many files, across multiple servers/computers in a distributed fashion.

Advantages of Hadoop include affordability (it runs on industry-standard hardware) and agility (store any data, run any analysis).


Apache Hadoop

#Hadoop is an Apache open-source project that provides a parallel storage and processing framework. Its primary purpose is to run #MapReduce batch programs in parallel on tens to thousands of server nodes.


Hadoop scales out to large clusters of servers and storage using the Hadoop Distributed File System (#HDFS) to manage massive data sets and spread them across the servers.

Hadoop comes with the libraries and utilities needed by its other modules. It consists of the Hadoop Common package, which provides file system and OS-level abstractions; the MapReduce engine; and HDFS. The Hadoop Common package contains the necessary Java libraries and scripts needed to start Hadoop, and also provides source code, documentation, and a contribution section that includes projects from the Hadoop community.


The Hadoop Distributed File System stores data on commodity machines, providing very high aggregate bandwidth across the cluster.


HDFS is designed to be a scalable, fault-tolerant, distributed storage system that works closely with MapReduce. HDFS will "just work" under a wide variety of physical and systemic circumstances. By distributing storage and computation across many servers, the combined storage resource can grow with demand while remaining economical at every size.
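To make the storage side concrete, here is a minimal sketch (not from the original post) of a client writing and then reading a file through Hadoop's Java FileSystem API. The NameNode address hdfs://namenode:9000 and the path /demo/hello.txt are hypothetical placeholders for illustration only.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in a real cluster this usually comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS transparently splits large files into blocks
        // and replicates them across DataNodes in the cluster.
        Path file = new Path("/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back and copy its contents to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}

The point of the sketch is that the client code never cares which servers hold the blocks; HDFS handles placement and replication behind the same file-style API.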


MapReduce

MapReduce is a framework for processing data. The data is not shipped across the network in the conventional way, because that is slow for massive data sets. Instead, #MapReduce takes an approach that fits big data well: rather than moving the data to the software, it moves the processing software to the data.

MapReduce is a programming model for large-scale data processing. The term also refers to the application modules written by a programmer, which run in two phases: first mapping the data (extract), then reducing it (transform).
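To illustrate the two phases, below is a condensed sketch of the classic WordCount job written against the standard Hadoop MapReduce Java API: the mapper emits (word, 1) pairs, and the reducer sums the counts for each word. Input and output paths are assumed to be passed on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the mapper runs on the nodes that already hold the input blocks in HDFS, the framework moves only the small intermediate (word, count) pairs over the network rather than the raw data.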
