
Introduction to Hadoop & Big Data

Written by Naveen

Hadoop

Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.


BIG DATA

Big data is a collection of large datasets that cannot be processed using traditional processing techniques. It is not a single technique or tool; rather, it spans many areas of business and technology.

What Comes Under Big Data?

 Social Media Data

 Transportation Data

 Search Engine Data

 Power Grid Data

 Stock Exchange Data, etc.

Big data involves huge volume, high velocity, and an extensible variety of data. The data in it can be of three types:

 Structured data: relational data.

 Semi-structured data: XML data.

 Unstructured data: Word documents, PDFs, text, media logs.

Big Data Challenges

The significant challenges associated with big data are as follows:

 Capturing data

 Curation

 Storage

 Searching

 Sharing

 Transfer

 Analysis

 Presentation

Hadoop Architecture

At its core, Hadoop has two major layers, namely:

(a) Processing/computation layer (MapReduce), and

(b) Storage layer (Hadoop Distributed File System).


MapReduce

MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. MapReduce programs are written in a particular style influenced by functional programming constructs, specifically idioms for processing lists of data.
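To make this concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names WordCount, TokenizerMapper, and IntSumReducer, as well as the input and output paths taken from the command line, are illustrative assumptions rather than anything prescribed by Hadoop itself.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for every word in an input split, emit the pair (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum all the counts emitted for the same word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the mapper and reducer into a job and submits it to the cluster.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each input split is processed by an independent map task, and the framework then groups the intermediate (word, 1) pairs by key before handing them to the reduce tasks, which is exactly the "divide the work into independent tasks" idea described above.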

HDFS (Hadoop Distributed File System)

HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to this information. Files are stored in a redundant fashion across multiple machines to ensure their durability against failure and their availability to highly parallel applications.
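To illustrate how an application reads and writes HDFS, here is a small sketch using Hadoop's Java FileSystem API. The NameNode address hdfs://localhost:9000 and the path /user/demo/hello.txt are assumptions made purely for this example; in a real deployment fs.defaultFS normally comes from core-site.xml rather than being set in code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed single-node NameNode address; usually supplied by core-site.xml.
    conf.set("fs.defaultFS", "hdfs://localhost:9000");

    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/demo/hello.txt"); // hypothetical file path

    // Write a small file; HDFS replicates its blocks across DataNodes.
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through the same FileSystem abstraction.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }

    fs.close();
  }
}

The same FileSystem calls work whether the file is a few kilobytes or many terabytes spread over hundreds of DataNodes, which is what makes the redundant, block-based storage described above transparent to applications.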

Advantages of Hadoop

 The Hadoop framework allows users to quickly write and test distributed systems.

 Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.

 Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java based.

 HDFS can store very large amounts of data.

Disadvantages of Hadoop

 Because the software is still under active development, Hadoop MapReduce and HDFS can be rough around the edges.

 Cluster management is hard.

 Joins of multiple datasets across nodes are tricky and slow.

 The programming model is very limited.


 

That was all about Hadoop and Big Data. We hope you enjoyed reading it and found what you were looking for. Please leave your feedback, and for any queries, ask us in the comments or on our discussion forum.

About the author

Naveen