Sunday, 2 December 2012

Apache Hadoop Big Data Analytics Tool and Technology: Defination, Advantages, Disadvantages

Apache Hadoop Big Data Analytics Tool and Technology: Defination, Advantages, Disadvantages

In this article, we will discuss some point about Apache Hadoop Big Data Tool and Technology available to us for managing the real time big data. We will also look on advantages and disadvantages of Apache Hadoop Big Data Analytics.

What is Hadoop Big Data Analytics Tool and Technology

Hadoop is the best tool available today for processing and storing herculean amounts of big data . Hadoop throws hundreds or thousands of computers at the big data problem, rather than using single computer.

Hadoop makes data mining, analytics, and processing of big data cheap and fast. Hadoop can take most of your big data problems and unlock the answers, because you can keep all your data, including all of your historical data, and get an answer before your children graduate college.

Apache Hadoop is an open-source project inspired by research of Google. Since you were wondering, Hadoop is named after the stuffed toy elephant of the lead programmer's son. This explains the preponderance of pachyderms wherever Hadoop is mentioned.

In Hadoop parlance, the group of coordinated computers is called a cluster, and the individual computers in the cluster are called nodes.

Advantages of Hadoop Big Data Analytics Tool and Technology

1. Hadoop is cheap. Hadoop is an open-source Apache project, which means anybody is free to use it. Hadoop runs on commodity hardware (i.e. normal everyday computers), so you don't have to buy million-dollar specialized database machines.

2. Hadoop is fast. Hadoop can deal with terabytes of data in minutes, and with petabytes in hours. Hadoop is the only way that companies with gigantic amounts of data like Facebook, Twitter, Yahoo, eBay, and Amazon can cost-effectively and quickly make decisions.

3. Hadoop scales to large amounts of big data storage. Need to add more space? Just add more hard drives to a node, or even add more nodes to your cluster. You never shut down Hadoop.

4. Hadoop scales to large amounts of big data computation. Is your cluster slow? Just add more nodes to spread out the computation. Hadoop scales almost linearly in many cases - this means you can halve the time it takes to do a job by doubling the number of compute nodes.

5. Hadoop is flexible with types of big data. Are you dealing with structured data? Great. Do you have semi-structured or unstructured (document-oriented) data? Lovely. Hadoop stores and processes any kind of data.

6. Hadoop is flexible with programming languages. Hadoop is natively written in Java, but you can access your data in a SQL-inspired language called Apache Hive. If you want a more procedural language for analysis, there is Apache Pig. If you want to get deep into the framework, you can custom-analyse your data by writing code in Java, C/C++, Ruby, Python, C#, QBASIC or anything else.

Disadvantages of Hadoop Big Data Analytics Tool and Technology

1. Plain Hadoop is hard to to set up. Have you tried setting up this thing? Your best bet may be to kidnap some professors and press them into your service.

2. Plain Hadoop is hard to manage. How do you do anything? Where is the graphical user interface? Oh, there is none.

3. Plain Hadoop is hard to keep alive. Hadoop has various single points of failure. When Hadoop collapses, you lose data and you lose time. That hurts.

4. Plain Hadoop is hard to use. Seriously, this is not a joke. Even adding up a list of numbers is painful.

5. Plain Hadoop is not secure. Your files are not secure and users can easily corrupt or steal data. I hope you trust everybody.

6. Plain Hadoop is not optimized for your hardware. Hadoop does not run at full capacity for your hardware, which is like being stuck in second gear.

5 comments:

  1. One of the disadvantages is single point of failure for the name space node.

    ReplyDelete
  2. While many enterprises are addressing the storage and access challenges around the volume, variety and velocity of big data. Big Data trainings

    ReplyDelete
  3. Big Data Brings Big Opportunities for a data mining job and data mining careers.
    Big Data trainings

    ReplyDelete
  4. Good analysis Naresh. As an alternative to Hadoop, LexisNexis has open sourced its HPCC Systems big data platform which represents more than a decade of internal research and development in the big data analytics field. The HPCC Systems platform is a complete enterprise-ready solution. Designed by data scientists, it provides for a single architecture, a consistent data-centric programming language (ECL), and two data processing clusters. Their built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. This all in one platform means only one thing to support and from a significant lower number of resources. As your article states, the complexity of the Hadoop ecosystem requires a huge investment in technology and resources up front and throughout. More at http://hpccsystems.com

    ReplyDelete