Big Data In Real World

From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. This course is designed for anyone who aspires to a career as a Hadoop developer. In this course we have covered all the concepts that every aspiring Hadoop developer must know to SURVIVE in

The course covers all the must-know topics, such as HDFS, MapReduce, YARN, Apache Pig, and Hive, and explores each concept in depth. We don't stop at the easy concepts; we go a step further and cover important and complex topics such as file formats, custom Writables, input/output formats, troubleshooting, and optimizations.

All concepts are backed by interesting hands-on projects, such as analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages using page dumps from Wikipedia, and simulating Facebook's mutual friends functionality, to name a few.

What's inside

Learning objectives

  • Understand what big data is, the challenges it poses, and how Hadoop proposes to solve them
  • Work with and navigate a Hadoop cluster with ease
  • Install and configure a Hadoop cluster on cloud services like Amazon Web Services (AWS)
  • Understand the different phases of MapReduce in detail
  • Write optimized Pig Latin instructions to perform complex data analysis
  • Write optimized Hive queries to perform data analysis on simple and nested datasets
  • Work with file formats like SequenceFile, Avro, etc.
  • Understand Hadoop architecture, single points of failure (SPOF), secondary/checkpoint/backup nodes, HA configuration, and YARN
  • Tune and optimize slow-running MapReduce jobs, Pig instructions, and Hive queries
  • Understand how joins work behind the scenes and write optimized join statements
  • Wherever possible, students will be introduced to difficult questions that are asked in real Hadoop interviews

Syllabus

In this section, we will learn about the course structure and the prerequisites for the course. We will help you set up everything needed to proceed with the course.
Course Structure
Tools & Setup (Windows)
Tools & Setup (Linux)
In this section, we will learn what Big Data is and the problems it poses. We will explain the solution Hadoop proposes to Big Data problems.
What is Big Data?
Understanding Big Data Problem
History of Hadoop
Test your understanding of Big Data
In this section, we will learn why another filesystem like HDFS is needed. In addition to working with HDFS, we will see the significance of blocks and how reads and writes work internally.
HDFS - Why Another Filesystem?
Blocks
Working With HDFS
HDFS - Read & Write
HDFS - Read & Write (Program)
Test your understanding of HDFS
HDFS Assignment
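To make the significance of blocks concrete, here is a minimal arithmetic sketch of how a file is divided into blocks and how much raw storage it uses once replicated. The 128 MB block size and the replication factor of 3 are common Hadoop defaults, used here as assumptions; this page does not state the settings used in the course. The function name `block_layout` is hypothetical and is not part of any Hadoop API.

```python
MIB = 1024 * 1024


def block_layout(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate how a file of the given size would be split into HDFS blocks.

    Assumed defaults: 128 MiB blocks and 3 replicas (configurable per cluster).
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication at least 1")

    # Ceiling division: a partially filled final block still counts as a block.
    num_blocks = -(-file_size_bytes // block_size_bytes)

    remainder = file_size_bytes % block_size_bytes
    if num_blocks == 0:
        last_block_bytes = 0                 # empty file, no blocks
    elif remainder == 0:
        last_block_bytes = block_size_bytes  # file ends exactly on a block boundary
    else:
        last_block_bytes = remainder         # final block holds only the leftover bytes

    return {
        "num_blocks": num_blocks,
        "last_block_bytes": last_block_bytes,
        # The final block is not padded, so raw usage is file size times replicas.
        "raw_bytes": file_size_bytes * replication,
    }


if __name__ == "__main__":
    print(block_layout(300 * MIB))
```

Under these assumptions, a 300 MiB file occupies three blocks: two full 128 MiB blocks and a final 44 MiB block.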
In this section, we will learn about the phases and components of the MapReduce programming model. We will cover the concepts in depth using exciting projects.
Introduction to MapReduce
Dissecting MapReduce Components
Dissecting MapReduce Program (Part 1)
Dissecting MapReduce Program (Part 2)
Combiner
Counters
Facebook - Mutual Friends
New York Times - Time Machine
Test your understanding of MapReduce
MapReduce Assignment
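The map, shuffle, and reduce phases covered in this section can be sketched in plain Python using the mutual-friends problem named in the lecture list. This is an in-memory simulation, not the Hadoop API and not the course's own implementation, which this page does not show. The function names and the sample graph are hypothetical, and friendships are assumed to be symmetric (if A lists B, then B lists A).

```python
from collections import defaultdict


def map_phase(person, friends):
    """Emit one (pair, friend set) record for each friendship of `person`."""
    for friend in friends:
        # Sort the pair so (A, B) and (B, A) land on the same key.
        pair = tuple(sorted((person, friend)))
        yield pair, set(friends)


def shuffle(mapped_records):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_records:
        groups[key].append(value)
    return groups


def reduce_phase(pair, friend_sets):
    """Intersect the friend sets emitted for a pair to get their mutual friends."""
    return pair, sorted(set.intersection(*friend_sets))


def mutual_friends(graph):
    """Run map, shuffle, and reduce over a {person: [friends]} adjacency list."""
    mapped = (
        record
        for person, friends in graph.items()
        for record in map_phase(person, friends)
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(pair, sets) for pair, sets in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical sample data with symmetric friendships.
    graph = {
        "A": ["B", "C", "D"],
        "B": ["A", "C", "D"],
        "C": ["A", "B"],
        "D": ["A", "B"],
    }
    for pair, mutual in mutual_friends(graph).items():
        print(pair, mutual)
```

Keying each record by the sorted pair is what lets the shuffle bring both friend lists for a pair to the same reducer, where a set intersection yields the mutual friends.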
This section covers a must-know tool in the Hadoop ecosystem: Apache Pig. We will explore the Million Song Dataset and a Wikipedia dataset using Pig in this section.
Introduction to Apache Pig
Loading & Projecting Datasets
Solving a Problem
Complex Types
Pig Latin - Joins
Million Song Dataset (Part 1)
Million Song Dataset (Part 2)
Page Ranking (Part 1)
Page Ranking (Part 2)
Page Ranking (Part 3)
Test your understanding of Apache Pig
Apache Pig Assignment
In this section, we cover the functionality that Hive has to offer. We will also stream live tweets from Twitter and analyze them using Hive.
Introduction to Apache Hive
Dissect a Hive Table
Loading Hive Tables
Simple Selects
Managed Table vs. External Table
Order By vs. Sort By vs. Cluster By
Partitions
Buckets
Hive QL - Joins
Twitter (Part 1)
Twitter (Part 2)
Test your understanding of Apache Hive
Apache Hive Assignment
Hive Window and Analytical Functions
Introduction to Hive Window and Analytical functions
Kickstarter campaign duplicates and top campaigns
Kickstarter campaign bands and user sessions
We start with the basic HDFS architecture and move on to advanced concepts such as SPOF and HA. We will also cover the MRv1 and YARN architectures in this section.
HDFS Architecture
Secondary Namenode
Highly Available Hadoop
MRv1 Architecture
YARN
Test your understanding of Hadoop Architecture
Now that we have all the required knowledge, we will set up a 3-node CDH 5.4 Hadoop cluster on Amazon Web Services (AWS).
Vendors & Hosting
Cluster Setup (Part 1)
Cluster Setup (Part 2)
Cluster Setup (Part 3)

With Amazon EMR we can start a brand new Hadoop cluster and run MapReduce jobs in a matter of minutes. This lecture walks through, step by step, how to set up a Hadoop cluster and run MapReduce jobs on it.

Test your understanding of Cluster Setup
Hadoop Administrator In Real World (Preview)

In this lecture we will learn about the benefits of Cloudera Manager, the differences between Packages and Parcels, and the lifecycle of Parcels.

In this lecture we will see how to install a 3-node Hadoop cluster on AWS using Cloudera Manager.

Knowing how to work only with text files is not going to cut it in a real Hadoop production environment. So in this section we will learn some important file formats that are widely used.
Compression
Sequence File
AVRO
File Formats - Pig
File Formats - Hive
Introduction to RCFile
Working with RCFile
Introduction to ORC
Working with ORC
Parquet - Another Columnar Format
Avro Schema and Its Importance
Schema Evolution in Avro (Part 1)
Schema Evolution in Avro (Part 2)
Test your understanding of File Formats
The concepts we learn in this section will come in handy when things do not go the way we planned in the Hadoop ecosystem. Learn to debug and optimize Hadoop jobs in this section.
Exploring Logs
MRUnit
MapReduce Tuning
Pig Join Optimizations (Part 1)
Pig Join Optimizations (Part 2)
Hive Join Optimizations
Test your understanding of Troubleshooting & Optimizations
Apache Sqoop

Good to know

Know what's good, what to watch for, and possible dealbreakers
Covers several facets of Big Data, introducing learners to multiple core pieces of the Big Data ecosystem, including HDFS, MapReduce, Apache Pig, and Apache Hive
Teaches complex topics in depth, going beyond simple, easy-to-understand concepts
Provides relevant introduction to Big Data for those looking to begin their career as Hadoop developers
Offers hands-on practice through projects focused on real-world tasks, such as analyzing large datasets to identify underrepresented artists with popular songs
Introduces learners to Hadoop, a widely used framework for processing and managing Big Data
Covers core concepts like HDFS, MapReduce, Apache Pig, and Hive, essential for aspiring Hadoop developers

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Hadoop Developer In Real World with these activities:
Attend a Hadoop Meetup
Connect with other Hadoop professionals.
  • Find a Hadoop Meetup in your area.
  • Attend the Meetup.
  • Network with other attendees.
Follow a Hadoop tutorial series
Build a strong foundation in Hadoop.
  • Find a Hadoop tutorial series.
  • Follow the tutorial series.
  • Complete the assignments.
Review 'Hadoop: The Definitive Guide' by Tom White
Review the concepts covered in the course.
  • Read the first three chapters of the book.
  • Summarize the key concepts covered in each chapter.
  • Identify any areas where you need further clarification.
Attend a Hadoop workshop
Gain hands-on experience with Hadoop.
  • Find a Hadoop workshop in your area.
  • Attend the workshop.
  • Complete the exercises.
Create a Hadoop cluster on AWS
Gain hands-on experience with Hadoop.
  • Sign up for an AWS account.
  • Launch an EC2 instance.
  • Install Hadoop on the EC2 instance.
  • Configure Hadoop.
  • Run a Hadoop job.
Practice MapReduce programming
Develop proficiency in MapReduce programming.
  • Find a MapReduce programming tutorial.
  • Complete the tutorial.
  • Practice writing MapReduce programs.
Write a blog post about Hadoop
Deepen your understanding of Hadoop by explaining it to others.
  • Choose a topic related to Hadoop.
  • Research the topic.
  • Write a blog post about the topic.
  • Publish the blog post.
Contribute to the Hadoop open source project
Gain experience with open source development and contribute to the Hadoop community.
  • Find a Hadoop open source project.
  • Review the project documentation.
  • Make a contribution to the project.
Develop a data analytics application using Hadoop
Apply your Hadoop skills to solve a real-world problem.
  • Identify a problem that can be solved using Hadoop.
  • Design a data analytics application.
  • Develop the application.
  • Test the application.
  • Deploy the application.

Career center

Learners who complete Hadoop Developer In Real World will develop knowledge and skills that may be useful to these careers:
Hadoop Developer
Hadoop Developers work with Big Data, and this course will be the perfect way to get started in this in-demand career. Graduates of this course will have the skills needed to succeed in this role and help contribute to organizational success.
Data Analyst
Data Analysts help businesses discover patterns in data to make better decisions. With its focus on analyzing large datasets using MapReduce, Pig, and Hive, this course can serve as a helpful stepping stone for a career in this field.
Big Data Architect
Big Data Architects design and oversee the deployment of data-related systems. The knowledge of HDFS, Hadoop architecture, and YARN taught in this course will be key to understanding this field and designing effective architecture for Big Data environments.
Data Engineer
Data Engineers create and maintain data pipelines to ensure the integrity and quality of data in an organization. This course's coverage of HDFS, MapReduce, and YARN will provide a helpful foundation for a successful career in this role.
Software Engineer
Software Engineers can work in a variety of domains, including Big Data. This course will help those with an interest in the latter build foundational knowledge of Hadoop ecosystem tools like HDFS, MapReduce, Pig, and Hive.
Cloud Engineer
Cloud Engineers manage the deployment and operation of cloud-based systems, and part of this role may include working with Big Data technologies like Hadoop. This course's coverage of Amazon Web Services will be particularly helpful for those who plan to specialize in cloud-based Big Data solutions.
Data Scientist
Data Scientists use scientific methods to extract knowledge from data. While this course does not cover all topics necessary for a career in this field, the focus on analyzing large datasets using MapReduce, Pig, and Hive is a useful starting point.
Database Administrator
Database Administrators ensure the efficient operation of database systems. This course will provide an introduction to HDFS, a distributed file system used to store Big Data. While not all Database Administrators work with Big Data, this knowledge will be useful for those who do.
Business Analyst
Business Analysts help businesses make informed decisions by analyzing data. This course covers how to analyze large datasets using MapReduce, Pig, and Hive, which will be in-demand skills for Business Analysts working in data-intensive industries.
Statistician
Statisticians collect and analyze data to provide insights and support decision-making. This course's focus on analyzing large datasets using MapReduce, Pig, and Hive is highly relevant to Statisticians who want to work with Big Data.
Operations Research Analyst
Operations Research Analysts use mathematical models to optimize operations and decision-making. This course will help build a foundation for understanding and working with Big Data, which is increasingly being used for operations research.
Quantitative Analyst
Quantitative Analysts develop and use mathematical models to analyze financial data. This course will provide an introduction to HDFS, a distributed file system used to store Big Data, which will be useful for those who want to work with Big Data in the financial industry.
Machine Learning Engineer
Machine Learning Engineers develop and deploy machine learning models to solve business problems. While this course does not cover all topics necessary for this role, the focus on analyzing large datasets using MapReduce, Pig, and Hive will be helpful for those who want to specialize in Big Data machine learning.
Data Warehouse Developer
Data Warehouse Developers design and build data warehouses to store and manage large amounts of data. This course's focus on HDFS, MapReduce, Pig, and Hive will be useful for those who want to specialize in Big Data data warehousing.

Reading list

We've selected 11 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Hadoop Developer In Real World.
Provides comprehensive coverage of Hadoop, including its architecture, components, and use cases. It is a valuable reference for those looking to gain a deeper understanding of Hadoop.
Provides a hands-on approach to learning Hadoop, with a focus on practical examples and use cases. It is a good choice for those looking to get started with Hadoop.
Provides a collection of design patterns for MapReduce programming. It is a valuable resource for those looking to write efficient and scalable MapReduce programs.
Covers both Hadoop and Spark, providing a comprehensive overview of big data analytics. It is a good choice for those looking to learn about both technologies.
Provides a comprehensive guide to using Hadoop in enterprise environments. It covers topics such as security, governance, and integration with other systems.
Covers Pig, a high-level data processing language for Hadoop. It is a valuable resource for those looking to use Pig for data analysis and transformation tasks.
Covers Hive, a data warehouse system for Hadoop. It is a valuable resource for those looking to use Hive for data analysis and reporting tasks.
Provides a practical guide to operating and managing Hadoop clusters. It is a valuable resource for those responsible for managing Hadoop infrastructure.
Provides a clear and concise introduction to Hadoop, suitable for beginners. It is a good starting point for those who are new to Hadoop.
Provides a concise overview of Hadoop, suitable for beginners. It is a good starting point for those who are new to Hadoop.

Similar courses

Here are nine courses similar to Hadoop Developer In Real World.
Learning Apache Hadoop EcoSystem- Hive
The Building Blocks of Hadoop - HDFS, MapReduce, and YARN
Introduction to Big Data with Spark and Hadoop
Hadoop Quick Start
Master Big Data - Apache...
Hadoop for .NET Developers
Data Engineering using Kafka and Spark Structured...
Data Transformations with Apache Pig
Big Data Essentials

© 2016 - 2024 OpenCourser