We may earn an affiliate commission when you visit our partners.
Ivan Puzyrevskiy, Alexey A. Dral, Emeli Dral, Evgeniy Ryabenko, Evgeniy Riabenko, and Pavel Mezentsev
Have you heard about technologies such as HDFS, MapReduce, and Spark? Always wanted to learn these tools but lacked concise starting material? Then don't miss this course!

In this 6-week course you will:
  • learn some basic technologies of the modern Big Data landscape, namely HDFS, MapReduce and Spark;
  • be guided through both the systems' internals and their applications;
  • learn about distributed file systems, why they exist and what function they serve;
  • grasp the MapReduce framework, a workhorse for many modern Big Data applications;
  • apply the framework to process texts and solve sample business cases;
  • learn about Spark, the next-generation computational framework;
  • build a strong understanding of Spark basic concepts;
  • develop skills to apply these tools to creating solutions in finance, social networks, telecommunications and many other fields.

Your learning experience will be as close to real life as possible, with the chance to evaluate your practical assignments on a real cluster. No mocking, just a friendly, considerate atmosphere to make your learning smooth and enjoyable. Get ready to work with real datasets alongside real masters!

Special thanks to:
  • Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project and the supervisor and mentor of half of the BigData team. He was the one who helped to get this show on the road.
  • Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. He now leads the infrastructure team.
  • Oleg Ivchenko (PhD student at APT dept., MIPT), Pavel Akhtyamov (MSc. student at APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), superbrains who have developed and now maintain the infrastructure used for practical assignments in this course.
  • Asya Roitberg, Eugene Baulin, and Marina Sudarikova. These people never sleep, babysitting this course day and night to make your learning experience productive, smooth and exciting.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Introduces HDFS, MapReduce, and Spark, which are industry standard in Big Data
Applies MapReduce framework to process texts and solve sample business cases
Provides a strong foundation for understanding Spark basic concepts
Taught by instructors who are recognized for their work in the Big Data field
Uses real datasets and a real cluster for practical assignments
May require students to come in with some basic understanding of programming and data analysis

Reviews summary

Big Data fundamentals with HDFS, MapReduce and Spark RDD

This course offers a comprehensive introduction to Big Data technologies, including HDFS, MapReduce, and Spark RDD. Students learn the fundamentals of big data architecture and apply them to real-world problems. While some reviewers mention challenges with understanding the instructors' accents and the autograder system, the course is generally well-received for its informative content and practical assignments.
Hands-on assignments
"You can spend many hours just "fixing" your code despite having the good result"
"While the theory is great and the course is fully loaded with the information, the Spark grading is very buggy and unpredictable."
In-depth coverage of concepts
"Course Content is good, but some times the grading tool is getting unresponsive."
Course content needs updating
"Really great course, but needs an update!"
Challenges with autograder
"The Grader was very bad and not easy to debug"
"Bugged grader and complete lack of support from admins of the course"
"Assignments level was good but jupyter sandbox gives problem while downloading notebook and also the result checker gives some issu, produces different output many times , different from what we get in notebook"
Difficulty understanding instructors
"Very difficult to understand the instructors accents"
"The pronunciation of some of the teachers was so bad I had to switch to transcripts."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Essentials: HDFS, MapReduce and Spark RDD with these activities:
Reach out to experienced Spark users
Connect with experienced Spark users who can provide guidance and support.
  • Identify experienced Spark users in your network or online
  • Reach out and schedule a time to connect
Complete the Spark Quick Start tutorial
Get started with Spark quickly by following a hands-on tutorial.
  • Follow the steps in the Spark Quick Start tutorial
Practice HDFS commands
Practice using HDFS commands to reinforce your understanding of how distributed file systems work.
  • Work through examples in the Apache HDFS tutorial
  • Create a sample dataset and practice HDFS commands on it
Participate in a Spark discussion forum
Connect with other Spark users and learn from their experiences.
  • Find a Spark discussion forum
  • Join the forum and participate in discussions
Solve Spark coding problems
Test your Spark coding skills by solving problems.
  • Register for a coding challenge platform
  • Filter and solve Spark-related coding problems
Write a blog post about MapReduce
Demonstrate your understanding of MapReduce by writing a blog post that explains its concepts and applications.
  • Research MapReduce and its key concepts
  • Identify an application of MapReduce in your field of interest
  • Write a detailed blog post that explains the application using MapReduce
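As one possible starting point for such a post, the MapReduce model can be illustrated with a word count in plain Python. This is a minimal single-process sketch, not the Hadoop API and not material from the course; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines and the grouping step would move data over the network; here everything runs in one process so the three phases are easy to follow.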
Develop a small-scale Spark application
Apply your Spark knowledge to a practical project.
  • Identify a small-scale data processing task
  • Design and implement a Spark application to solve the task

Career center

Learners who complete Big Data Essentials: HDFS, MapReduce and Spark RDD will develop knowledge and skills that may be useful to these careers:
Big Data Engineer
Big Data Engineers will directly use the tools and systems taught in this course on a daily basis. For example, they may use Spark to analyze large datasets. By taking this course, one may gain the fundamental knowledge to pursue this role.
Data Scientist
Data Scientists directly use the tools and systems taught in this course on a regular basis. For example, they may use Spark to analyze large datasets. Gaining exposure to these fundamental tools before working as a Data Scientist may help someone succeed in this role.
Data Analyst
Data Analysts often use the tools taught in this course. Some use Spark to analyze large datasets. This course introduces the basics of Spark and may help Data Analysts succeed.
Data Architect
Data Architects are likely to use the tools taught in this course on a regular basis. This course introduces basic technologies of the modern Big Data landscape, including HDFS, MapReduce, and Spark, all of which are commonly used by Data Architects.
Software Architect
Software Architects may find that this course is helpful for understanding the most widely-used frameworks and systems, like Hadoop, MapReduce, and Spark. These are crucial systems for working as a Software Architect.
Software Engineer
Software Engineers who work with Big Data will likely use the skills learned in this course. This course gives an introduction to handling the large datasets that are essential for Big Data. It will also teach how to use systems like HDFS and Hadoop to process and extract value from these large datasets.
Cloud Architect
Cloud Architects may find that this course is useful for understanding the most widely-used frameworks and systems, like Hadoop, MapReduce, and Spark. These are essential foundations for working as a Cloud Architect.
Data Engineer
Data Engineers will likely rely on some of the basics taught in this course, like HDFS, the distributed file system that is part of Hadoop. Both HDFS and Hadoop are among the most fundamental technologies used in Big Data.
Business Analyst
Business Analysts play a key role in many companies and some may find that it is useful to have an understanding of Big Data technologies. By taking this course, Business Analysts can learn about some of the most widely-used frameworks and systems, like Hadoop, MapReduce, and Spark.
Systems Administrator
Some Systems Administrators are responsible for handling Big Data and may find that this course is helpful for understanding the basics of HDFS, MapReduce, and Spark. Being familiar with these systems can help someone succeed in the role of a Systems Administrator.
Product Manager
Product Managers may find that this course introduces them to some of the most widely-used frameworks and systems, like Hadoop, MapReduce, and Spark, which are often used for product development and design.
Machine Learning Engineer
Machine Learning Engineers may find that this course is helpful for learning how to use Spark to analyze large datasets. Spark is one of the most common tools used to analyze big data, and therefore may be useful to Machine Learning Engineers.
Quantitative Analyst
Quantitative Analysts may find that this course is helpful for understanding the basics of HDFS, which is a distributed file system often employed by Quantitative Analysts.
Database Administrator
Database Administrators may find that this course is helpful for understanding the basics of HDFS, which is a distributed file system often employed by Database Administrators.
Statistician
Statisticians may find that this course is useful for understanding the basics of HDFS, which is a distributed file system. Some Statisticians may need to use HDFS in their daily work.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Essentials: HDFS, MapReduce and Spark RDD.
Provides a comprehensive overview of data science and big data analytics, including how to use Hadoop and Spark for data processing. It is a valuable resource for anyone who wants to learn more about data science and big data analytics and how to use Hadoop and Spark effectively.
Provides a practical introduction to big data with Spark and Scala, including how to use Spark for data processing. It is a valuable resource for anyone who wants to learn more about big data with Spark and Scala and how to use it effectively.
Provides a comprehensive overview of big data analytics, including how to use Hadoop and Spark for data processing. It is a valuable resource for anyone who wants to learn more about big data analytics and how to use Hadoop and Spark effectively.
Provides a comprehensive overview of big data analytics with Java, including how to use Hadoop and Spark for data processing. It is a valuable resource for anyone who wants to learn more about big data analytics with Java and how to use Hadoop and Spark effectively.
Provides a comprehensive overview of Spark, including its architecture, components, and how to use it for data processing. It is a valuable resource for anyone who wants to learn more about Spark and how to use it effectively.
Provides a practical introduction to Spark, including how to use it for data processing, machine learning, and graph analysis. It is a valuable resource for anyone who wants to learn more about Spark and how to use it effectively.
Provides a comprehensive overview of Hadoop, including its architecture, components, and how to use it for data processing. It is a valuable resource for anyone who wants to learn more about Hadoop and how to use it effectively.
Provides a detailed introduction to MapReduce, including its programming model, how to use it to process data, and how to optimize MapReduce programs. It is a valuable resource for anyone who wants to learn more about MapReduce and how to use it effectively.

Similar courses

Here are nine courses similar to Big Data Essentials: HDFS, MapReduce and Spark RDD.
Big Data Analytics Using Spark
Most relevant
Introduction to Big Data with Spark and Hadoop
Most relevant
The Building Blocks of Hadoop - HDFS, MapReduce, and YARN
Most relevant
Hadoop Developer In Real World
Most relevant
Spark and Python for Big Data with PySpark
Most relevant
Apache Spark Fundamentals
Apache Spark with Scala - Hands On with Big Data!
Apache Spark 2.0 with Java -Learn Spark from a Big Data...
Data Science in the Games Industry

© 2016 - 2024 OpenCourser