We may earn an affiliate commission when you visit our partners.
Ke YI

Big data systems such as Hadoop and Spark have emerged as enabling technologies for managing massive amounts of data across hundreds or even thousands of computing nodes. Meanwhile, cloud computing platforms have made these technologies easily accessible to individuals as well as large enterprises. This course is an online adaptation of the signature course MSBD 5003 Big Data Computing, offered in our popular MSc Program in Big Data Technology. In addition to 20+ hours of lecture videos, the course contains 100+ multiple-choice questions and 20 coding questions, aimed at equipping learners with both the theory and practical skills of big data systems, using Spark as the exemplary platform.

What's inside

Learning objectives

  • Spark programming using both the RDD and DataFrame APIs
  • Useful packages including ML, GraphX/GraphFrames, and Spark Streaming
  • Spark internals and performance optimizations
  • Algorithm design for big data systems

Syllabus

Week 1: Overview, MapReduce, and Hadoop
Week 2-3: Spark Basics and RDD
Week 4: Spark SQL and MLlib
Week 5: Spark internals
Week 6: Algorithm design for big data
Week 7: GraphX/GraphFrames
Week 8: Spark Streaming

Good to know

Know what's good, what to watch for, and possible dealbreakers
Covers the latest big data technologies, such as Spark and Hadoop, which are vital in today's data-driven world
Emphasizes hands-on practice with Spark programming using both RDD and DataFrame APIs, making it highly practical
Provides a strong foundation for understanding Spark internals and performance optimizations, equipping learners with technical expertise
Covers advanced topics like GraphX/GraphFrames, Spark Streaming, and algorithm design for big data, providing a comprehensive understanding
Taught by experienced instructors from the Hong Kong University of Science and Technology's MSc Program in Big Data Technology, ensuring high-quality content and instruction
Requires extensive background knowledge in programming and data analysis, making it more suitable for intermediate to advanced learners

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Computing with Spark with these activities:
Big Data Resources Compilation
Gather and organize useful resources, such as tutorials, articles, and code examples, on big data technologies.
  • Search for resources related to Spark, Hadoop, and other big data tools
  • Categorize and annotate the resources for easy reference
  • Share the compilation with other students or the community
Review Hadoop Concepts
Refresh your understanding of Hadoop concepts, such as MapReduce, HDFS, and YARN, to strengthen your foundation for Spark.
  • Revisit online tutorials or documentation on Hadoop
  • Practice setting up and configuring a Hadoop cluster
  • Review examples of MapReduce jobs and HDFS file operations
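As one way to review the MapReduce model before moving on to Spark, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are illustrative assumptions, and the shuffle step only imitates the grouping that a real framework performs across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, imitating the framework's grouping step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["big data with Spark", "big data with Hadoop"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce functions run on different nodes and the shuffle moves data over the network; here all three phases run in one process so the data flow is easy to trace.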
Spark RDD Practice
Practice working with Spark RDDs to solidify your understanding of data manipulation and transformation.
  • Set up a Spark environment and create an RDD
  • Apply transformations to the RDD, such as filtering, mapping, and grouping
  • Persist the RDD for faster access
Spark SQL Tutorial
Enhance your data analysis skills by following a guided tutorial on Spark SQL.
  • Install Spark (Spark SQL ships with it) and set up a DataFrame
  • Use SQL queries to manipulate and analyze data
  • Optimize queries for better performance
Attend Spark Workshop
Attend a workshop on Spark to learn best practices, advanced techniques, and industry use cases.
  • Research and register for a relevant Spark workshop
  • Attend the workshop and actively participate in discussions
  • Network with industry experts and fellow participants
Mentor Junior Students
Help reinforce your own understanding by mentoring junior students in big data concepts and Spark programming.
  • Offer to assist students with coursework or projects
  • Provide guidance and support on Spark RDDs, DataFrame APIs, and other Spark concepts
  • Review and provide feedback on student code and projects
Contribute to Spark Open Source Projects
Engage with the Spark community by contributing to open source projects, reporting bugs, or providing documentation.
  • Identify a project or issue that aligns with your interests and skills
  • Follow the project's guidelines and code conventions
  • Submit your contributions for review and merge

Career center

Learners who complete Big Data Computing with Spark will develop knowledge and skills that may be useful to these careers:
Data Architect
Data Architects design and manage data systems and infrastructure. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Software Engineer
Software Engineers design, develop, and maintain software systems. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Machine Learning Engineer
Machine Learning Engineers design, develop, and implement machine learning models. With topics such as Spark SQL and MLlib, along with Spark programming using both the RDD and DataFrame APIs, this course, _Big Data Computing with Spark_, may be useful in this role.
Software Developer
Software Developers design, develop, and test software applications. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Statistician
Statisticians collect, analyze, and interpret data. With topics such as Spark SQL and MLlib, along with Spark programming using both the RDD and DataFrame APIs, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Actuary
Actuaries analyze and manage financial risks. With topics such as Spark SQL and MLlib, along with Spark programming using both the RDD and DataFrame APIs, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Database Administrator
Database Administrators manage and maintain databases. With topics such as Spark SQL and MLlib, along with Spark programming using both the RDD and DataFrame APIs, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Quantitative Analyst
Quantitative Analysts perform mathematical and statistical modeling for financial and risk analysis. With topics such as Spark SQL and MLlib, along with Spark programming using both the RDD and DataFrame APIs, this course, _Big Data Computing with Spark_, can help lead to success in this role.
Big Data Engineer
Big Data Engineers design, implement, and manage big data systems and infrastructure. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Data Analyst
Data Analysts clean, prepare, and analyze data to provide actionable insights. With topics such as Spark SQL and MLlib, this course, _Big Data Computing with Spark_, may be useful in this role.
Cloud Architect
Cloud Architects design and manage cloud computing systems and infrastructure. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Systems Analyst
Systems Analysts analyze and design computer systems and networks. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Operations Research Analyst
Operations Research Analysts apply analytical methods to solve business problems. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Business Analyst
Business Analysts analyze business processes and systems to improve efficiency and effectiveness. With topics such as Spark internals and performance optimizations, algorithm design for big data systems, and useful packages including ML, GraphX/GraphFrames, and Spark Streaming, this course, _Big Data Computing with Spark_, may be useful in this role.
Data Scientist
Data Scientists study trends and patterns in data to make effective decisions. This course, _Big Data Computing with Spark_, may be useful in this role since it teaches practical skills such as Spark programming using both the RDD and DataFrame APIs, which are useful tools for Data Scientists.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Computing with Spark.
Comprehensive guide to Spark, covering everything from the basics to advanced topics. It is a valuable resource for anyone who wants to learn more about Spark and how to use it effectively.
Provides a comprehensive overview of Spark, covering both the fundamentals and advanced topics. It is a valuable resource for anyone who wants to learn more about Spark and how to use it effectively.
Is more advanced and covers topics such as Spark SQL, Spark Streaming, and Spark MLlib in greater depth.
Provides best practices for scaling and optimizing Spark applications. It is a valuable resource for anyone who wants to learn how to improve the performance of their Spark applications.
Provides a comprehensive guide to Spark, covering both the fundamentals and advanced topics. It is a valuable resource for anyone who wants to learn more about Spark and how to use it effectively.
Comprehensive reference guide to Apache Spark. It covers a wide range of topics, including Spark's architecture, programming models, and various components.
Provides a comprehensive reference for Spark. It covers a wide range of topics, from basic to advanced, and is a valuable resource for anyone who wants to learn more about Spark.

Similar courses

Here are nine courses similar to Big Data Computing with Spark.
  • Apache Spark with Scala - Hands On with Big Data! (most relevant)
  • Cloud Computing Applications, Part 2: Big Data and... (most relevant)
  • Big Data Essentials (most relevant)
  • Apache Spark 2.0 with Java -Learn Spark from a Big Data... (most relevant)
  • Cloud Computing Applications, Part 1: Cloud Systems and... (most relevant)
  • Applying the Lambda Architecture with Spark, Kafka, and... (most relevant)
  • Introduction to Big Data with Spark and Hadoop (most relevant)
  • Distributed Computing with Spark SQL
  • Getting Started with Apache Spark on Databricks

© 2016 - 2024 OpenCourser