Big Data Analysis with Scala and Spark (Scala 2 version) from Coursera

Manipulating big data distributed over a cluster using functional concepts is common in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop and, more recently, Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution, like latency and network communication, should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes. By the end of this course you will be able to:

- read data from persistent storage and load it into Apache Spark,

- manipulate data with Spark and Scala,

- express algorithms for data analysis in a functional style,

- recognize how to avoid shuffles and recomputation in Spark.
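
To make the last outcome more concrete, here is a minimal sketch written with plain Scala 2.13 collections rather than Spark itself. It models two partitions as nested sequences (hypothetical sample data) and counts how many records would have to cross the network under two strategies. The names `ShuffleSketch`, `groupThenSum`, and `combineThenSum` are invented for illustration; in Spark, the corresponding pair-RDD operations are `groupByKey` and `reduceByKey`.

```scala
object ShuffleSketch {
  type Record = (String, Int)

  // Hypothetical data: two "partitions" standing in for data held on two nodes.
  val partitions: Seq[Seq[Record]] = Seq(
    Seq("a" -> 1, "b" -> 1, "a" -> 1),
    Seq("b" -> 1, "a" -> 1, "b" -> 1)
  )

  // groupByKey-style: every record is shipped first, then values are summed per key.
  // Returns the totals and the number of records that were "shuffled".
  def groupThenSum(parts: Seq[Seq[Record]]): (Map[String, Int], Int) = {
    val shuffled = parts.flatten
    val totals = shuffled.groupMapReduce(_._1)(_._2)(_ + _)
    (totals, shuffled.size)
  }

  // reduceByKey-style: sum inside each partition first, so only one record
  // per key per partition is shipped before the final sum.
  def combineThenSum(parts: Seq[Seq[Record]]): (Map[String, Int], Int) = {
    val combined = parts.map(_.groupMapReduce(_._1)(_._2)(_ + _).toSeq)
    val shuffled = combined.flatten
    val totals = shuffled.groupMapReduce(_._1)(_._2)(_ + _)
    (totals, shuffled.size)
  }

  def main(args: Array[String]): Unit = {
    println(groupThenSum(partitions))
    println(combineThenSum(partitions))
  }
}
```

With this sample data both strategies produce the same totals, but the first moves six records and the second only four. The gap typically grows with the number of repeated keys per partition, which is one reason combining before a shuffle is usually preferred.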

Recommended background: You should have at least one year of programming experience. Proficiency with Java or C# is ideal, but experience with other languages such as C/C++, Python, JavaScript, or Ruby is also sufficient. You should have some familiarity with the command line. This course is intended to be taken after Parallel Programming: https://www.coursera.org/learn/parprog1.

Note that this version of the course uses Scala 2.13. You can find a more recent version of the course that uses Scala 3 here: https://www.coursera.org/learn/scala-spark-big-data

What's inside

Syllabus

Getting Started + Spark Basics

Get up and running with Scala on your computer. Complete an example assignment to familiarize yourself with our unique way of submitting assignments. In this week, we'll bridge the gap between data parallelism in the shared memory scenario (learned in the Parallel Programming course, prerequisite) and the distributed scenario. We'll look at important concerns that arise in distributed systems, like latency and failure. We'll go on to cover the basics of Spark, a functionally-oriented framework for big data processing in Scala. We'll end the first week by exercising what we learned about Spark by immediately getting our hands dirty analyzing a real-world data set.

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.

Develops data analysis skills in Scala, a highly sought-after programming language

Covers Spark, the industry standard for big data processing and analytics

Taught by Professor Heather Miller, a respected researcher and educator in the field

Emphasis on hands-on practice through real-world data analysis assignments

Requires proficiency in Java or C#, which may be a barrier to some learners

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Analysis with Scala and Spark (Scala 2 version) with these activities:

Organize Course Materials

Stay organized by gathering and arranging all relevant course materials in one place.

Create folders for lecture notes, assignments, and other resources.
Label and file materials systematically.

Review: Scala Basics

Refresh your knowledge of Scala syntax and constructs before starting the course.

Review online resources or tutorials on Scala.

Read: Learning Spark: Lightning-Fast Big Data Analysis

Gain a deeper understanding of Spark concepts and best practices from a comprehensive book.

Purchase or borrow the book.
Read and make notes on the relevant chapters.

Apache Spark Resources Compilation

Enhance your knowledge of Apache Spark by collecting and organizing relevant tools and resources.

Search for articles, tutorials, and documentation on Apache Spark.
Bookmark or save useful resources.
Create a document or online repository for your compilation.

Attend Apache Spark Meetup or Conference

Connect with professionals and learn about the latest advancements in Apache Spark by attending an industry event.

Search for upcoming Apache Spark events in your area.
Register and attend the event.
Network with other attendees and speakers.

Spark Tutorial: Spark SQL, DataFrames, and Datasets

Deepen your understanding of Apache Spark's SQL and DataFrames capabilities by following an in-depth tutorial.

Visit the tutorial website and follow the instructions.

Spark Exercises: Transformations and Actions

Reinforce your understanding of Spark transformations and actions by solving a series of challenging exercises.

Download the exercise dataset.
Use Spark to load the dataset into an RDD.
Apply various transformations and actions to the RDD.
Verify the results of your operations.
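
As a warm-up for these steps, the following sketch uses a plain Scala 2.13 collection view as a stand-in for an RDD; it is not Spark code. It illustrates the distinction the exercises revolve around: in Spark, transformations such as `map` and `filter` are lazy, and nothing is computed until an action such as `count`, `collect`, or `reduce` runs. The object `LazySketch`, its `run` method, and the input range are hypothetical names and data chosen for illustration.

```scala
import scala.collection.View

object LazySketch {
  // Builds a lazy pipeline, then forces it. Returns how many elements the map
  // step had processed before the "action", the result of the action, and how
  // many elements the map step had processed afterwards.
  def run(data: Seq[Int]): (Int, Int, Int) = {
    var evaluated = 0

    // "Transformations": a view only describes the computation.
    val pipeline: View[Int] = data.view
      .filter(_ % 2 == 0)
      .map { x => evaluated += 1; x * x }
    val before = evaluated

    // "Action": summing forces the pipeline to run.
    val total = pipeline.sum
    (before, total, evaluated)
  }

  // Strict version of the same computation, used to verify the result.
  def strictTotal(data: Seq[Int]): Int =
    data.filter(_ % 2 == 0).map(x => x * x).sum

  def main(args: Array[String]): Unit = {
    println(run(1 to 10))
    println(strictTotal(1 to 10))
  }
}
```

For the input `1 to 10`, the map step has processed zero elements before the sum and five afterwards, and both versions give a total of 220. Comparing a lazy pipeline against a strict one in this way is a simple approach to the verification step.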

Study Group: Spark RDDs and Transformations

Collaborate with peers to strengthen your understanding of Spark RDDs and transformations through group discussions.

Form a study group with other course participants.
Choose a topic related to Spark RDDs and transformations.
Meet regularly to discuss the topic, solve problems, and share insights.

Spark Project: Data Analysis on a Real-World Dataset

Apply your Spark skills to a practical project, analyzing a real-world dataset to uncover insights.

Identify a suitable dataset.
Load the dataset into Spark.
Explore and analyze the data using Spark operations.
Visualize and present your findings.

Career center

Learners who complete Big Data Analysis with Scala and Spark (Scala 2 version) will develop knowledge and skills that may be useful to these careers:

Data Scientist

Data Scientists develop new algorithms and solve complex problems by using large data sets. As part of their work, Data Scientists collaborate with colleagues in engineering, statistics, and business to understand a company's needs. This course would be useful, as it helps build a foundation in big data analysis, which is a core skill for Data Scientists. Skills learned in this course will help in processing and analyzing data sets more efficiently and effectively.

Data Analyst

Data Analysts collect, analyze, and interpret data to help organizations make informed decisions. Similar to Data Scientists, they collaborate with technical teams to translate their findings to actionable items. This course would be useful, as it teaches techniques and methods for data analysis that are in high demand.

Database Administrator

Database Administrators ensure that data storage systems operate smoothly. They work with data analysts, application developers, and end-users to provide access to data when needed. This course may be useful, as it provides a solid foundation in how data is stored and accessed.

Software Engineer

Software Engineers apply engineering principles to the design, development, deployment, and maintenance of software systems. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Cloud Architect

Cloud Architects design, build, and manage cloud computing systems. They work with clients to understand their business needs and then design and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in big data analysis and manipulation.

Data Warehouse Architect

Data Warehouse Architects design and build data warehouses. They work with data analysts and other stakeholders to understand the business needs and then design and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in big data analysis and manipulation.

Business Analyst

Business Analysts identify and solve business problems by using data analysis and other techniques. They work with stakeholders to understand the business needs and then develop and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Data Engineer

Data Engineers build and maintain data pipelines. They work with data analysts and other stakeholders to understand the business needs and then design and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in big data analysis and manipulation.

Statistician

Statisticians collect, analyze, and interpret data to help organizations make informed decisions. They work with stakeholders to understand the business needs and then develop and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Quantitative Analyst

Quantitative Analysts use mathematical and statistical models to analyze financial data. They work with portfolio managers and other stakeholders to develop and implement investment strategies. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Research Scientist

Research Scientists conduct research in a variety of fields, including science, engineering, and medicine. They develop and test new theories and technologies, and publish their findings in academic journals. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Machine Learning Engineer

Machine Learning Engineers develop and implement machine learning algorithms. They work with data scientists and other stakeholders to understand the business needs and then design and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in big data analysis and manipulation.

Data Architect

Data Architects design and build data architectures. They work with data analysts and other stakeholders to understand the business needs and then design and implement solutions that meet those needs. This course may be useful, as it provides a solid foundation in big data analysis and manipulation.

Biostatistician

Biostatisticians apply statistical methods to solve problems in the field of biology. They work with biologists and other scientists to design and conduct studies, and to analyze and interpret data. This course may be useful, as it provides a solid foundation in data analysis and manipulation.

Epidemiologist

Epidemiologists study the distribution and determinants of health-related states or events in specified populations. They work with public health officials and other stakeholders to develop and implement programs to prevent and control disease. This course may be useful, as it provides a solid foundation in data analysis and manipulation.
