Nick Falkner, Gary Glonek, Lingqiao Liu, Gavin Meredith, and Ian Knight

The Big Data Capstone Project will allow you to apply the techniques and theory you have gained from the four courses in this Big Data MicroMasters program to a medium-scale data science project.

Working with organisations and stakeholders of your choice on a real-world dataset, you will further develop your data science skills and knowledge.

This project lets you deepen your learning through hands-on experience in evaluating, selecting and applying relevant data science techniques, principles and theory to a data science problem.

This project will see you plan and execute a reasonably substantial project and demonstrate autonomy, initiative and accountability.

You’ll deepen your understanding of social and ethical concerns in data science, including an analysis of ethical concerns and ethical frameworks in relation to data selection and data management.

By communicating the knowledge, skills and ideas you have gained to other learners through online collaborative technologies, you will learn valuable communication skills, important for any career. You’ll also deliver a written presentation of your project design, plan, methodologies, and outcomes.

What's inside

Learning objectives

The Big Data Capstone Project will give you the chance to demonstrate practically what you have learned in the Big Data MicroMasters program, including:

  • How to evaluate, select and apply data science techniques, principles and theory;
  • How to plan and execute a project;
  • How to work autonomously using your own initiative;
  • How to identify social and ethical concerns around your project;
  • How to develop communication skills using online collaborative technologies.

Syllabus

Dataset overview, data selection and ethics
  • Understand ethical issues and concerns around big data projects;
  • Describe how ethical issues apply to the sample dataset;
  • Describe up to three ethical approaches;
  • Apply ethical analysis to scenarios.
Exam (timed, proctored)
The exam will cover content from the first four courses in the Big Data MicroMasters program, including the Ethics section of this capstone course, DataCapX. It will include questions on topics such as code structure and testing, variable types, graphs, big data algorithms, regression and ethics.
Project Task 1: Data cleaning and regression
  • Understand the basic data cleaning and preprocessing steps required in the analysis of a real data set;
  • Create computer code to read data and perform data cleaning and preprocessing;
  • Judge the appropriateness of a fitted regression model to the data;
  • Determine whether simplification of a regression model is appropriate;
  • Apply a fitted regression model to obtain predictions for new observations.
Project Task 2: Classification
  • Build classifiers to predict the output of a desired factor;
  • Analyse learned classifiers;
  • Design a feature selection scheme;
  • Design a scheme for evaluating the performance of classifiers.
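
Project Task 2 asks for a feature selection scheme and a scheme for evaluating classifiers. The sketch below shows one possible way to combine the two, and is not the course's prescribed method. It assumes scikit-learn is installed and uses its bundled breast cancer dataset purely as a stand-in for the project data; the choice of `k=10` features and of logistic regression as the classifier are also illustrative assumptions. Placing the selector inside the pipeline means it is refit on each training fold, so the held-out fold never influences which features are chosen.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): 569 samples, 30 numeric features, binary label.
X, y = load_breast_cancer(return_X_y=True)

# Scaling, univariate feature selection and the classifier live in one pipeline,
# so every step is fitted on the training folds only.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=10),  # keep the 10 highest-scoring features
    LogisticRegression(max_iter=1000),
)

# Stratified 5-fold cross-validation keeps the class balance in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

Accuracy is only one possible metric; for imbalanced classes, a `scoring` value such as `"f1"` or `"roc_auc"` may be more informative.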

Good to know

Know what's good, what to watch for, and possible dealbreakers
Offers foundational grounding in essential big data science concepts, principles, and theory
Provides opportunities to deepen learning through a practical, medium-scale data science project
Taught by experienced instructors in the field of big data science
Focuses on real-world data and projects, enhancing the relevance of skills acquired
Builds essential communication skills for collaborating in data science teams
Project-centric approach provides hands-on experience and portfolio-building opportunities

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Capstone Project with these activities:
Read 'Data Science for Business' by Provost and Fawcett
Provides a foundational understanding of data science concepts and their application in business contexts.
  • Read the book and take notes on key concepts.
  • Summarize the main ideas of each chapter.
  • Apply the concepts to a real-world business scenario.
Complete the tutorial on Data Cleaning and Preprocessing Techniques
Solidify understanding of data cleaning and preprocessing techniques, crucial for real-world data science projects.
  • Find the tutorial on data cleaning and preprocessing techniques.
  • Follow the tutorial steps to clean and preprocess a sample dataset.
  • Apply the techniques to a different dataset of your choice.
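
The listing does not name a specific tutorial, so the following is only a minimal sketch of typical cleaning and preprocessing steps. It assumes pandas is installed; the inline CSV, its column names, and the choice of median imputation are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical raw data with common problems: a duplicate row, missing values,
# inconsistent text formatting, and a non-numeric entry in a numeric column.
raw_csv = io.StringIO(
    "id,age,income,city\n"
    "1,34,52000,Adelaide\n"
    "2,,61000,adelaide \n"
    "2,,61000,adelaide \n"
    "3,29,not available,Perth\n"
    "4,41,48000,\n"
)
df = pd.read_csv(raw_csv)

# 1. Remove exact duplicate rows.
df = df.drop_duplicates()

# 2. Normalise text: strip stray whitespace and use consistent capitalisation.
df["city"] = df["city"].str.strip().str.title()

# 3. Coerce numeric columns; entries that cannot be parsed become NaN.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# 4. Impute missing numeric values with the column median (one simple choice).
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# 5. Drop rows that still lack a required categorical value.
df = df.dropna(subset=["city"]).reset_index(drop=True)

print(df)
```

Whether to impute, drop, or flag missing values depends on the dataset and the analysis, so each step here should be treated as a decision to justify rather than a fixed recipe.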
Develop a Data Science Project Proposal
Provides an opportunity to plan and outline a data science project, showcasing project management and communication skills.
  • Define the project goals and objectives.
  • Identify the target audience and stakeholders.
  • Develop a methodology for data collection and analysis.
  • Create a timeline and budget for the project.
  • Write a clear and concise project proposal.
Build a Linear Regression model using scikit-learn
Practice building regression models to improve understanding of regression techniques and scikit-learn library.
  • Import the necessary libraries and load the dataset.
  • Split the data into training and testing sets.
  • Create a linear regression model.
  • Fit the model to the training data.
  • Evaluate the model on the test data and report the results.
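
The five steps above can be sketched as follows. This is a minimal example assuming scikit-learn and NumPy are installed; the data is synthetic (generated from a known linear relationship plus noise) because the listing does not specify a dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 1: "load" the data. Synthetic stand-in (assumption):
# y = 3*x0 - 2*x1 + 5 + small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Step 2: split into training and testing sets (75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Steps 3 and 4: create the model and fit it to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out test data.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Coefficients:", model.coef_.round(2))
print("Intercept:", round(float(model.intercept_), 2))
print("Test MSE:", round(float(mse), 4))
print("Test R^2:", round(float(r2), 4))
```

On real data, the fitted coefficients will not be known in advance, so residual plots and the test metrics are what indicate whether a linear model is appropriate.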
Develop a Machine Learning Model for Predicting Customer Churn
Provides a hands-on experience in applying data science techniques to a real-world business problem.
  • Collect and clean the customer data.
  • Explore the data and identify potential features for prediction.
  • Develop and train a machine learning model.
  • Evaluate the model's performance.
  • Deploy the model to make predictions on new data.
Apply Decision Tree and Random Forest algorithms for Classification
Improve understanding of classification algorithms and their implementation using scikit-learn.
  • Import the necessary libraries and load the dataset.
  • Split the data into training and testing sets.
  • Create decision tree and random forest models.
  • Fit the models to the training data.
  • Evaluate the models on the test data and report the results.
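
A minimal sketch of these steps, assuming scikit-learn is installed. The bundled Iris dataset is used only as a convenient stand-in, and the hyperparameters shown are illustrative defaults rather than tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small stand-in dataset (assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Split into training and testing sets, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Create both models; random_state makes the runs repeatable.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training data and evaluate it on the test data.
accuracies = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

A single train/test split gives a noisy estimate on a dataset this small, so a comparison between the two models would be more reliable with cross-validation.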
Write a Blog Post Summarizing Key Concepts from the Course
Reinforces understanding by requiring students to articulate concepts in their own words and share their knowledge with others.
  • Review the course materials and identify key concepts.
  • Organize the concepts into a logical flow.
  • Write a clear and engaging blog post that explains the concepts.
  • Share the blog post with others and invite feedback.
Contribute to an Open-Source Data Science Project
Provides practical experience in contributing to real-world data science projects and fosters collaboration within the community.
  • Identify an open-source data science project to contribute to.
  • Review the project documentation and codebase.
  • Implement a new feature or fix a bug.
  • Submit a pull request to the project.
  • Collaborate with other contributors to refine and merge your changes.

Career center

Learners who complete Big Data Capstone Project will develop knowledge and skills that may be useful to these careers:
Data Scientist
A Data Scientist combines their understanding of math, statistics, and computer science to analyze large amounts of data, draw conclusions, and make predictions. The Big Data Capstone Project will give you the opportunity to apply the knowledge and skills you have learned in the Big Data MicroMasters program to a real-world dataset, which will help you build a strong foundation for a career as a Data Scientist.
Machine Learning Engineer
Machine Learning Engineers design, build, and maintain machine learning models. They use their understanding of data, algorithms, and programming to develop models that can learn from data and make predictions. The Big Data Capstone Project will help you develop the skills you need to become a Machine Learning Engineer, including the ability to evaluate, select, and apply data science techniques, principles, and theories.
Data Analyst
Data Analysts collect, clean, and analyze data to help businesses make informed decisions. They use their skills in data analysis, statistics, and programming to identify trends and patterns in data, and to develop insights that can help businesses improve their operations and performance. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Data Analyst, including the ability to work with large datasets, apply data analysis techniques, and communicate your findings effectively.
Business Analyst
Business Analysts use their understanding of business and data to help organizations improve their performance. They work with stakeholders to identify business problems, collect and analyze data, and develop recommendations for improvement. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Business Analyst, including the ability to work with stakeholders, analyze data, and communicate your findings effectively.
Statistician
Statisticians collect, analyze, and interpret data to help organizations make informed decisions. They use their skills in mathematics, statistics, and programming to design and conduct studies, analyze data, and draw conclusions. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Statistician, including the ability to work with large datasets, apply statistical techniques, and communicate your findings effectively.
Software Engineer
Software Engineers design, develop, and maintain software applications. They use their skills in programming, algorithms, and data structures to create software that meets the needs of users. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Software Engineer, including the ability to work with large datasets, apply programming techniques, and communicate your findings effectively.
Quantitative Analyst
Quantitative Analysts use their skills in mathematics, statistics, and programming to develop and apply mathematical models to financial data. They use these models to make predictions about the future performance of financial markets and to make investment decisions. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Quantitative Analyst, including the ability to work with large datasets, apply statistical techniques, and communicate your findings effectively.
Operations Research Analyst
Operations Research Analysts use their skills in mathematics, statistics, and programming to develop and apply mathematical models to operational problems. They use these models to optimize the performance of operations and to make decisions about the allocation of resources. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become an Operations Research Analyst, including the ability to work with large datasets, apply statistical techniques, and communicate your findings effectively.
Market Research Analyst
Market Research Analysts collect, analyze, and interpret data to help businesses understand their customers and make informed decisions. They use their skills in research methods, statistics, and data analysis to design and conduct studies, analyze data, and draw conclusions. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Market Research Analyst, including the ability to work with large datasets, apply statistical techniques, and communicate your findings effectively.
Data Architect
Data Architects design and build the infrastructure that stores and manages data. They work with stakeholders to understand the data needs of the organization and to design and implement data solutions that meet those needs. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Data Architect, including the ability to work with large datasets, apply data engineering techniques, and communicate your findings effectively.
Database Administrator
Database Administrators maintain and optimize databases. They work with stakeholders to understand the data needs of the organization and to design and implement database solutions that meet those needs. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Database Administrator, including the ability to work with large datasets, apply database administration techniques, and communicate your findings effectively.
Systems Analyst
Systems Analysts design and implement computer systems. They work with stakeholders to understand the business needs of the organization and to design and implement systems that meet those needs. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Systems Analyst, including the ability to work with large datasets, apply systems analysis techniques, and communicate your findings effectively.
IT Project Manager
IT Project Managers plan, execute, and close IT projects. They work with stakeholders to define the scope of the project, develop the project plan, and manage the project budget and timeline. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become an IT Project Manager, including the ability to work with large datasets, apply project management techniques, and communicate your findings effectively.
Data Engineer
Data Engineers design, build, and maintain the infrastructure that stores and manages data. They work with stakeholders to understand the data needs of the organization and to design and implement data solutions that meet those needs. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Data Engineer, including the ability to work with large datasets, apply data engineering techniques, and communicate your findings effectively.
Business Intelligence Analyst
Business Intelligence Analysts use their skills in data analysis, statistics, and programming to develop and implement business intelligence solutions. They use these solutions to help businesses make informed decisions and improve their performance. The Big Data Capstone Project will give you the opportunity to develop the skills you need to become a Business Intelligence Analyst, including the ability to work with large datasets, apply data analysis techniques, and communicate your findings effectively.

Reading list

We've selected 28 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Capstone Project.
Is an in-depth overview of the big data landscape, exploring the concepts, technologies, and tools involved in managing and analyzing large datasets. It provides a solid foundation for understanding the challenges and opportunities associated with big data.
Presents statistical learning methods, including supervised and unsupervised learning, model selection, and variable selection. Provides a solid understanding of the fundamental principles of machine learning.
Offers a comprehensive introduction to machine learning algorithms and techniques, using Python and popular machine learning libraries. Provides a practical foundation for understanding and applying data science concepts.
Explores ethical considerations in data science, coinciding with the capstone project's emphasis on ethics.
Applies machine learning techniques to financial data. Provides insights into the use of machine learning for financial risk management, trading, and forecasting.
Covers supervised and unsupervised machine learning algorithms, including regression, classification, and clustering. Provides a practical guide for building and deploying predictive models.
Covers the theory and practice of deep learning, including neural networks, convolutional neural networks, and recurrent neural networks. Provides a comprehensive overview of the latest advances in deep learning.
Introduces data science and machine learning concepts, using Python. Provides hands-on experience with data manipulation, visualization, and model building.
Provides a comprehensive introduction to machine learning algorithms and techniques, complementing the course's focus on data science.
Covers various aspects of machine learning using Python. Provides comprehensive coverage of supervised and unsupervised learning algorithms.
Introduces data mining concepts and techniques using R and the Rattle GUI. Provides a hands-on guide for exploring, analyzing, and visualizing data.
Introduces causal inference methods, enhancing the course's focus on data analysis and interpretation.
Presents real-world examples of successful big data applications, providing additional context for the capstone project.
Provides a Bayesian perspective on statistical modeling, adding depth to the course's coverage of data analysis.
This comprehensive book offers a deep dive into Hadoop, the leading framework for big data processing. It covers topics such as Hadoop architecture, data storage, and query processing, providing a valuable reference for anyone working with Hadoop.
Provides a gentle introduction to machine learning for beginners. Offers a clear and accessible explanation of machine learning concepts and techniques.
Provides a comprehensive overview of Spark, a popular framework for big data processing. It covers topics such as Spark architecture, data processing, and machine learning, offering a valuable reference for anyone working with Spark.
Focuses on using R and Hadoop for big data analysis. It covers topics such as data import and cleaning, data visualization, and machine learning, offering practical guidance for anyone working with big data in R.
Provides advanced techniques for data analytics using Spark. It covers topics such as data engineering, machine learning, and deep learning, offering valuable insights for anyone looking to enhance their skills in big data analytics.
Provides a broad overview of big data analytics. It covers topics such as data sources, data processing, and visualization, offering a foundational understanding of the big data landscape.
Offers a comprehensive overview of data mining techniques. It covers topics such as data preparation, clustering, and classification, providing valuable insights for anyone looking to extract knowledge from data.


© 2016 - 2024 OpenCourser