Bill Howe

In the capstone, students will engage in a real-world project requiring them to apply skills from the entire data science pipeline: preparing, organizing, and transforming data; constructing a model; and evaluating results. Through a collaboration with Coursolve, each Capstone project is associated with partner stakeholders who have a vested interest in your results and are eager to deploy them in practice. These projects will not be straightforward and the outcome is not prescribed: you will need to tolerate ambiguity and negative results! But we believe the experience will be rewarding and will better prepare you for data science projects in practice.


What's inside

Syllabus

Project A: Blight Fight
In this project, you will build a model to predict when a building is likely to be condemned. The data is real, the problem is real, and the impact is real.
Week 2: Derive a list of buildings
You are given sets of incidents with location information; you need to use some assumptions to group these incidents by location to identify specific buildings.
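One possible way to group incidents by location is to round coordinates to a fixed grid and treat each grid cell as one building. This is only a sketch: the `lat` and `lon` field names, the sample coordinates, and the `precision` value are assumptions for illustration, not part of the course materials.

```python
from collections import defaultdict


def group_incidents_by_location(incidents, precision=4):
    """Group incidents whose coordinates round to the same grid cell.

    Assumption: each incident is a dict with 'lat' and 'lon' keys, and
    incidents that share a rounded cell belong to the same building.
    """
    buildings = defaultdict(list)
    for incident in incidents:
        key = (round(incident["lat"], precision),
               round(incident["lon"], precision))
        buildings[key].append(incident)
    return dict(buildings)


# Hypothetical incidents: the first two are a few meters apart.
incidents = [
    {"id": 1, "lat": 42.33141, "lon": -83.04582},
    {"id": 2, "lat": 42.33143, "lon": -83.04579},
    {"id": 3, "lat": 42.36010, "lon": -83.07000},
]
buildings = group_incidents_by_location(incidents)
```

A coarser `precision` merges more incidents into each building, while a finer one splits them; the right trade-off depends on how noisy the location data is.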
Week 3: Construct a training dataset
Construct a training set by associating each of your buildings with a ground truth label derived from the permit data.
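A minimal sketch of the labeling step, assuming buildings and permits have already been reduced to comparable location keys. The function name, the key format, and the rule "a building with a matching permit is labeled 1" are illustrative assumptions, not the course's prescribed method.

```python
def build_training_set(building_keys, permit_keys):
    """Pair each building key with a 0/1 ground truth label.

    Assumption: a building is labeled 1 when its key also appears
    among the permit keys, and 0 otherwise.
    """
    permit_set = set(permit_keys)
    return [(key, 1 if key in permit_set else 0) for key in building_keys]


# Hypothetical location keys for three buildings and one permit.
building_keys = [(42.3314, -83.0458), (42.3601, -83.07), (42.4, -83.1)]
permit_keys = [(42.3601, -83.07)]
training_set = build_training_set(building_keys, permit_keys)
```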
Week 4: Train and evaluate a simple model
Use a trivial feature set to train and evaluate a simple model.
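As an illustration of how small such a baseline can be, the sketch below fits a one-feature threshold rule on a hypothetical incident count per building and evaluates it on held-out data. The feature, the toy data, and the choice of accuracy as the metric are all assumptions made for the example.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def predict(counts, threshold):
    """Predict 1 when the incident count reaches the threshold."""
    return [1 if c >= threshold else 0 for c in counts]


def fit_threshold(counts, labels):
    """Pick the threshold with the highest training accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(counts)):
        acc = accuracy(predict(counts, threshold), labels)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
    return best_threshold


# Hypothetical incident counts and labels, split into train and test.
train_counts, train_labels = [1, 2, 2, 5, 7, 9], [0, 0, 0, 1, 1, 1]
test_counts, test_labels = [1, 3, 6, 8], [0, 0, 1, 1]

threshold = fit_threshold(train_counts, train_labels)
test_accuracy = accuracy(predict(test_counts, threshold), test_labels)
```

Evaluating on data that was not used to pick the threshold gives a baseline score that later feature work can be compared against.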
Week 5: Feature Engineering
Derive additional features and retrain to improve the efficacy of your model.
Week 6: Final Report
Enter your final report for grading.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Requires learners to have a strong foundation in data science and its associated mathematical concepts
Emphasizes real-world problem-solving and collaboration with industry professionals
Offers hands-on experience through a project-based approach
Engages learners in a real-world project that requires the application of various data science techniques
Provides learners with exposure to real-world data and industry practices
Suitable for learners aspiring to become data scientists or enhance their data science skills


Reviews summary

Guided capstone project

Learners say this capstone project has a great topic and a milestone-based schedule. They feel free to make many decisions and get enough guidance to stay on track toward the project goal. However, instructor and student commitment are both somewhat low: there is little feedback in the forums, and shallow submissions sometimes add little value for the reviewer.
Learners can make their own decisions but still get guidance to avoid straying away from the project goal.
"you are free to take many decisions but still get enough guidance to avoid straying away from the project goal."
There is little feedback in the forums and submissions are sometimes shallow, adding little value to the reviewer.
"there is very little feedback in the forums"
"submissions are sometimes so shallow that add little value to the reviewer."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Science at Scale - Capstone Project with these activities:
Course Resource Compilation
Organize and review key resources from the course to enhance retention and accessibility.
  • Gather and organize course materials, such as notes, slides, and assignments.
  • Review and summarize the key concepts from each module.
Statistics and Probability Concepts
Reinforce your understanding of statistical and probability concepts, which are essential for data analysis and modeling.
  • Review basic concepts of statistics, including measures of central tendency, variability, and probability distributions.
  • Apply statistical techniques to analyze data and draw meaningful conclusions.
Classification of Buildings
Practice identifying and classifying buildings based on their characteristics, preparing you for the project.
  • Gather images or data of various buildings.
  • Study the features of each building, such as height, shape, materials, and architectural style.
  • Classify the buildings into different categories, such as residential, commercial, industrial, or historical.
  • Analyze the differences and similarities between the categories.
Data Visualization Techniques
Enhance your data visualization skills to effectively communicate insights from the project findings.
  • Explore different data visualization techniques, such as bar charts, scatterplots, and heat maps.
  • Use data visualization tools to create clear and impactful visualizations.
Practice Identifying and Cleaning Data
Reinforce your understanding of data preparation and cleaning techniques by working through practice drills, helping you develop proficiency in these essential tasks.
  • Review the course materials on data preparation and cleaning.
  • Search online for practice exercises or quizzes on data preparation.
  • Work through the exercises, applying the techniques you've learned.
  • Check your answers and identify areas where you need additional practice.
Data Preparation for Building Analysis
Develop your data handling skills by preparing the project dataset, ensuring data quality and accuracy.
  • Review the raw building data and identify inconsistencies, missing values, and errors.
  • Clean the data by correcting errors, removing duplicates, and handling missing values.
  • Transform the data into a suitable format for analysis, such as converting categorical variables into numerical ones.
  • Explore the cleaned data to gain insights and identify patterns.
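The cleaning and transformation steps above might look roughly like the following sketch. The record fields (`lat`, `lon`, `type`), the policy of dropping rows without coordinates, and the `"unknown"` placeholder are assumptions chosen for illustration; a real dataset may call for different rules.

```python
def clean_records(records, default_category="unknown"):
    """Drop rows without coordinates, fill missing types, remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if record.get("lat") is None or record.get("lon") is None:
            continue  # cannot place this record on a map
        category = record.get("type") or default_category
        key = (record["lat"], record["lon"], category)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append({**record, "type": category})
    return cleaned


def encode_categories(values):
    """Map each distinct category to an integer code, in first-seen order."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


# Hypothetical raw records with a duplicate and missing values.
records = [
    {"lat": 1.0, "lon": 2.0, "type": "violation"},
    {"lat": 1.0, "lon": 2.0, "type": "violation"},
    {"lat": None, "lon": 2.0, "type": "crime"},
    {"lat": 3.0, "lon": 4.0, "type": None},
    {"lat": 5.0, "lon": 6.0, "type": "crime"},
]
cleaned = clean_records(records)
codes, mapping = encode_categories([r["type"] for r in cleaned])
```

Integer codes imply an ordering that the categories do not have, so one-hot encoding may be a better fit for some models.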
Feature Engineering for Building Condemnation Prediction
Enhance your understanding of feature engineering techniques by following tutorials on how to derive meaningful features from the building data.
  • Identify potential features that could influence building condemnation.
  • Explore different feature engineering techniques, such as feature selection, transformation, and creation.
  • Apply feature engineering techniques to the building dataset.
  • Evaluate the effectiveness of the engineered features using statistical or machine learning methods.
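As a hedged example of feature creation, the sketch below turns the incidents attached to one building into counts and shares per incident type. The `type` field and the incident type names are hypothetical; which features actually help would need to be checked by retraining and re-evaluating the model.

```python
from collections import Counter


def building_features(incidents, incident_types):
    """Derive per-building features from its list of incidents.

    Assumption: each incident is a dict with a 'type' key.
    """
    counts = Counter(incident["type"] for incident in incidents)
    total = len(incidents)
    features = {"total_incidents": total}
    for incident_type in incident_types:
        count = counts.get(incident_type, 0)
        features[f"count_{incident_type}"] = count
        # Share of this type among all incidents; 0.0 for empty buildings.
        features[f"share_{incident_type}"] = count / total if total else 0.0
    return features


# Hypothetical incidents for a single building.
incidents = [
    {"type": "violation"},
    {"type": "violation"},
    {"type": "crime"},
    {"type": "call"},
]
features = building_features(incidents, ["violation", "crime"])
```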
Tutorial on Feature Engineering Techniques
Enhance your understanding of feature engineering by following guided tutorials, enabling you to explore various techniques and their impact on model performance.
  • Identify a publicly available dataset related to the course topic.
  • Follow a tutorial on feature engineering techniques, implementing them on your chosen dataset.
  • Analyze the results and assess the effectiveness of the techniques.
Building Risk Assessment
Undertake a project that involves developing a predictive model to assess the risk of building condemnation, applying the skills you learned throughout the course.
  • Define the problem statement and data requirements.
  • Collect and prepare the necessary data.
  • Explore the data and identify patterns and trends.
  • Develop and train predictive models.
  • Evaluate the performance of the models and select the best performing model.
Contribute to the Blight Fight Project
Engage with the community by contributing to open-source projects related to blight reduction, reinforcing your understanding of data science and its practical applications.
  • Identify open-source projects related to blight fight.
  • Review the documentation and codebase of the chosen project.
  • Identify areas where you can contribute, such as data analysis, model development, or documentation improvement.
  • Submit a pull request with your contributions.

Career center

Learners who complete Data Science at Scale - Capstone Project will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists lead or assist with every step of the data science pipeline, from retrieving raw data, through its cleansing and processing, to the final analysis and presentation of results. They need to be well-versed in each of these subfields, and have a wide range of skills, from statistics to programming. Graduates of this capstone course will have an ideal profile to seek employment as Data Scientists, by having hands-on experience applying their skills to real-world situations, and being comfortable working with all stages of the data science process.
Machine Learning Engineer
Machine Learning Engineers transform Machine Learning models into production-ready systems. They are responsible for assuring the accuracy, efficiency, and reliability of a model, and may be called upon to address problems such as bias and overfitting. This Capstone course will prove valuable to Machine Learning Engineers for its emphasis on real world projects, and for its expectation that students be comfortable with the possibility of unexpected results and ambiguous outcomes.
Statistician
Statisticians survey, collect, analyze, interpret, and present data. They use a wide range of statistical techniques to develop models and make predictions. This Capstone course will help prepare students for a career as a Statistician by building a foundation in the theoretical underpinnings of statistical analysis, on top of which students will practice using real-world data and deploying models.
Data Analyst
Data Analysts transform raw data into actionable insights for stakeholders. They clean and analyze data to identify trends, patterns, and anomalies. They present data in a way that can be easily understood by non-technical stakeholders, such as executives and managers. The capstone course will help students grow into successful Data Analysts by requiring them to undertake real world projects with collaborators, and to tolerate ambiguity and negative results.
Business Analyst
Business Analysts bridge the gap between the IT department and the rest of the business. They use data analysis to define business requirements and identify areas for improvement. They translate business needs into technical specifications and work with IT teams to implement solutions. This Capstone course will be valuable to Business Analysts by teaching them skills in data collection, analysis, and presentation.
Data Engineer
Data Engineers build, deploy, and maintain the infrastructure that stores and processes data. They work with a variety of data sources and technologies, and design and implement data pipelines and data warehouses. This Capstone course will provide skills in data preparation, organization, and transformation, which makes it a helpful complement to the engineering skills essential to Data Engineers.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical models to assess the risk and return of investments. They work in a variety of industries, including finance, insurance, and healthcare, and may specialize in areas such as asset pricing, portfolio optimization, and risk management. This Capstone course may be useful for Quantitative Analysts by providing them with the opportunity to work with real world data and build models.
Actuary
Actuaries use mathematical and statistical models to assess risk and uncertainty. They work in a variety of industries, including insurance, finance, and healthcare, and may specialize in areas such as life insurance, health insurance, and pensions. This Capstone course may be useful for Actuaries by providing them with the opportunity to work with real world data and build models.
Software Engineer
Software Engineers design, develop, test, and maintain software systems. They work in a variety of industries, and may specialize in areas such as web development, mobile development, and data science. This Capstone course may be useful for Software Engineers by providing them with the opportunity to work on a real world project and gain experience in data science.
Operations Research Analyst
Operations Research Analysts use mathematical and analytical techniques to solve problems in a variety of industries. They may specialize in areas such as supply chain management, logistics, and healthcare. This Capstone course may be useful for Operations Research Analysts by providing them with the opportunity to work on a real world project.
Consultant
Consultants provide expert advice to clients on a variety of topics, including business strategy, operations, and technology. They help clients identify and solve problems and develop and implement solutions. This Capstone course may be useful for Consultants by providing them with the opportunity to work on a real world project and gain experience in data science.
Project Manager
Project Managers plan, organize, and execute projects. They work with stakeholders to define project goals and objectives, develop project plans, and track project progress. This Capstone course may be useful for Project Managers by providing them with the opportunity to work on a real world project and gain experience in data science.
Technical Writer
Technical Writers create and maintain technical documentation, such as user manuals, white papers, and technical reports. They work with subject matter experts to understand complex technical concepts and translate them into clear and concise language. This Capstone course may be useful for Technical Writers by providing them with the opportunity to work on a real world project and gain experience in data science.
Data Journalist
Data Journalists use data to tell stories. They work with data to identify trends, patterns, and anomalies, and they use data visualization and storytelling techniques to communicate their findings to the public. This Capstone course may be useful for Data Journalists by providing them with the opportunity to work on a real world project and gain experience in data science.
Policy Analyst
Policy Analysts research and analyze public policy issues. They use data to understand the impact of policies and make recommendations for policy changes. This Capstone course may be useful for Policy Analysts by providing them with the opportunity to work on a real world project and gain experience in data science.

Reading list

We've selected 13 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Science at Scale - Capstone Project.
Provides a hands-on introduction to machine learning using popular Python libraries like Scikit-Learn, Keras, and TensorFlow. It is particularly useful for individuals who want to gain practical experience in implementing ML models.
Provides a comprehensive guide to machine learning using the Python programming language. It covers various ML techniques, including supervised learning, unsupervised learning, and natural language processing.
Introduces the fundamentals of data analysis using Python. It covers data exploration, data wrangling, and data visualization using libraries like NumPy, Pandas, and Matplotlib.
Covers the fundamentals of data science, including data cleaning, exploratory data analysis, machine learning, and data visualization. It is suitable for individuals who are new to data science or want to refresh their understanding of its core concepts.
Provides hands-on examples of how to apply machine learning techniques to real-world problems. It covers topics like data preprocessing, model training, and model evaluation, making it suitable for individuals who want to gain practical ML experience.
Provides a comprehensive overview of data science, covering topics like data collection, data analysis, and data visualization. It is suitable for individuals who are new to data science or want to broaden their understanding of the field.
Introduces Apache Spark, a popular big data processing engine. It provides a comprehensive guide to using Spark for data processing, machine learning, and graph analysis.
Provides an accessible introduction to deep learning, a subfield of machine learning that has gained popularity in recent years. It focuses on building and training deep learning models using the Keras library in Python.
Provides a comprehensive introduction to reinforcement learning, a subfield of machine learning that deals with learning by interacting with an environment. It is particularly useful for individuals who want to understand the principles of RL and apply it to real-world problems.
Explores the implications of big data on society, businesses, and individuals. It discusses the challenges and opportunities associated with big data, such as data privacy, data security, and data ethics.
Provides a comprehensive overview of Apache Hadoop, a popular big data storage and processing framework. It covers topics like Hadoop architecture, data storage, and data processing, making it suitable for individuals who want to gain a deep understanding of Hadoop.
Focuses on the business applications of data science, explaining how organizations can use data to drive decision-making and gain a competitive advantage. It is particularly relevant for professionals who want to understand the business value of data science.
Provides a probabilistic perspective on machine learning, focusing on the mathematical and statistical foundations of ML algorithms. It is particularly suitable for individuals who want to gain a deeper understanding of the theoretical underpinnings of ML.
