Sabrina Moore, Rajvir Dua, and Neelesh Tiruviluamala

This course will teach you how to leverage the power of Python and artificial intelligence to create and test hypotheses. We'll start from the ground up, learning some basic Python for data science before diving into some of its richer applications to test the hypotheses we create. We'll learn some of the most important libraries for exploratory data analysis (EDA) and machine learning, such as NumPy, pandas, and scikit-learn. After learning some of the theory (and math) behind linear regression, we'll go through a full pipeline of reading data, cleaning it, and applying a regression model to estimate the progression of diabetes. By the end of the course, you'll apply a classification model to predict the presence or absence of heart disease from a patient's health data.


What's inside

Syllabus

Introduction to Python Programming for Hypothesis Testing
In this module, we'll get started with programming in Python. After becoming familiar with Python and the Jupyter Notebook interface, we'll dive into some basic coding paradigms such as variables, loops, and functions. We'll also cover data structures in the form of lists and dictionaries. We'll then go through one of the most useful tools in your Python arsenal: importing and using modules effectively. Finally, we'll introduce scikit-learn and walk through a classification problem to predict the presence or absence of cancer from health data.
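As a rough illustration of the kind of classification problem this module ends with, the sketch below uses scikit-learn's bundled Wisconsin breast cancer dataset (`load_breast_cancer`). The dataset, the logistic regression model, and the 75/25 train/test split are assumptions chosen for illustration; the course may use different data or a different classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features (X) and labels (y); in this dataset 0 = malignant, 1 = benign
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and classifier in one pipeline means the scaling statistics are learned from the training split only.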
Creating a Hypothesis: NumPy, Pandas, and Scikit-Learn
In this module, we'll become familiar with two of the most important packages for data science: NumPy and pandas. We'll begin by learning the differences between the two packages. Then, we'll get familiar with NumPy arrays and their functionality. Adding row and column labels turns our arrays into tables, which is what the pandas library provides. After a basic introduction, we'll end with a series of important data manipulation tools such as indexing, merging and combining datasets, and reshaping data.
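A minimal sketch of the ideas in this module: a NumPy array holds unlabeled numbers, while a pandas DataFrame adds row and column labels on top and supports indexing, merging, and reshaping. The patient IDs, column names, and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array: a grid of numbers with no labels, supporting vectorized math
readings = np.array([[120, 80], [135, 85], [110, 70]])
print(readings.mean(axis=0))  # mean of each column

# A pandas DataFrame: the same numbers with row and column labels
vitals = pd.DataFrame(
    readings,
    columns=["systolic", "diastolic"],
    index=pd.Index(["p1", "p2", "p3"], name="patient_id"),
)

# Indexing: label-based lookup with .loc, and boolean filtering
print(vitals.loc["p2", "systolic"])
high_bp = vitals[vitals["systolic"] > 115]

# Merging: join a second table on the shared patient_id key
ages = pd.DataFrame({"patient_id": ["p1", "p2", "p3"], "age": [54, 61, 47]})
combined = vitals.reset_index().merge(ages, on="patient_id")

# Reshaping: melt the wide table into long format (one row per measurement)
long_format = combined.melt(
    id_vars=["patient_id", "age"],
    value_vars=["systolic", "diastolic"],
    var_name="measurement",
    value_name="value",
)
print(long_format)
```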
Scikit-Learn Revisited: ML for Hypothesis Testing
In this module, we'll work from the ground up to build and test our hypotheses. Covering both the theory and the code, we'll test our predictions with different types of machine learning algorithms. We'll start with some of the necessary data preprocessing steps to orient ourselves. Getting familiar with the scikit-learn library starts with its documentation. From there, we'll load a dataset and analyze some of its most basic properties. Finally, we'll import and use models to make predictions.
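The workflow this module describes (load a dataset, inspect its basic properties, preprocess, then import a model and predict) might look like the following. The iris dataset and the k-nearest neighbors classifier are stand-ins chosen for illustration, not necessarily the ones the course uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the data; as_frame=True returns pandas objects, which are easy to inspect
iris = load_iris(as_frame=True)
df = iris.frame  # 4 feature columns plus a "target" column

# Basic properties: size, per-feature summary statistics, and class balance
print(df.shape)
print(df.describe())
print(df["target"].value_counts())

# Hold out 20% of the rows for evaluation, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Preprocessing: learn scaling statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Import, fit, and predict with a model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
predictions = knn.predict(X_test_scaled)
accuracy = knn.score(X_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling matters for k-nearest neighbors because it compares raw distances, so a feature with a larger numeric range would otherwise dominate.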
Using Classification to Predict the Presence of Heart Disease
In the final project, we'll try to predict the presence of heart disease using patient data. We'll load the data, create new features, and apply a machine learning algorithm using scikit-learn.
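A sketch of that final pipeline follows. The course's heart disease data is not reproduced here, so the code builds a small synthetic stand-in. The column names (`age`, `cholesterol`, `resting_bp`, `max_heart_rate`), the labeling rule, the derived `hr_shortfall` feature, and the random forest classifier are all hypothetical choices for illustration, and because the labels are synthetic, the accuracy says nothing about real patients.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Hypothetical synthetic patient data standing in for a real heart disease dataset
df = pd.DataFrame({
    "age": rng.integers(30, 80, n),
    "cholesterol": rng.normal(240, 40, n),
    "resting_bp": rng.normal(130, 15, n),
    "max_heart_rate": rng.normal(150, 20, n),
})
# Synthetic label: risk rises with age and cholesterol (a made-up rule)
risk = 0.04 * (df["age"] - 55) + 0.02 * (df["cholesterol"] - 240)
df["heart_disease"] = (risk + rng.normal(0, 0.5, n) > 0).astype(int)

# Feature creation (hypothetical): how far the observed maximum heart rate
# falls below the common "220 minus age" rule-of-thumb estimate
df["hr_shortfall"] = (220 - df["age"]) - df["max_heart_rate"]

features = ["age", "cholesterol", "resting_bp", "max_heart_rate", "hr_shortfall"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["heart_disease"], test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

With real data, the loading step would replace the synthetic DataFrame, and the derived feature would come from domain knowledge rather than a placeholder rule.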

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops proficiency in hypothesis testing and programming in Python, which are relevant for machine learning and data science
Utilizes industry-standard tools and libraries like NumPy, pandas, and scikit-learn, giving learners practical experience
Introduces essential concepts in data analysis and machine learning, catering to beginners in these fields
Provides hands-on experience through a final project involving predicting heart disease presence, reinforcing practical application
Instructors have combined experience in academia and industry, providing a mix of theoretical and practical insights
Covers both foundational concepts and practical implementation, offering a comprehensive learning experience


Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Introduction to Data Science and scikit-learn in Python with these activities:
Review Linear Regression Concepts
Strengthen your knowledge of linear regression concepts to enhance your ability to apply the techniques taught in this course effectively.
  • Revisit the mathematical foundations of linear regression
  • Review the assumptions and limitations of linear regression
  • Practice fitting and evaluating linear regression models
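To connect the math to code, this sketch fits ordinary least squares twice on the same data: once by solving the least-squares problem directly with NumPy (whose closed-form solution is the normal equation, beta_hat = (X^T X)^(-1) X^T y), and once with scikit-learn's `LinearRegression`. The synthetic data and its true coefficients are made up for the example; the two fits should agree up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: y = 3 + 2*x1 - 1*x2 + small noise (coefficients chosen for illustration)
X = rng.normal(size=(200, 2))
y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Least squares by hand: prepend a column of ones for the intercept, then solve.
# lstsq gives the same answer as the normal equation without forming an inverse.
X_design = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The same model fitted with scikit-learn
model = LinearRegression().fit(X, y)

print("NumPy:  ", beta_hat)
print("sklearn:", model.intercept_, model.coef_)
```

Both estimates should land close to the true values 3, 2, and -1, since the noise is small relative to the signal.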
Review Python Programming Fundamentals
Reinforce your understanding of Python programming fundamentals to enhance your ability to apply the concepts taught in this course.
  • Review variables, data types, and operators
  • Practice using conditional statements and loops
  • Refresh your knowledge of functions and modules
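A small refresher that touches each item in the list above: variables and data types, a conditional inside a loop, a function, a dictionary, and an imported module. The readings and the threshold of 140 are illustrative values only.

```python
import statistics  # a standard-library module, imported like any other

# Variables and basic data types
patient_name = "Ada"             # str
readings = [118, 142, 135, 150]  # list of int
threshold = 140                  # int (illustrative cutoff)


def count_above(values, limit):
    """Return how many values exceed limit."""
    count = 0
    for value in values:   # loop over a list
        if value > limit:  # conditional statement
            count += 1
    return count


# A dictionary mapping keys to summary values
summary = {
    "name": patient_name,
    "mean": statistics.mean(readings),
    "above_threshold": count_above(readings, threshold),
}
print(summary)
```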
Participate in Peer Study Groups
Engage in peer-to-peer discussions to enhance your understanding of the course material, clarify concepts, and receive support from fellow learners.
  • Join or create study groups with other students
  • Discuss course topics, share insights, and ask questions
  • Work together on projects or assignments
Solve Python Coding Challenges
Deepen your understanding of Python programming by solving coding challenges, fostering your problem-solving skills and algorithm implementation abilities.
  • Find coding challenges online or in textbooks
  • Attempt to solve the challenges independently
  • Review solutions and understand the approaches
Explore scikit-learn Tutorials
Enhance your understanding of scikit-learn by following guided tutorials, enabling you to effectively apply these techniques in your hypothesis testing projects.
  • Identify relevant tutorials on the official scikit-learn website or other reputable sources
  • Follow the tutorials step-by-step, implementing the concepts in your own projects
  • Experiment with different parameters and algorithms to gain practical experience
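One systematic way to experiment with parameters is scikit-learn's `GridSearchCV`, which cross-validates every combination in a parameter grid and refits the best one. The wine dataset, the decision tree, and the grid values below are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every combination of these values is tried with 5-fold cross-validation
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
# The refitted best model is scored once on the untouched test set
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```

Keeping the test set out of the search means the final score is not inflated by the choice of parameters.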
Contribute to Open Source Projects in Python
Gain practical experience and deepen your understanding of Python by contributing to open source projects, fostering your collaborative and problem-solving skills.
  • Identify open source projects related to Python or data science
  • Review the project documentation and codebase
  • Identify areas where you can contribute and submit pull requests
  • Collaborate with project maintainers and other contributors
Build a Machine Learning Model to Predict Diabetes Progression
Apply your knowledge of Python, data analysis, and machine learning to create a meaningful project that demonstrates your understanding and skills.
  • Gather and preprocess diabetes-related data
  • Train and evaluate different machine learning models
  • Analyze the results and interpret the model's performance
  • Write a report summarizing your findings and insights
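A minimal version of this project can use the diabetes dataset that ships with scikit-learn (`load_diabetes`), whose target is a quantitative measure of disease progression one year after baseline. Comparing a few models with cross-validation, as sketched below, covers the train-and-evaluate step; the specific models and the R² metric are assumptions, not the course's prescribed choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# 442 patients, 10 baseline features (already mean-centered and scaled)
X, y = load_diabetes(return_X_y=True)

models = {
    "linear regression": LinearRegression(),
    "ridge (alpha=1.0)": Ridge(alpha=1.0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# 5-fold cross-validated R^2 for each model (higher is better; 1.0 is a perfect fit)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()
    print(f"{name:20s} mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The spread across folds is worth reporting alongside the mean, since a small dataset like this one can give noticeably different scores per fold.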
Build a Heart Disease Prediction Web App
Combine your knowledge of Python, data analysis, and machine learning to create a fully functional web application that can predict the presence of heart disease, demonstrating your comprehensive skills and practical implementation abilities.
  • Design the web application architecture and user interface
  • Develop the backend data processing and machine learning model integration
  • Create the frontend user interface and connect it to the backend
  • Deploy the web application to a cloud platform
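For the backend step, one common pattern is to train the model offline, save it to disk, and have the web server load it once at startup and call a small prediction function for each request. The sketch below shows that pattern with Python's standard `pickle` and `json` modules and no web framework. The feature names, the synthetic training data, and the `predict_from_json` helper are hypothetical; a route in a framework such as Flask or FastAPI would pass the request body to a function like it.

```python
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "cholesterol", "resting_bp"]  # hypothetical input fields

# Offline step: train on synthetic stand-in data (made-up labeling rule)
rng = np.random.default_rng(0)
X = rng.normal(loc=[55, 240, 130], scale=[10, 40, 15], size=(200, 3))
y = (X[:, 0] + 0.1 * X[:, 1] > 79).astype(int)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Save the fitted pipeline to disk
model_path = Path(tempfile.gettempdir()) / "heart_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Server step: load the model once at startup (only unpickle files you trust)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)


def predict_from_json(body: str) -> str:
    """Hypothetical request handler: JSON features in, JSON probability out."""
    payload = json.loads(body)
    row = np.array([[payload[name] for name in FEATURES]], dtype=float)
    probability = loaded_model.predict_proba(row)[0, 1]
    return json.dumps({"heart_disease_probability": round(float(probability), 3)})


print(predict_from_json('{"age": 63, "cholesterol": 280, "resting_bp": 145}'))
```

Saving the scaler and classifier together as one pipeline keeps the server from applying different preprocessing than the model was trained with.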

Career center

Learners who complete Introduction to Data Science and scikit-learn in Python will develop knowledge and skills that may be useful to these careers:
Data Scientist
A Data Scientist uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. This course introduces core data science tools in Python: NumPy and pandas for exploring and cleaning data, and scikit-learn for building regression and classification models. Its projects walk through a full pipeline, from reading and cleaning data to estimating diabetes progression and predicting heart disease, which may help you get started as a Data Scientist.
Machine Learning Engineer
A Machine Learning Engineer designs, develops, and deploys machine learning models to solve real-world problems. This course introduces the fundamentals of machine learning with scikit-learn, including data preprocessing, feature creation, and fitting regression and classification models. These foundations may help you get started on the path to becoming a Machine Learning Engineer.
Data Analyst
A Data Analyst collects, analyzes, and interprets data to help organizations make informed decisions. This course introduces exploratory data analysis in Python, including indexing, merging, combining, and reshaping datasets with pandas, along with basic regression and classification models. These skills may help you get started or advance your career as a Data Analyst.
Statistician
A Statistician collects, analyzes, and interprets data to help organizations make informed decisions. This course covers some of the theory and math behind linear regression and shows how to apply regression and classification models to health data in Python. This may be useful if you want to add programming and machine learning skills to a career as a Statistician.
Business Analyst
A Business Analyst helps organizations make informed decisions by analyzing data and providing insights. This course introduces reading, cleaning, and manipulating data with NumPy and pandas, along with building simple predictive models, which may help a Business Analyst work with data in Python.
Software Engineer
A Software Engineer designs, develops, and maintains software systems. This course introduces Python programming fundamentals, including variables, loops, functions, lists, dictionaries, and modules, which may be a useful starting point for a Software Engineer who wants to work with data or machine learning.
Operations Research Analyst
An Operations Research Analyst uses mathematical models and optimization techniques to solve business problems. The Python, data manipulation, and regression modeling skills taught in this course may support the data analysis side of an Operations Research Analyst's work.
Financial Analyst
A Financial Analyst analyzes financial data to help organizations make sound investments. The Python, pandas, and regression skills taught in this course may be useful for a Financial Analyst who works with large datasets.
Actuary
An Actuary analyzes data to assess the risk of future events, such as illness or death. This course's projects, which use patient health data to estimate diabetes progression and predict heart disease, may offer useful practice in modeling health-related outcomes with Python.
Market Researcher
A Market Researcher analyzes data to understand the needs and wants of customers. The data cleaning, exploratory analysis, and modeling skills taught in this course may be useful for a Market Researcher analyzing customer data in Python.
Data Engineer
A Data Engineer designs, builds, and maintains data pipelines to support data science and analytics initiatives. This course's coverage of reading, cleaning, merging, and reshaping data with pandas may provide a foundation for understanding how the data a Data Engineer delivers is used downstream.
Database Administrator
A Database Administrator (DBA) designs, implements, and maintains databases to support data science and analytics initiatives. This course's coverage of tabular data in pandas, including indexing and merging datasets, may help a DBA understand how data scientists use the data they manage.
Systems Analyst
A Systems Analyst designs, develops, and implements computer systems to support data science and analytics initiatives. The Python and machine learning fundamentals taught in this course may help a Systems Analyst understand the data science workflows those systems support.
Business Intelligence Analyst
A Business Intelligence (BI) Analyst uses data analysis and data visualization to help organizations make informed decisions. This course's coverage of exploratory data analysis with pandas and predictive modeling with scikit-learn may complement a BI Analyst's toolkit.
Technical Writer
A Technical Writer creates and maintains technical documentation, such as user manuals, white papers, and training materials. For a Technical Writer who documents data science or machine learning tools, this course may provide useful hands-on familiarity with Python, NumPy, pandas, and scikit-learn.

Reading list

We've selected 11 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Introduction to Data Science and scikit-learn in Python.
Provides a comprehensive overview of statistical learning, covering topics such as linear regression, logistic regression, and tree-based methods. It is a valuable resource for beginners and experienced statisticians alike.
Provides a comprehensive overview of machine learning, covering topics such as supervised learning, unsupervised learning, and deep learning. It is a valuable resource for beginners and experienced machine learning practitioners alike.
Provides a comprehensive overview of deep learning in Python, covering topics such as convolutional neural networks, recurrent neural networks, and generative adversarial networks. It is a valuable resource for beginners and experienced Python users alike.
Provides a more advanced treatment of statistical learning, covering topics such as support vector machines, kernel methods, and Bayesian methods. It is a valuable resource for experienced statisticians and machine learning practitioners.
Provides a comprehensive overview of deep learning, covering topics such as convolutional neural networks, recurrent neural networks, and generative adversarial networks. It is a valuable resource for beginners and experienced deep learning practitioners alike.
Provides a comprehensive overview of speech and language processing, covering topics such as speech recognition, natural language understanding, and text-to-speech synthesis. It is a valuable resource for beginners and experienced speech and language processing practitioners alike.
Provides a comprehensive overview of reinforcement learning, covering topics such as Markov decision processes, value function approximation, and policy optimization. It is a valuable resource for beginners and experienced reinforcement learning practitioners alike.
Provides a thorough introduction to Python for data analysis, covering topics such as data manipulation, data visualization, and machine learning. It is a valuable resource for beginners and experienced Python users alike.
Provides a comprehensive overview of natural language processing, covering topics such as text preprocessing, text classification, and text generation. It is a valuable resource for beginners and experienced natural language processing practitioners alike.
Provides a practical introduction to data science, covering topics such as data cleaning, data analysis, and machine learning. It is a valuable resource for beginners who want to learn the fundamentals of data science.


Similar courses

Here are nine courses similar to Introduction to Data Science and scikit-learn in Python.
Train Machine Learning Models
Foundations of Statistics and Probability for Machine...
Performing Confirmatory Data Analysis in R
Essential Statistics for Data Analysis
RStudio for Six Sigma - Hypothesis Testing
Complete Linear Regression Analysis in Python
Interpreting Data Using Statistical Models with Python
Deep Learning Prerequisites: Linear Regression in Python
Machine Learning with Python: A Practical Introduction

© 2016 - 2024 OpenCourser