Rav Ahuja and Ramesh Sannareddy

Showcase your skills in this Data Engineering project! In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate.

You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and is presented with a real-world use case that requires architecting and implementing a data analytics platform.

In this Capstone project you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You’ll also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations.
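
As a rough illustration of what Cube and Rollup operations compute, here is a minimal sketch in plain Python rather than SQL. The sales rows, column names, and helper functions are hypothetical and are not taken from the course labs. ROLLUP over (country, category) produces subtotals for (country, category), (country), and the grand total, while CUBE produces subtotals for every subset of the grouping columns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales rows: (country, category, amount).
SALES = [
    ("US", "Books", 100),
    ("US", "Toys", 50),
    ("CA", "Books", 30),
]
COLUMNS = ("country", "category")


def aggregate(rows, grouping_sets):
    """Sum amounts for each grouping set; None marks a rolled-up column."""
    totals = defaultdict(int)
    for country, category, amount in rows:
        values = {"country": country, "category": category}
        for group in grouping_sets:
            key = tuple(values[c] if c in group else None for c in COLUMNS)
            totals[key] += amount
    return dict(totals)


def rollup_sets(columns):
    """ROLLUP(a, b) -> grouping sets (a, b), (a), ()."""
    return [columns[:n] for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(a, b) -> every subset: (a, b), (a), (b), ()."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


rollup = aggregate(SALES, rollup_sets(COLUMNS))
cube = aggregate(SALES, cube_sets(COLUMNS))
```

In this sketch, `rollup[("US", None)]` holds the subtotal for all US sales, and only `cube` contains per-category totals across countries such as `(None, "Books")`. In a data warehouse the same results would normally come from a `GROUP BY ROLLUP(...)` or `GROUP BY CUBE(...)` query.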

You will generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. You will also show your proficiency in Extract, Transform, and Load (ETL) processes by creating data pipelines for moving data from different repositories. You will perform big data analytics using Apache Spark to make predictions with the help of a machine learning model.

This course is the final course in the IBM Data Engineering Professional Certificate. It is recommended that you complete all the previous courses in this Professional Certificate before starting this course.

What's inside

Syllabus

Data Platform Architecture and OLTP Database
In this module, you will design a data platform that uses MySQL as its OLTP database and use MySQL to store the OLTP data.
Querying Data in NoSQL Databases
In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.
Build a Data Warehouse
In this module, you will design and implement a data warehouse, then generate reports from its data.
Data Analytics
In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse. Now you are assigned the responsibility to design a reporting dashboard that reflects the key metrics of the business.
ETL & Data Pipelines
In this module, you will use the provided Python script to perform various ETL operations that move data from RDBMS to NoSQL, from NoSQL to RDBMS, and from both RDBMS and NoSQL to the data warehouse. You will write a pipeline that analyzes a web server log file, extracts the required lines and fields, and transforms and loads the data.
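
The log-processing pipeline described in this module might look roughly like the sketch below. It assumes the log uses the Common Log Format, which the course materials may or may not follow. The sample lines, field names, and the `extract`, `transform`, and `load` functions are hypothetical and are not the script supplied in the course.

```python
import csv
import io
import re

# Assumed Common Log Format; the course's actual log layout may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical sample input, including one malformed line.
SAMPLE_LOG = """\
198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /search?q=laptop HTTP/1.1" 200 2326
this line is malformed and should be skipped
203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "POST /cart HTTP/1.1" 404 -
"""


def extract(lines):
    """Yield one dict of named fields per line that matches the pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def transform(records):
    """Cast numeric fields; a size of '-' means no body and becomes 0."""
    for rec in records:
        rec["status"] = int(rec["status"])
        rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
        yield rec


def load(records, out):
    """Write records as CSV to a file-like object; return the row count."""
    fields = ["ip", "timestamp", "method", "path", "status", "size"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    count = 0
    for rec in records:
        writer.writerow(rec)
        count += 1
    return count


buffer = io.StringIO()
rows_loaded = load(transform(extract(SAMPLE_LOG.splitlines())), buffer)
```

Each stage is a generator, so lines stream through one at a time. Swapping the `io.StringIO` buffer for an open file or a database writer would change only the load step.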
Big Data Analytics with Spark
In this module, you will use data from a web server to analyze search terms. You will then load a pretrained sales forecasting model and predict the sales forecast for a future year.
Final Submission and Peer Review
In this final module you will complete your submission of screenshots from the hands-on labs for your peers to review. Once you have completed your submission you will then review the submission of one of your peers and grade their submission.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Significant emphasis on hands-on labs for a more practical learning experience
Covers industry-standard technologies such as MySQL, MongoDB, and Apache Spark
Provides a comprehensive overview of data engineering processes, from data acquisition to data analysis
Offers the opportunity to build a data analytics dashboard using Cognos Analytics
Requires prior knowledge in data engineering concepts and tools
Does not provide a comprehensive introduction to big data concepts and technologies

Reviews summary

Data engineering capstone project

According to students, the Data Engineering Capstone Project is a well-received, practical course that delivers valuable hands-on experience. Learners say the course provides a comprehensive understanding of data engineering concepts and tools, and that the hands-on labs and real-world projects are engaging and informative. Some learners found the Capstone anticlimactic, and one reported spending more time navigating Linux errors than on the central content, while others appreciated the practical approach.
Well-structured and comprehensive
"The courses were well-structured, providing a comprehensive understanding of data engineering concepts and tools."
"It warn up every lessons. "
Provides valuable hands-on experience
"Very hands on and informative"
"The courses were well-structured, providing a comprehensive understanding of data engineering concepts and tools. The hands-on labs and real-world projects were incredibly valuable in applying what I learned."
"While there are valuable opportunities for hands-on experience provided for this course..."
May be anticlimactic for some
"The Capstone was a bit of an anticlimax."
"You will spend more time navigating linux errors than the actual central content."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Engineering Capstone Project with these activities:
Review data modeling concepts
Refresh your knowledge of data modeling techniques, including entity-relationship diagrams, normalization, and data warehousing principles, to enhance your ability to design and implement efficient data structures.
  • Review textbooks or online resources on data modeling
  • Practice creating ER diagrams for simple scenarios
  • Discuss data modeling concepts with peers or mentors
Review MySQL basics
Solidify your understanding of MySQL, a foundational skill for data engineering.
  • Review SQL queries and commands
  • Practice creating and modifying tables
  • Execute queries to retrieve and manipulate data
Practice RDBMS querying
Sharpen your SQL skills by practicing queries on sample databases, focusing on data retrieval, manipulation, and aggregation techniques commonly used in data engineering.
  • Find sample databases online or create your own
  • Use a database management tool or command-line interface to connect to the database
  • Write SQL queries to perform data retrieval, filtering, sorting, and aggregation
Participate in peer review sessions
Engage with fellow learners to review and provide feedback on each other's work, fostering a collaborative learning environment and improving your problem-solving and communication skills.
  • Find a study group or online forum for peer review
  • Share your work with others and seek feedback
  • Review and provide constructive criticism on others' work
Consolidate and organize course materials
Organize your notes, assignments, and other course materials into a central repository, making them easily accessible for review and reference, improving your ability to retain and retrieve information effectively.
  • Gather all relevant course materials
  • Create a structured filing system or digital notebook
  • Categorize and label materials for easy retrieval
Explore MongoDB documentation
Enhance your understanding of MongoDB by exploring its comprehensive documentation.
  • Read tutorials on data modeling and query optimization
  • Follow examples of aggregation and indexing techniques
Solve data warehousing exercises
Reinforce your data warehousing concepts by solving practical exercises.
  • Design star and snowflake schemas
  • Create tables and load data
  • Write queries to analyze data
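
The three steps above can be sketched end to end as follows. This is a minimal example that uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL and IBM Db2 warehouses used in the course. The table names, columns, and rows are hypothetical. It builds a small star schema with one fact table and two dimension tables, loads a few rows, and runs a report query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: design a star schema (one fact table, two dimension tables).
conn.executescript("""
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year INTEGER NOT NULL,
    quarter INTEGER NOT NULL
);
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
    category_id INTEGER NOT NULL REFERENCES dim_category(category_id),
    amount REAL NOT NULL
);
""")

# Step 2: load hypothetical data into the dimensions, then the fact table.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
conn.executemany("INSERT INTO dim_category VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
conn.executemany(
    "INSERT INTO fact_sales (date_id, category_id, amount) VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 30.0)],
)

# Step 3: analyze by joining the fact table to its dimensions.
report = conn.execute("""
    SELECT d.quarter, c.name, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_category AS c ON c.category_id = f.category_id
    GROUP BY d.quarter, c.name
    ORDER BY d.quarter, c.name
""").fetchall()
```

A snowflake schema would further normalize the dimensions, for example by splitting `dim_category` into category and department tables, at the cost of extra joins in each query.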
Review Apache Spark for beginners
Review the basics of Apache Spark, including its architecture, components, and programming interfaces, to strengthen your understanding of big data processing.
  • Watch beginner-friendly tutorials on Apache Spark
  • Install Apache Spark on your local machine
  • Write basic Spark programs using PySpark or Scala
Build a simple data pipeline using Python
Create a working data pipeline using Python to demonstrate your understanding of data ingestion, transformation, and loading processes, providing hands-on experience with real-world data handling scenarios.
  • Choose a data source and define the data transformation rules
  • Write Python code to implement the data pipeline using libraries like Pandas and NumPy
  • Test and validate the pipeline using sample data
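
A small pipeline along these lines could start from the sketch below. To stay self-contained it uses only the standard library (`csv` and `sqlite3`) instead of Pandas and NumPy. The source data, the transformation rules, and the `products` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the blank price is intentional to exercise validation.
SOURCE_CSV = """\
sku,name,price
A1, Laptop ,999.50
B2,Mouse,
C3,Keyboard,45
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim names, cast prices to float, and separate rows with no price."""
    clean, rejected = [], []
    for row in rows:
        price = row["price"].strip()
        if not price:
            rejected.append(row["sku"])
            continue
        clean.append((row["sku"], row["name"].strip(), float(price)))
    return clean, rejected


def load(conn, records):
    """Insert the cleaned records into a products table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(SOURCE_CSV))
load(conn, clean)
loaded = conn.execute("SELECT sku, name, price FROM products ORDER BY sku").fetchall()
```

Returning the rejected SKUs separately makes the validation step easy to test: with the sample data, one row is rejected for its missing price and two rows reach the table.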
Solve practice problems on data analysis and visualization
Engage in practice drills and exercises to strengthen your data analysis and visualization skills, improving your ability to interpret data, identify patterns, and communicate insights effectively.
  • Find practice problems and datasets online or in textbooks
  • Use data analysis tools like Python, R, or Tableau to solve the problems
  • Compare your solutions with others or discuss them with mentors
Build a dashboard using Cognos Analytics
Apply your data engineering skills by creating a visually informative dashboard.
  • Connect to data sources and create data sets
  • Design visualizations and reports
  • Configure interactive features
Develop an ETL pipeline
Demonstrate your ability to build a complete ETL pipeline by integrating data from multiple sources.
  • Design the ETL architecture
  • Implement data extraction and transformation
  • Load data into the target system
  • Configure monitoring and error handling
Participate in a data analytics hackathon
Challenge yourself and showcase your skills by participating in a data analytics competition.
  • Form a team or work independently
  • Analyze the provided data set
  • Develop a solution using appropriate techniques
  • Submit your results and prepare for presentation

Career center

Learners who complete Data Engineering Capstone Project will develop knowledge and skills that may be useful to these careers:
Data Engineer
As a Data Engineer, you will be responsible for designing, building, and maintaining data platforms and pipelines. You will work with data from a variety of sources, including relational databases, NoSQL databases, and big data sources. This course will help you build a strong foundation in data engineering principles and practices. You will learn how to design and implement data platforms, perform data analytics, and build data pipelines. This course is recommended for anyone who wants to pursue a career as a Data Engineer.
Data Warehouse Architect
As a Data Warehouse Architect, you will be responsible for designing and implementing data warehouses. You will work with data from a variety of sources, including relational databases, NoSQL databases, and big data sources. This course will help you build a strong foundation in data warehouse design and implementation. You will learn how to design and implement data warehouses that are scalable, reliable, and performant.
Data Analyst
As a Data Analyst, you will be responsible for collecting, cleaning, and analyzing data. You will use data to identify trends, patterns, and insights. This course will help you build a strong foundation in data analysis principles and practices. You will learn how to collect, clean, and analyze data, and how to present your findings in a clear and concise way.
Data Scientist
As a Data Scientist, you will be responsible for using data to solve business problems. You will use data to build models that can predict future outcomes, identify risks, and optimize business processes. This course will help you build a strong foundation in data science principles and practices. You will learn how to use data to build models, and how to evaluate and interpret the results of your models.
Business Intelligence Analyst
As a Business Intelligence Analyst, you will be responsible for using data to improve business decision-making. You will work with data to identify trends, patterns, and insights. This course will help you build a strong foundation in business intelligence principles and practices. You will learn how to use data to identify trends, patterns, and insights, and how to present your findings in a clear and concise way.
Database Administrator
As a Database Administrator, you will be responsible for managing and maintaining databases. You will ensure that databases are available, performant, and secure. This course will help you build a strong foundation in database administration principles and practices. You will learn how to install, configure, and manage databases.
Software Engineer
As a Software Engineer, you will be responsible for designing, developing, and maintaining software applications. This course may be useful for you if you are interested in developing software applications that use data. You will learn how to design and develop software applications that are scalable, reliable, and performant.
IT Manager
As an IT Manager, you will be responsible for managing and overseeing an IT department. This course may be useful for you if you are interested in managing an IT department that uses data. You will learn how to manage and oversee an IT department that is efficient and effective.
Project Manager
As a Project Manager, you will be responsible for planning, executing, and closing projects. This course may be useful for you if you are interested in managing projects that use data. You will learn how to plan, execute, and close projects that are successful.
Data Architect
As a Data Architect, you will be responsible for designing and implementing data architectures. This course may be useful for you if you are interested in that work. You will learn how to design and implement data architectures that are scalable, reliable, and performant.
Technical Writer
As a Technical Writer, you will be responsible for creating and maintaining technical documentation. This course may be useful for you if you are interested in creating and maintaining technical documentation for software applications that use data. You will learn how to create and maintain technical documentation that is clear, concise, and accurate.
Quality Assurance Analyst
As a Quality Assurance Analyst, you will be responsible for testing and evaluating software applications. This course may be useful for you if you are interested in testing and evaluating software applications that use data. You will learn how to test and evaluate software applications for defects and errors.
Information Security Analyst
As an Information Security Analyst, you will be responsible for protecting information systems from security threats. This course may be useful for you if you are interested in protecting data-driven information systems from security threats. You will learn how to protect information systems from security threats.
Data Governance Analyst
As a Data Governance Analyst, you will be responsible for developing and implementing data governance policies. This course may be useful for you if you are interested in that work. You will learn how to develop and implement data governance policies that are effective and efficient.
Data Privacy Analyst
As a Data Privacy Analyst, you will be responsible for protecting personal data from unauthorized access and use. This course may be useful for you if you are interested in protecting personal data from unauthorized access and use. You will learn how to protect personal data from unauthorized access and use.

Reading list

We've selected ten books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Engineering Capstone Project.
Teaches how to design applications that can handle large amounts of data.
Is the essential guide to Apache Spark, a distributed computing framework covered in the course.
This classic work on data warehousing offers a theoretical yet practical overview of the subject.
A must-read for data engineers, this book covers the business side of data engineering.
Comprehensive guide to machine learning with Python, which is covered in the course.
Comprehensive guide to data science with Python, which is covered in the course.
Is considered the essential guide to MongoDB, a database covered in the course.
