Romeo Kienzler, Rav Ahuja, Joseph Santarcangelo, Steve Ryan, Aije Egwaikhide, Ramesh Sannareddy, Yan Luo, Lin Joyner, Karthik Muthuraman, Jeff Grossman, and Rose Malcolm

Organizations have more data at their disposal today than ever before. The vast amount of data that organizations are capturing, along with their desire to extract meaningful insights, is driving an urgent demand for Data Engineers.

Data Engineers play a fundamental role in harnessing the data that enables organizations to apply business intelligence and make informed decisions. Today’s Data Engineers require a broad set of skills to develop and optimize data systems and make data available to the organization for analysis.

This Professional Certificate provides you with the job-ready skills you will need to launch your career as an entry-level data engineer.

Upon completing this Professional Certificate, you will have extensive knowledge and practical experience with cloud-based relational databases (RDBMS) and NoSQL data repositories, working with Python, Bash and SQL, processing big data with Apache Hadoop and Apache Spark, using ETL (extract, transform and load) tools, creating data pipelines, using Apache Kafka and Airflow, designing, populating, and querying data warehouses and utilizing business intelligence tools.

Within each course, you’ll gain practical experience with hands-on labs and projects for building your portfolio. In the final Capstone project, you’ll apply the knowledge and skills you gained throughout this program and demonstrate your ability to perform as a Data Engineer.

This program does not require any prior data engineering or programming experience.

What you'll learn

  • Describe the core concepts, processes, tools and technologies in the field of data engineering.
  • Demonstrate your aptitude with RDBMS fundamentals including design & creation of databases, schemas, tables; DB administration, security & working with MySQL, PostgreSQL & IBM Db2.
  • Demonstrate your proficiency with SQL query language, SELECT, INSERT, UPDATE, DELETE statements, database functions, stored procs, working with multiple tables, JOINs, & transactions.
  • Explain NoSQL and big data concepts including practice with MongoDB, Cassandra, IBM Cloudant, Apache Hadoop, Apache Spark, SparkSQL, SparkML, Spark Streaming.
  • Describe ETL tools, data pipelines using Python, shell scripts with Linux, Apache Airflow and Apache Kafka.
  • Describe Data Lakes, Data Marts and Enterprise Data Warehouses (EDW) and design them using Star and Snowflake schemas.
  • Design and populate Data Warehouses and analyze their data with Business Intelligence (BI) tools like Cognos Analytics.

What's inside

14 courses

Python Basics for Data Science

(21 hours)
Kickstart your Python for data science journey with this beginner-friendly course. You'll learn Python basics, work with data in Python, and create your own Python scripts. Upon completion, you'll be able to perform basic hands-on data analysis using Jupyter Notebooks.

SQL for Data Science

(12 hours)
Much of the world's data lives in databases. SQL (or Structured Query Language) is a powerful programming language that is used for communicating with and extracting various data types from databases. A working knowledge of databases and SQL is necessary to advance as a data scientist or a machine learning specialist. The purpose of this course is to introduce relational database concepts and help you learn and apply foundational knowledge of the SQL language.

SQL Concepts for Data Engineers

(4 hours)
This course builds on your existing SQL knowledge to learn about additional techniques that are key to Data Engineers. You will learn how to create and use views, create and execute stored procedures, work with ACID transactions, and query multiple tables using JOIN operators.
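As a rough illustration of the techniques this course description names, the sketch below uses Python's built-in sqlite3 module to create a view over a JOIN and to wrap writes in transactions. It is not taken from the course materials: the table names, column names, and sample rows are invented for illustration, and SQLite is used only because it needs no setup. SQLite has no stored procedures, so that topic is left out.

```python
import sqlite3


def run_demo():
    """Build two hypothetical tables, a view over a JOIN, and roll back a failed transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )

    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
        conn.execute(
            "INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5)"
        )

    # A view that stores the JOIN query so it can be reused like a table.
    conn.execute(
        """
        CREATE VIEW customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )

    # Atomicity: the insert below is undone because the block raises.
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (4, 2, 100.0)")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    rows = conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in run_demo():
        print(name, total)
```

The failed insert never reaches the view's totals, which is the all-or-nothing behavior that the "A" in ACID refers to.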

Relational Database Basics

(10 hours)
This course introduces relational databases and Relational Database Management Systems (RDBMS). You will explore relational database design, learn how to transform source data into tables, and apply relational database design principles to your own data. You’ll get an introduction to Structured Query Language (SQL) and use it to add keys and constraints. No prior knowledge of databases or programming is required.

Data Engineering Basics for Everyone

(38 hours)
Welcome to Data Engineering Basics. This course introduces data engineering concepts, ecosystem, lifecycle, processes, and tools. You'll learn about data platforms, data repositories, data integration platforms, data pipelines, and BI and reporting tools. Through hands-on labs, you'll provision a data store on IBM Cloud, prepare and load data, and perform basic operations on data.

Python for Data Engineering Project

(4 hours)
Take a step toward becoming a Data Engineer by applying your basic Python knowledge of working with data. You will use various techniques in Python to extract data in multiple file formats from different sources, transform it into specific datatypes, and then prepare it for loading into a database.
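The extract, transform, and load steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's actual code: the two in-memory sources, the field names (name, height_cm), and the target table (people) are all hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sources standing in for files in two different formats.
CSV_SOURCE = "name,height_cm\nalice,170\nbob,182\n"
JSON_SOURCE = '[{"name": "carol", "height_cm": "165"}]'


def extract():
    """Read records from both formats into a single list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def transform(rows):
    """Convert text fields to specific types: title-cased names, heights in metres."""
    return [
        (row["name"].title(), round(float(row["height_cm"]) / 100, 2))
        for row in rows
    ]


def load(records, conn):
    """Insert the transformed records into the target table in one transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, height_m REAL)")
    with conn:
        conn.executemany("INSERT INTO people VALUES (?, ?)", records)


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    result = conn.execute(
        "SELECT name, height_m FROM people ORDER BY name"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping the three steps as separate functions means each one can be tested or swapped out on its own, for example to read from real files instead of in-memory strings.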

NoSQL Database Basics

(12 hours)
This course provides technical hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained relevance in the database landscape. Their main advantage is effectively handling scalability and flexibility issues raised by modern applications.

Big Data, Hadoop, and Spark Basics

(15 hours)
Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data to identify behaviors and preferences. This course introduces you to Big Data concepts and practices, including Hadoop, Hive, and Spark.

Apache Spark for Data Engineering and Machine Learning

(7 hours)
Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.

Linux Commands & Shell Scripting

(3 hours)
This mini-course provides a practical introduction to commonly used Linux / UNIX shell commands and teaches you basics of Bash shell scripting to automate tasks. The course includes video-based lectures and hands-on labs to practice what you learn. You will have access to a virtual Linux server through your web browser, so you don't need to download and install anything to perform the labs.

Building ETL and Data Pipelines with Bash, Airflow and Kafka

(15 hours)
Well-designed and automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. This course provides the critical knowledge and skills needed by Data Engineers and Data Warehousing specialists to create and manage ETL, ELT, and data pipeline processes.

Relational Database Administration (DBA)

(20 hours)
Managing databases ensures data is reliable, protected, and accessible for organizations to make better decisions. This course provides the knowledge and hands-on experience to manage and maintain databases, understand database security, design and define database schemas, tables, views, and other database objects, describe storage, perform backups and recovery, troubleshoot errors, monitor and optimize performance and automate tasks.

Data Warehousing and BI Analytics

(15 hours)
Today's businesses are investing heavily in capabilities to harness the massive amounts of data that fuel Business Intelligence (BI). Working knowledge of Data Warehouses and BI Analytics tools is a crucial skill for Data Engineers, Data Warehousing Specialists, and BI Analysts, who are among the most valued resources for organizations.

Data Engineering Capstone Project

(15 hours)
In this Capstone, you’ll design, implement, and manage a complete data and analytics platform. You’ll apply skills and knowledge from the IBM Data Engineering Professional Certificate, utilizing tools and technologies to design databases, collect data, extract, transform, and load data into a data warehouse, and create analytic reports and visualizations. You’ll also implement predictive analytics and machine learning models using big data tools and techniques.


© 2016 - 2024 OpenCourser