We may earn an affiliate commission when you visit our partners.
Ilkay Altintas and Amarnath Gupta

At the end of the course, you will be able to:

  • Retrieve data from example database and big data management systems
  • Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
  • Identify when a big data problem needs data integration
  • Execute simple big data integration and processing on Hadoop and Spark platforms

This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.

Hardware Requirements:

(A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free. How to find your hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection because you will be downloading files up to 4 GB in size.

Software Requirements:

This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded and installed free of charge (except for data charges from your internet provider). Software requirements include: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+, or CentOS 6+; and VirtualBox 5+.

What's inside

Syllabus

Welcome to Big Data Integration and Processing
Welcome to the third course in the Big Data Specialization. This week you will be introduced to basic concepts in big data integration and processing. You will be guided through installing the Cloudera VM, downloading the data sets to be used for this course, and learning how to run the Jupyter server.
Retrieving Big Data (Part 1)
This module covers the various aspects of data retrieval and relational querying. You will also be introduced to the Postgres database.
Retrieving Big Data (Part 2)
This module covers the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. You will be introduced to MongoDB and Aerospike, and you will learn how to use Pandas to retrieve data from them.
Big Data Integration
In this module you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
Processing Big Data
This module introduces learners to big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark.
Big Data Analytics using Spark
In this module, you will go deeper into big data processing by learning the inner workings of the Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
Learn By Doing: Putting MongoDB and Spark to Work
In this module you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Introduces foundational concepts essential for understanding big data integration and processing
Taught by renowned experts in the field, Amarnath Gupta and Ilkay Altintas
Covers vital aspects of big data integration, from basic concepts to advanced techniques
Develops hands-on skills in retrieving, integrating, and processing big data using industry-standard tools like Apache Hadoop and Spark
Prepares learners for careers in big data analytics and data science by equipping them with core competencies

Reviews summary

Big data integration and processing concepts

Learners say this big data integration and processing course has engaging assignments but difficult exams and quizzes. The well-received course teaches big data integration and processing concepts using frameworks like MongoDB, Splunk, Spark, and SparkSQL. Students get hands-on practice through quizzes, assignments, and a final project.
Some programming knowledge and familiarity with big data concepts are helpful.
"As i am not familiar with the VM and its environment, I spent more time struggling with the VM paths, initialization even with the pre command sets than doing the computation of the data."
"In general my experience was very good, I consider that you must have some knowledge in programming to get the most out of it."
Course structure offers a good overview of big data tools, but could provide more in-depth coverage.
"I learnt a lot from the course and my understanding of big data has improved."
"I especially enjoyed the hand-on exercise of week 6 and all-in-all the lectures. They give a good overview on various data integration tools."
Instructors are knowledgeable and enthusiastic, but may not be very responsive in the forums.
"Hello Gentlemen,This course was very helpful foe me. It enhanced my knowledge about Big Data Integration. Thank you so much for providing me such important knowledge. Thank you once again."
"They have a nice way of conveying a message, making it easy to follow."
Engaging and informative, but some issues with outdated software.
"this course is great. each material is taught in great detail from the video explanation, also accompanied by material document slides, and there are many quizzes for..."
"I really enjoyed the hand-on exercise of week 6 and all-in-all the lectures. They give a good overview on various data integration tools."
Exams and quizzes can be challenging, especially the final project.
"The final test is very difficult and will require extensive knowledge of Linux and coding."
"Some of the cousera material is not clear and also the final quiz is too hard"
Outdated software and materials can cause problems with hands-on assignments.
"the final project quiz is terrible. It has outdated guides, unclear instructions and out-of-sync hints. If you want to torture yourself, feel free to try it. If you know nothing about linux or coding, forget about completing this course."
"I'm about halfway through this course and the specialization as a whole.It it apparent that these courses were created a few years ago and have been left to their own devices since then."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Integration and Processing with these activities:
Review database fundamentals
Reviewing key concepts from relational database management systems will give you a much stronger footing for the later parts of the course.
  • Review the main concepts of database management and relational database models
  • Understand the main database operations and data retrieval techniques
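As a starting point for this review, the core retrieval operations (selection, projection, and aggregation) can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for the Postgres database used in the course, and the orders table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 12.5)],
)

# Selection and projection: keep rows above a threshold, return two columns.
large_orders = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > ? ORDER BY amount",
    (15,),
).fetchall()

# Aggregation: total amount per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

conn.close()
```

The same SELECT statements should carry over to most relational systems with little or no change, since they use only standard SQL.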
Review Linux Basics
Refreshes your core competencies in Linux terminal usage, which is used in this course for various operations and demonstrations.
  • Review basic commands
  • Practice file and directory navigation
Organize Course Materials
Ensures you have a well-structured and easily accessible repository of course materials, enhancing your ability to review and reinforce key concepts throughout the learning journey.
  • Create a dedicated folder or notebook for course materials
  • Organize materials by module or topic
  • Include notes, summaries, and practice questions
Organize materials
Putting course materials in one location before the course begins makes it easier to follow the content and reinforce your understanding of key topics.
  • Create a clean folder on your computer for storing materials
  • Set up a Google Drive or Dropbox folder to serve as a backup
  • Print materials as needed
Review NoSQL Databases
Refreshes your understanding of NoSQL databases and their applications, providing a strong foundation for the upcoming modules on data integration and processing.
  • Revisit key concepts of NoSQL databases
  • Review different types of NoSQL databases
Connect with a Data Scientist
Offers the opportunity to gain valuable insights and guidance from experienced professionals in the field, enhancing your learning experience and career prospects.
  • Attend industry events and conferences
  • Join online communities and forums
Attend a workshop on big data analytics
Attending a workshop on big data analytics will provide you with insights into the latest trends and best practices in the field, which can complement your learning in this course.
  • Research and identify relevant workshops in your area
  • Register for and attend the workshop
Spark SQL Tutorial
Enhances your understanding of Spark SQL by providing step-by-step guidance through essential concepts and use cases.
  • Follow the official Spark SQL tutorial
  • Practice writing SQL queries on sample datasets
Build a simple NoSQL database
Building a simple NoSQL database will reinforce the concepts of NoSQL databases and enhance your understanding when studying them in this course.
  • Choose a NoSQL database such as MongoDB or Cassandra
  • Develop a simple schema for your database
  • Insert data into your database
  • Query and retrieve data from your database
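Before picking a real product, the insert and query steps above can be tried out with a toy in-memory document store. The sketch below is an assumption-laden illustration, not the MongoDB or Cassandra API: the Collection class, its insert and find methods, and the sample documents are all hypothetical.

```python
class Collection:
    """Toy in-memory document collection (hypothetical, not a real database API)."""

    def __init__(self):
        self._docs = []
        self._next_id = 1

    def insert(self, doc):
        # Documents are schemaless dicts; each gets an auto-assigned _id.
        stored = dict(doc)
        stored["_id"] = self._next_id
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]

    def find(self, **criteria):
        # Return copies of all documents whose fields equal the given values.
        return [
            dict(d)
            for d in self._docs
            if all(d.get(key) == value for key, value in criteria.items())
        ]


users = Collection()
users.insert({"name": "ana", "city": "Lima", "tags": ["admin"]})
users.insert({"name": "ben", "city": "Oslo"})  # no "tags" field: no fixed schema
users.insert({"name": "cy", "city": "Lima"})

lima_users = users.find(city="Lima")
```

Documents in the same collection may carry different fields, which is the main contrast with a relational table.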
Practice data processing with Spark
Practicing data processing with Spark will enhance your knowledge and skills on data processing using Apache Spark.
  • Work through the exercises provided in the Apache Spark documentation
  • Find additional practice exercises and examples online
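Word count is a common first exercise for this kind of practice. The sketch below shows the underlying map and reduce-by-key pattern in plain Python so it runs without a cluster; it is not Spark code, and the function names map_words and reduce_by_key are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    # Map step: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def reduce_by_key(pairs):
    # Reduce step: sum the values that share the same key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


lines = ["big data big ideas", "data pipelines"]
pairs = [pair for line in lines for pair in map_words(line)]
counts = reduce_by_key(pairs)
```

In a distributed engine the map step would run in parallel across partitions and the reduce step would combine partial sums per key, but the logic per record is the same.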
Develop a Data Integration Pipeline
Provides hands-on experience with building and deploying a data integration pipeline, comparable to those used in real-world applications.
  • Design the pipeline architecture
  • Select and configure data sources
  • Implement data transformation and cleaning
  • Deploy and monitor the pipeline
Develop a small-scale data integration pipeline
Creating a small-scale data integration pipeline will solidify the principles of data integration and processing you will learn in this course.
  • Identify two data sources that are relevant to each other
  • Develop a plan for integrating the data
  • Implement your plan using a data integration tool such as Informatica or Talend
  • Test your data integration pipeline
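The steps above can be rehearsed at very small scale before reaching for a dedicated tool. The following is a minimal sketch, assuming two hypothetical sources (a CSV customer list and a JSON order feed) that share a customer_id key; the data and field names are invented for illustration.

```python
import csv
import io
import json

# Two hypothetical sources in different formats, related by customer_id.
customers_csv = "customer_id,name\n1,Ana\n2,Ben\n3,Cy\n"
orders_json = (
    '[{"customer_id": 1, "amount": 20.0},'
    ' {"customer_id": 1, "amount": 5.0},'
    ' {"customer_id": 3, "amount": 7.5}]'
)

# Extract: parse each source and normalize the key to an int.
customers = {
    int(row["customer_id"]): row["name"]
    for row in csv.DictReader(io.StringIO(customers_csv))
}

# Transform: aggregate order amounts per customer.
totals = {}
for order in json.loads(orders_json):
    key = order["customer_id"]
    totals[key] = totals.get(key, 0.0) + order["amount"]

# Integrate: left join customers to totals, defaulting to 0.0 when no orders exist.
integrated = [
    {"customer_id": cid, "name": name, "total": totals.get(cid, 0.0)}
    for cid, name in sorted(customers.items())
]
```

Testing the pipeline can then be as simple as checking that every customer appears exactly once and that customers without orders are kept rather than dropped.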
Spark MLlib Exercises
Strengthens your grasp of Spark MLlib through a series of targeted exercises, reinforcing key concepts and algorithms.
  • Solve classification problems using Logistic Regression
  • Implement clustering algorithms like k-means
Big Data Analytics Project
Provides a comprehensive challenge by applying the concepts learned in the course to a real-world data analytics project, fostering practical implementation skills.
  • Define the project scope and objectives
  • Collect and preprocess relevant data
  • Develop data analysis and modeling pipelines
  • Evaluate and interpret results

Career center

Learners who complete Big Data Integration and Processing will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists use their knowledge of statistics, machine learning, and data mining to extract insights from data. They can work in a variety of industries, including healthcare, finance, and retail. This course may be useful to a Data Scientist as it provides a solid foundation in big data integration and processing. Learners will gain experience with data retrieval, data integration, and big data analytics using Apache Spark.
Data Engineer
Data Engineers are responsible for designing, building, and maintaining the infrastructure that supports data analytics. A Data Engineer may be responsible for collecting, storing, and processing data from a variety of sources, as well as developing and maintaining data pipelines. This course may be useful to a Data Engineer as it provides a comprehensive overview of big data integration and processing. Learners will gain experience with a range of big data tools and technologies, including Hadoop, Spark, and MongoDB.
Data Analyst
A Data Analyst can work independently to analyze data and draw conclusions from it. They are tasked with gathering and interpreting large amounts of data, applying statistical techniques to uncover patterns and trends, and presenting their findings to stakeholders. This course may be useful to a Data Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with data integration tools including Apache Hadoop and Spark, which are essential for success in this role.
Data Architect
A Data Architect designs and builds the infrastructure that supports data analytics. They are responsible for ensuring that data is accessible, reliable, and secure. This course may be useful to a Data Architect as it provides a comprehensive overview of big data integration and processing. Learners will gain experience with a range of big data tools and technologies, including Hadoop, Spark, and MongoDB.
Database Administrator
A Database Administrator is responsible for managing and maintaining databases. They ensure that databases are operational and that data is secure. This course may be useful to a Database Administrator as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with a range of database technologies, including PostgreSQL and MongoDB.
Machine Learning Engineer
Machine Learning Engineers apply machine learning techniques to solve real-world problems. They work with data scientists to develop and deploy machine learning models. This course may be useful to a Machine Learning Engineer as it provides a strong foundation in big data integration and processing. Learners will gain experience with a range of big data tools and technologies, including Hadoop, Spark, and MongoDB.
Statistician
Statisticians use statistical methods to collect, analyze, and interpret data. They work in a variety of industries, including healthcare, finance, and public policy. This course may be useful to a Statistician as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with a range of statistical techniques and tools, including R and Python.
Information Security Analyst
An Information Security Analyst is responsible for protecting an organization's IT infrastructure from security threats. They are responsible for identifying and mitigating vulnerabilities, and for developing and implementing security policies. This course may be useful to an Information Security Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience retrieving and processing data with the tools covered in the course, including Postgres, MongoDB, and Spark.
Information Technology Manager
An Information Technology Manager is responsible for managing an organization's IT infrastructure. They are responsible for ensuring that IT systems are operational and that data is secure. This course may be useful to an Information Technology Manager as it provides a comprehensive overview of big data integration and processing. Learners will gain experience with a range of big data tools and technologies, including Hadoop, Spark, and MongoDB.
Quantitative Analyst
A Quantitative Analyst uses mathematical and statistical methods to analyze financial data. They work with investment banks and hedge funds to develop trading strategies. This course may be useful to a Quantitative Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with a range of financial data analysis tools and techniques, including Python and R.
Business Analyst
A Business Analyst helps businesses to identify and solve problems. They use data to analyze business processes and recommend solutions. This course may be useful to a Business Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with a range of data analysis tools and techniques, including SQL and Python.
Software Engineer
Software Engineers design, develop, and maintain software systems. They work in a variety of industries, including healthcare, finance, and technology. This course may be useful to a Software Engineer as it provides a strong foundation in data integration and processing. Learners will gain experience with a range of software development tools and technologies, including Java and Python.
Operations Research Analyst
An Operations Research Analyst uses mathematical models to solve business problems. They work with businesses to improve efficiency and profitability. This course may be useful to an Operations Research Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience retrieving and processing data with the tools covered in the course, including Postgres, MongoDB, and Spark.
Financial Analyst
A Financial Analyst uses financial data to make investment recommendations. They work with clients to develop investment strategies and manage portfolios. This course may be useful to a Financial Analyst as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience with a range of financial data analysis tools and techniques, including Excel and Python.
Market Researcher
A Market Researcher conducts research to understand consumer behavior. They use data to identify trends and make recommendations for marketing campaigns. This course may be useful to a Market Researcher as it provides a strong foundation in data retrieval and big data processing. Learners will gain experience retrieving and processing data with the tools covered in the course, including Postgres, MongoDB, and Spark.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Integration and Processing.
A comprehensive guide to data visualization. Covers data visualization principles, data visualization techniques, and data visualization best practices. Useful for data analysts, data scientists, and anyone who needs to communicate data effectively.
The definitive guide to Hadoop, covering architecture, installation, configuration, and administration. Essential reading for Hadoop administrators and engineers.
The definitive guide to Spark, covering architecture, programming, and advanced topics. Essential reading for Spark developers and engineers.
The definitive guide to MongoDB, covering architecture, installation, configuration, and administration. Essential reading for MongoDB administrators and developers.
A practical guide to data analysis with Pandas. Covers data cleaning, data manipulation, and data visualization. Useful for data analysts and data scientists.
A practical guide to big data processing, covering Hadoop, Spark, and other tools. Provides hands-on examples and case studies. Useful for data engineers and developers.

Similar courses

Here are nine courses similar to Big Data Integration and Processing.
Big Data Modeling and Management Systems
Most relevant
Introduction to Big Data
Most relevant
Managing Big Data in Clusters and Cloud Storage
Most relevant
Foundations for Big Data Analysis with SQL
Most relevant
Analyzing Big Data with SQL
Most relevant
Arquitecturas de Big Data
Most relevant
Introduction to FPGA Design for Embedded Systems
Most relevant
Hands-on Penetration Testing Labs 3.0
Create Amazing Graphics and Art using Stable Cascade
© 2016 - 2024 OpenCourser