We may earn an affiliate commission when you visit our partners.

Managing Big Data in Clusters and Cloud Storage

Ian Cook and Glynn Durham

In this course, you'll learn how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Apache Hive and Apache Impala. You’ll learn how to choose the right data types, storage systems, and file formats based on which tools you’ll use and what performance you need.


By the end of the course, you will be able to

• use different tools to browse existing databases and tables in big data systems;

• use different tools to explore files in distributed big data filesystems and cloud storage;

• create and manage big data databases and tables using Apache Hive and Apache Impala; and

• describe and choose among different data types and file formats for big data systems.
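
The last two objectives, creating tables and choosing data types and file formats, come down to writing table definitions. As a rough illustration only (not taken from the course materials), the Python sketch below assembles a `CREATE EXTERNAL TABLE` statement of the kind Hive and Impala accept. The function name `create_table_ddl`, the `orders` table, its columns, and the `/data/orders` path are all hypothetical.

```python
def create_table_ddl(table, columns, location, file_format="TEXTFILE", delimiter=","):
    """Build a Hive/Impala-style CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, data_type) pairs; all names here are
    illustrative and not taken from the course.
    """
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    parts = [f"CREATE EXTERNAL TABLE {table} (", column_lines, ")"]
    # Delimited text files need a field delimiter; self-describing
    # formats such as Parquet do not.
    if file_format == "TEXTFILE":
        parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    parts.append(f"STORED AS {file_format}")
    parts.append(f"LOCATION '{location}';")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical table: two columns stored as comma-delimited text.
    print(create_table_ddl(
        "orders",
        [("order_id", "INT"), ("total", "DECIMAL(8,2)")],
        "/data/orders",
    ))
```

Passing `file_format="PARQUET"` drops the `ROW FORMAT` clause, which mirrors the kind of file-format decision the final objective refers to.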

To use the hands-on environment for this course, you need to download and install a virtual machine and the software on which to run it. Before continuing, be sure that you have access to a computer that meets the following hardware and software requirements:

• Windows, macOS, or Linux operating system (iPads and Android tablets will not work)

• 64-bit operating system (32-bit operating systems will not work)

• 8 GB RAM or more

• 25 GB free disk space or more

• Intel VT-x or AMD-V virtualization support enabled (on Mac computers with Intel processors, this is always enabled; on Windows and Linux computers, you might need to enable it in the BIOS)

• For Windows XP computers only: You must have an unzip utility such as 7-Zip or WinZip installed (Windows XP’s built-in unzip utility will not work)
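
Some of the requirements above can be checked with a short script. The following is a minimal sketch, assuming Python 3 is available; the function name `check_requirements` and its message strings are hypothetical. It does not check virtualization support or the unzip utility, and the amount of RAM must be entered by hand because the Python standard library has no portable way to read it.

```python
import platform
import shutil
import sys

MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 25
SUPPORTED_SYSTEMS = {"Windows", "Darwin", "Linux"}  # "Darwin" is macOS


def check_requirements(system, is_64_bit, ram_gb, free_disk_gb):
    """Return a list of unmet requirements (empty if every check passes)."""
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    if not is_64_bit:
        problems.append("64-bit operating system required")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"at least {MIN_FREE_DISK_GB} GB free disk space required")
    return problems


if __name__ == "__main__":
    found = check_requirements(
        system=platform.system(),
        # True only for a 64-bit Python build; a 32-bit Python can run on
        # a 64-bit operating system, so treat a False here as a hint.
        is_64_bit=sys.maxsize > 2**32,
        ram_gb=8,  # assumed value; replace with your machine's RAM
        free_disk_gb=shutil.disk_usage(".").free / 1024**3,
    )
    print(found or "all checked requirements met")
```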

What's inside

Syllabus

Orientation to Data in Clusters and Cloud Storage
Defining Databases, Tables, and Columns
Data Types and File Types
Managing Datasets in Clusters and Cloud Storage
Optimizing Hive and Impala (Honors)
Honors (Optional)

Good to know

Know what's good, what to watch for, and possible dealbreakers
Examines big data tools and technologies, which are core skills for any developer working with large datasets
Taught by Ian Cook and Glynn Durham, who are recognized for their work in big data and data engineering
Develops fundamental big data skills, such as using Apache Hive and Apache Impala, which are crucial for data analysis and management
Emphasizes data management in cloud storage, a highly relevant topic in modern data engineering
Involves the use of hands-on virtual machines, providing practical experience in big data environments

Reviews summary

Big data clusters and cloud storage: engaging and comprehensive

Learners say this course is an engaging and comprehensive introduction to managing big data in clusters and cloud storage. Reviewers mention they enjoyed the many hands-on exercises and relevant case studies. They also say the course is well-structured and well-paced. Many students found the readings to be helpful and say the instructors are knowledgeable and engaging. Overall, students say this course is a valuable learning experience for anyone interested in big data.
Real-world case studies help students apply their learning.
"I would like to implement the skills that I learnt in this course in some project."
"Amazing course. Both instructors have motivated me to learn more and utilize this platform more than I did ever before."
"I am also very happy they covered Amazon S3."
Course materials are well-organized and easy to follow.
"This is one of the systematic specializations which makes the harder and otherwise overwhelming subject so easy to navigate, follow and learn."
"Great course and specialization. Great instructors and course materials."
"All course structure, and content was well thought out for a online course."
Instructors are knowledgeable, engaging, and provide clear explanations.
"The both lectorers delivered their knowledge."
"The instructors are really good and I learned a lot about Hive, Impala and SQL in general."
"There are very much qualified, they are thorough knowledgable and give good direction on what is important and how it all works."
Practical, hands-on exercises help reinforce learning.
"Super useful course with a lot of hands on practices."
"Very good material and the labs using the VM are wonderful hands-on experience. "
"One of the very few - learn it by doing it - big data courses that deals topics like Hadoop, Hive, Impala comprehensively and unambiguously."
Some reviewers prefer more video content and less reading.
"I prefer the type of course where the class is teached in video. This course have many lectures."
"The course is good designed and information is well structured and explained."
"Compared to the first 2 courses, this course feels somewhat lacking in the video lectures as learners are given more readings to go through."

Activities

Coming soon: We're preparing activities for Managing Big Data in Clusters and Cloud Storage. These are activities you can do before, during, or after a course.

Career center

Learners who complete Managing Big Data in Clusters and Cloud Storage will develop knowledge and skills that may be useful to these careers:
Data Analyst
Data Analysts analyze big data with the goal of improving business operations. Their duties include gathering, cleaning, analyzing, and interpreting large datasets. The Cloudera course Managing Big Data in Clusters and Cloud Storage provides the foundation needed to analyze data effectively on a large scale. The skills learned in this course can help prepare you for the role of a Data Analyst.
Big Data Engineer
Big Data Engineers create and manage the infrastructure, processes, and tools to manage big data. They design, build, test, and maintain the systems that store and process large datasets. This Cloudera course will expose you to the tools and technologies used by Big Data Engineers in the field.
Data Architect
Data Architects design, create, and manage the data management systems used by an organization. They work with business stakeholders to understand data needs and develop data management strategies. This Cloudera course will provide you with a strong foundation in big data management, which is essential for Data Architects.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. They build models to predict future outcomes and make recommendations. The Cloudera course Managing Big Data in Clusters and Cloud Storage can provide a helpful foundation for a career as a Data Scientist, especially for those interested in working with big data.
Machine Learning Engineer
Machine Learning Engineers design, develop, and deploy machine learning models. They work with data scientists to identify and solve business problems using machine learning. While this Cloudera course does not directly teach machine learning, it does provide a strong foundation in data management, which is essential for Machine Learning Engineers.
Statistician
Statisticians collect, analyze, interpret, and present data. They use statistical methods to solve problems in a variety of fields. While this Cloudera course does not directly teach statistics, it does provide a strong foundation in data management, which is essential for Statisticians.
Database Administrator
Database Administrators are responsible for the installation, configuration, maintenance, and performance of database systems. They work with users to understand data needs and develop database solutions. This Cloudera course will help you build a strong foundation in database management, which is essential for Database Administrators, particularly those working with big data.
Business Analyst
Business Analysts use data to identify and solve business problems. They work with stakeholders to gather requirements, analyze data, and develop solutions. This Cloudera course can provide a helpful foundation for a career as a Business Analyst, especially for those interested in working with big data.
Data Engineer
Data Engineers design, build, and maintain the data pipelines that move data between different systems. They work with data scientists and other data professionals to ensure that data is clean, consistent, and accessible. This Cloudera course will help you build a strong foundation in data management, which is essential for Data Engineers, particularly those working with big data.
Software Engineer
Software Engineers design, develop, and maintain software systems. They work with users to understand software needs and develop software solutions. While this Cloudera course does not directly teach software engineering, it does provide a strong foundation in data management, which is increasingly important for Software Engineers working on big data projects.
Cloud Architect
Cloud Architects design, build, and maintain cloud computing systems. They work with customers to understand their business needs and develop cloud solutions. This Cloudera course can provide a helpful foundation for a career as a Cloud Architect, especially for those interested in working with big data.
IT Manager
IT Managers plan, organize, and direct the implementation of information technology systems. They work with users to understand business needs and develop IT solutions. This Cloudera course can provide a helpful foundation for a career as an IT Manager, especially for those interested in working with big data.
Project Manager
Project Managers plan, organize, and execute projects. They work with stakeholders to define project scope, develop project plans, and track project progress. This Cloudera course can provide a helpful foundation for a career as a Project Manager, especially for those working on big data projects.
Data Warehouse Manager
Data Warehouse Managers are responsible for the design, construction, and maintenance of data warehouses. They work with users to understand data needs and develop data warehouse solutions. This Cloudera course will help you build a strong foundation in data management, which is essential for Data Warehouse Managers, particularly those working with big data.
Information Security Analyst
Information Security Analysts plan and implement security measures to protect an organization's information systems. They work with users to identify security risks and develop security solutions. While this Cloudera course does not directly teach information security, it does provide a strong foundation in data management, which is increasingly important for Information Security Analysts working with big data.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Managing Big Data in Clusters and Cloud Storage.
Provides a practical guide to designing and building data-intensive applications. Useful for learners interested in the architectural considerations of big data systems.
Provides a comprehensive guide to operating and managing Hadoop clusters. Useful as a reference for learners responsible for deploying and maintaining big data systems.
Provides a practical guide to data science techniques and their application to business problems. Useful for learners interested in using big data for business intelligence.
Provides a practical introduction to data analytics techniques and their application to business problems. Useful for learners with limited background in data analytics.
Provides a comprehensive overview of database systems, including data models, query processing, and transaction management. Useful as background reading or for learners interested in the underlying concepts of big data systems.
Provides a high-level overview of big data and its impact on businesses and society. Useful as background reading or for learners interested in the broader context of big data.
Provides an overview of cloud computing principles and technologies. Useful as background reading or for learners interested in the deployment of big data systems in the cloud.

Similar courses

Here are nine courses similar to Managing Big Data in Clusters and Cloud Storage.
Analyzing Big Data with SQL (Most relevant)
Foundations for Big Data Analysis with SQL (Most relevant)
Big Data, Hadoop, and Spark Basics (Most relevant)
Introduction to Big Data (Most relevant)
Big Data Integration and Processing (Most relevant)
Scalable Machine Learning on Big Data using Apache Spark (Most relevant)
Getting Started with Apache Spark on Databricks (Most relevant)
Big Data Modeling and Management Systems
Windows 11 Desktop Administration: Managing Devices,...

© 2016 - 2024 OpenCourser