We may earn an affiliate commission when you visit our partners.
Reza Farivar and Roy H. Campbell

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view on the world of Cloud Computing and Big Data!

In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up analytics of huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

We start the first week by introducing some major systems for data analysis, including Spark, and the major frameworks and distributions of analytics applications, including Hortonworks, Cloudera, and MapR. By the middle of week one we introduce HDFS, the distributed and robust file system used in many applications such as Hadoop. We finish week one by exploring the powerful MapReduce programming model and how distributed operating systems like YARN and Mesos support a flexible and scalable environment for Big Data analytics.

In week two, the course introduces large-scale data storage and the difficulties of reaching consensus in enormous stores that use large numbers of processors, memories, and disks. We discuss eventual consistency, ACID, and BASE, along with the consensus algorithms and coordination services used in data centers, including Paxos and ZooKeeper. The course presents distributed key-value stores and in-memory databases like Redis, which are used in data centers for performance. Next we present NoSQL databases. We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. We then show how Spark SQL can run SQL queries on huge data sets. We finish week two with a presentation on distributed publish/subscribe systems using Kafka, a distributed log messaging system that is finding wide use in connecting Big Data and streaming applications together to form complex systems.

Week three moves to fast data and real-time streaming and introduces Storm, a technology used widely in industry at companies such as Yahoo. We continue with Spark Streaming, Lambda and Kappa architectures, and a presentation of the streaming ecosystem.

Week four focuses on graph processing, machine learning, and deep learning. We introduce the ideas of graph processing and present Pregel, Giraph, and Spark GraphX. Then we move to machine learning with examples from Mahout and Spark; k-means, Naive Bayes, and frequent pattern mining (FPM) are given as examples. Spark ML and MLlib continue the theme of programmability and application construction. The last topic we cover in week four introduces deep learning technologies including Theano, TensorFlow, CNTK, MXNet, and Caffe on Spark.

What's inside

Syllabus

Course Orientation
You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
Module 1: Spark, Hortonworks, HDFS, CAP
In Module 1, we introduce you to the world of Big Data applications. We start by introducing you to Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distro packages, the HDFS file system, and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm.
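As a rough illustration of the MapReduce paradigm mentioned above, the sketch below counts words with explicit map, shuffle, and reduce steps. It runs in a single Python process and is not taken from the course or from Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence (hypothetical driver, single process only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data in the cloud", "data in motion"])
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network; here everything happens in memory to keep the idea visible.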
Module 2: Large Scale Data Storage
In this module, you will learn about large scale data storage technologies and frameworks. We start by exploring the challenges of storing large data in distributed systems. We then discuss in-memory key/value storage systems, NoSQL distributed databases, and distributed publish/subscribe queues.
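To make the idea of a log-based publish/subscribe queue more concrete, here is a minimal in-memory sketch, assuming a single topic with one partition. The class TopicLog and its methods are hypothetical names for this example and do not reflect the API of Kafka or any other real system.

```python
class TopicLog:
    """Toy append-only log with per-group read offsets (hypothetical, in-memory only)."""

    def __init__(self):
        self._records = []   # the log: records are only ever appended
        self._offsets = {}   # consumer group name -> next offset to read

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread records for a group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ["click", "view", "purchase"]:
    log.publish(event)

analytics_first = log.poll("analytics", max_records=2)  # first two records
analytics_next = log.poll("analytics")                  # continues where it left off
billing_all = log.poll("billing")                       # independent group reads from the start
```

Because each consumer group keeps its own offset, several applications can read the same stream at their own pace, which is one reason a log is a convenient way to connect independent systems.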
Module 3: Streaming Systems
This module introduces you to real-time streaming systems, also known as Fast Data. We talk about Apache Storm at length, then cover Apache Spark Streaming and the Lambda and Kappa architectures. Finally, we contrast all these technologies as a streaming ecosystem.
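One common operation in streaming systems is counting events per fixed time window. The sketch below shows a tumbling-window count in plain Python, assuming events arrive as (timestamp in seconds, key) pairs; the function name tumbling_window_counts is hypothetical, and this is not the Storm or Spark Streaming API.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count keys per non-overlapping window.

    events: iterable of (timestamp_seconds, key) pairs (an assumed input shape).
    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


events = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
result = tumbling_window_counts(events, 10)
```

A real streaming engine would also handle late or out-of-order events and emit each window as it closes; this sketch processes a finite list only.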
Module 4: Graph Processing and Machine Learning
In this module, we discuss the applications of Big Data. In particular, we focus on two topics: graph processing, where massive graphs (such as the web graph) are processed for information, and machine learning, where massive amounts of data are used to train models such as clustering algorithms and frequent pattern mining. We also introduce you to deep learning, where large data sets are used to train neural networks with effective results.
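As an example of processing a graph such as the web graph, the sketch below computes PageRank by repeated iteration over a small adjacency list. It is a minimal single-machine illustration, not the Pregel, Giraph, or GraphX API; the function name, the damping value of 0.85, and the tiny example graph are assumptions made for this example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank on a dict of node -> list of outgoing links.

    Assumes every link target also appears as a key in graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
top_page = max(ranks, key=ranks.get)
```

Vertex-centric systems express the same computation as messages passed along edges in synchronized rounds, which lets the work be split across many machines.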

Good to know

Know what's good, what to watch for, and possible dealbreakers
Starts with rudiments of data analytics and Big Data technologies
Develops an understanding of data analytics tools and methods
Provides hands-on experience with Apache Spark, Hadoop, and other popular Big Data tools

Reviews summary

Big data cloud applications

Learners say this course provides a solid overview of essential technologies for big data in a cloud environment. Apache tools are heavily featured. However, learners note some ordering issues with class content, and the last module is somewhat superficial, so the course may feel uneven.
Focuses heavily on Apache tools.
"The class also focuses heavily on apache tools, which are nice in that the tools will probably be around for a while and aren't specific to one cloud provider..."
Last module is superficial.
"The course is good, gives you an overview of many important technologies, although the last module is too superficial."
Course content has ordering issues.
"For both this course and it's part 1, the content was overall OK, but the class seemed to have some ordering issues (early videos referenced discussion that didn't occur until later)."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud with these activities:
Read 'Programming for Big Data' by Michael Gannon
Provides hands-on experience and a deeper understanding of big data principles, programming frameworks, and applications.
  • Purchase or borrow the book 'Programming for Big Data'
  • Read the book carefully, taking notes and identifying key concepts
  • Practice the examples provided in the book to apply your understanding
  • Summarize and reflect on the main ideas covered in each chapter
Join a study group for Big Data fundamentals
Provides a supportive environment for discussing concepts, sharing knowledge, and clarifying doubts.
  • Find or create a study group with peers taking the same course
  • Set regular meeting times and discuss assigned topics or questions
  • Engage in active listening, ask questions, and share insights
  • Summarize key points and follow up on any unresolved questions
Solve Apache Spark practice problems
Enhances understanding of Apache Spark's syntax, functionality, and use cases.
  • Find a website or platform offering Apache Spark practice problems
  • Attempt to solve the problems independently, referring to documentation or tutorials when needed
  • Review solutions and identify areas for improvement
Complete tutorials on Apache Kafka for real-time data processing
Deepens understanding of Apache Kafka's architecture, functionality, and practical applications.
  • Find online tutorials or courses on Apache Kafka
  • Follow the tutorials step-by-step, setting up Kafka and practicing its features
  • Experiment with different data sources and use cases to gain practical experience
Develop a Big Data use case presentation
Encourages critical thinking, practical application, and communication skills in the context of Big Data.
  • Identify a specific industry or problem that could benefit from Big Data
  • Research and gather relevant data, including sources, types, and formats
  • Develop a data analysis plan, including the tools and techniques to be used
  • Analyze the data, draw insights, and formulate recommendations
  • Create a presentation to communicate the findings, including visuals, charts, and a clear narrative
Create a comprehensive course summary
Reinforces learning by synthesizing and organizing course materials.
  • Review all lecture notes, readings, and assignments
  • Identify key concepts, definitions, and examples
  • Create a structured outline or summary document
  • Reference specific pages or sections in the original materials for easy retrieval

Career center

Learners who complete Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists use data to solve business problems. They collect, clean, and analyze data to identify patterns and trends. They then use these insights to develop data-driven solutions. This course provides a strong foundation for a career as a Data Scientist by teaching you the core concepts of Big Data analytics, including data storage, data processing, and data analysis. You will also learn how to use popular Big Data tools and technologies, such as Apache Spark, Hadoop, and HDFS.
Data Engineer
Data Engineers are responsible for designing, building, testing, and maintaining the data pipelines that collect, transform, and store data for use by data analysis applications. This course provides a strong foundation for a career as a Data Engineer by teaching you the core concepts of Big Data analytics, including data storage, data processing, and data analysis. You will also learn how to use popular Big Data tools and technologies, such as Apache Spark, Hadoop, and HDFS.
Data Analyst
Data Analysts use data to understand and improve business performance. They collect, clean, and analyze data to identify trends and patterns. They then use these insights to make recommendations to stakeholders. This course provides a strong foundation for a career as a Data Analyst by teaching you the core concepts of Big Data analytics, including data storage, data processing, and data analysis. You will also learn how to use popular Big Data tools and technologies, such as Apache Spark, Hadoop, and HDFS.
Machine Learning Engineer
Machine Learning Engineers build, deploy, and maintain machine learning models. They work with data scientists to identify the appropriate machine learning algorithms and models for a given problem. They then develop and deploy the models and monitor their performance. This course provides a strong foundation for a career as a Machine Learning Engineer by teaching you the core concepts of Big Data analytics, including data storage, data processing, and data analysis. You will also learn how to use popular Big Data tools and technologies, such as Apache Spark, Hadoop, and HDFS.
Cloud Architect
Cloud Architects design, build, and maintain cloud computing environments. They work with businesses to identify their cloud computing needs and develop solutions that meet those needs. This course may be useful for a career as a Cloud Architect because it covers data services that run on cloud infrastructure, including distributed storage, NoSQL databases, publish/subscribe messaging, and streaming systems. The course centers on Apache tools such as Spark, HDFS, HBase, and Kafka rather than on a specific cloud provider's platform.
Big Data Architect
Big Data Architects design, build, and maintain Big Data systems. They work with businesses to identify their Big Data needs and develop solutions that meet those needs. This course provides a strong foundation for a career as a Big Data Architect by teaching you the core concepts of Big Data analytics, including data storage, data processing, and data analysis. You will also learn how to use popular Big Data tools and technologies, such as Apache Spark, Hadoop, and HDFS.
Network Engineer
Network Engineers design, build, and maintain computer networks. They work with businesses to identify their networking needs and develop solutions that meet those needs. This course may be useful for a career as a Network Engineer by showing how distributed Big Data systems such as HDFS, Kafka, and Storm store and move data across many machines in a data center.
Database Administrator
Database Administrators design, build, and maintain databases. They work with businesses to identify their database needs and develop solutions that meet those needs. This course may be useful for a career as a Database Administrator by teaching you how large-scale data stores work, including consistency models (eventual consistency, ACID, and BASE), key-value stores such as Redis, NoSQL databases such as HBase, and SQL queries with Spark SQL.
Software Engineer
Software Engineers design, build, and maintain software applications. They work with businesses to identify their software needs and develop solutions that meet those needs. This course may be useful for a career as a Software Engineer by introducing programming models for data-intensive applications, including MapReduce, Spark, and stream processing with Storm and Spark Streaming.
Security Engineer
Security Engineers design, build, and maintain computer security systems. They work with businesses to identify their security needs and develop solutions that meet those needs. This course may be useful for a career as a Security Engineer by giving you background on the distributed storage, messaging, and processing systems that security teams may be asked to protect.
Systems Engineer
Systems Engineers design, build, and maintain computer systems. They work with businesses to identify their computing needs and develop solutions that meet those needs. This course may be useful for a career as a Systems Engineer by introducing the cluster-level systems that Big Data workloads run on, including HDFS, YARN, Mesos, and ZooKeeper.
IT Auditor
IT Auditors audit computer systems and networks. They work with businesses to identify their IT audit needs and develop solutions that meet those needs. This course may be useful for a career as an IT Auditor by giving you background on how Big Data systems store, move, and process information.
Business Analyst
Business Analysts work with businesses to identify their business needs and develop solutions that meet those needs. This course may be useful for a career as a Business Analyst by giving you an overview of how Big Data analytics, streaming, and machine learning systems turn large data sets into information.
Project Manager
Project Managers plan, execute, and close projects. They work with businesses to identify their project needs and develop solutions that meet those needs. This course may be useful for a career as a Project Manager by giving you a working vocabulary for the technologies that Big Data projects depend on, such as Spark, Hadoop, and Kafka.
Technical Writer
Technical Writers write technical documentation. They work with businesses to identify their documentation needs and develop solutions that meet those needs. This course may be useful for a career as a Technical Writer by giving you enough background in Big Data systems such as Spark, Hadoop, and Kafka to document them accurately.

Reading list

We've selected 25 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud.
Definitive guide to Spark, the unified analytics engine for big data processing. It would be a valuable resource for this course as it provides a comprehensive overview of Spark's capabilities.
Comprehensive guide to big data analytics, covering everything from strategic planning to enterprise integration. It would be a valuable addition to this course as it provides a broader perspective on the field of big data.
Practical guide to learning Spark. It would be a valuable resource for this course as it provides a hands-on approach to learning Spark's capabilities.
Provides a comprehensive overview of Spark, including its architecture, components, and applications. It is a valuable resource for anyone who wants to learn more about Spark and how to use it to process big data.
Provides a comprehensive overview of cloud computing. It would be a valuable addition to this course as it provides a foundation in the principles of cloud computing.
Provides a comprehensive overview of data management for big data. It would be a valuable addition to this course as it provides a foundation in the principles of data management.
Provides a practical introduction to deep learning for programmers, using the Fastai library and PyTorch, with a focus on building and deploying real-world applications.
Provides a practical introduction to data analysis with Pandas. It would be a valuable addition to this course as it provides a foundation in the principles of data analysis.
Provides a practical introduction to deep learning with Python, covering fundamental concepts and popular deep learning libraries.
Introduces NoSQL databases and their advantages over traditional relational databases, providing a practical guide to their use and implementation.
Provides a comprehensive overview of NoSQL databases, including their design, architecture, and applications. It is a valuable resource for anyone who wants to learn more about NoSQL databases and how to use them to build scalable, high-performance applications.
Offers a non-technical introduction to data science, focusing on how to apply data-driven insights to business problems.
Provides a comprehensive overview of big data analytics with Java, including its methods, tools, and applications. It is a valuable resource for anyone who wants to learn more about big data analytics and how to use Java to build data-intensive applications.
Provides a comprehensive overview of Hadoop operations, including its architecture, components, and administration. It is a valuable resource for anyone who wants to learn more about Hadoop operations and how to manage Hadoop clusters.
Provides a comprehensive overview of graph databases, including their design, architecture, and applications. It is a valuable resource for anyone who wants to learn more about graph databases and how to use them to build scalable, high-performance applications.
Provides a comprehensive overview of cloud computing for data analysis, including its benefits, challenges, and applications. It is a valuable resource for anyone who wants to learn more about cloud computing and how to use it to build data-intensive applications.
A practical guide to using Redis, an open-source, in-memory data structure store, for caching and other applications.
A comprehensive guide to deep learning, providing a theoretical and practical foundation for this rapidly growing field.
A classic textbook on statistical learning, providing a comprehensive overview of the field and its applications.
An introduction to big data analytics, providing a high-level overview of the field and its applications.
A comprehensive textbook on machine learning, providing a probabilistic approach to the field.

Similar courses

Here are nine courses similar to Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud.
Structured Streaming in Apache Spark 2
Apache Spark 3 Fundamentals
Streaming API Development and Documentation
Big Data, Hadoop, and Spark Basics
Cloud Computing Applications, Part 1: Cloud Systems and...
Getting Started with Stream Processing with Spark...
Kafka Fundamentals
Getting Started with Apache Spark on Databricks
Conceptualizing the Processing Model for Apache Spark...

© 2016 - 2024 OpenCourser