We may earn an affiliate commission when you visit our partners.
Ilkay Altintas and Amarnath Gupta

Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated by a large number of users playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps of acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and Open Office. Then we will move into more challenging big data problems requiring the more advanced tools you have learned, including KNIME, Spark's MLlib, and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.

What's inside

Syllabus

Simulating Big Data for an Online Game
This week we provide an overview of the Eglence, Inc. Pink Flamingo game, including the data the company has about the game and its users and the questions we might want that data to answer.
Acquiring, Exploring, and Preparing the Data
Next, we begin working with the simulated game data by exploring and preparing the data for ingestion into big data analytics applications.
Data Classification with KNIME
This week we do some data classification using KNIME.
Clustering with Spark
This week we do some clustering with Spark.
Graph Analytics of Simulated Chat Data With Neo4j
This week we apply what we learned from the 'Graph Analytics With Big Data' course to simulated chat data from Catch the Pink Flamingo using Neo4j. We analyze player chat behavior to find ways of improving the game.
Reporting and Presenting Your Work
Final Submission

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches real-world big data science steps for acquiring, exploring, preparing, analyzing, and reporting
Applies industry-standard tools and methods for analyzing big data
Taught by experienced instructors in the field of big data science
Provides opportunities for learners to present their work to potential employers
May not be suitable for learners without prior knowledge of big data concepts
Requires access to specialized software and tools, which may incur additional expenses

Reviews summary

Capstone course with shared information

Learners say that this capstone project was an excellent learning experience in which the instructor and fellow members shared valuable information throughout the coursework and the capstone project phase.
This course provided a great learning experience.
"This has been excellent Learning experience."
The instructor and fellow members share valuable information.
"Instructor and fellow members shared their valuable information during the course of the Learning and Capstone Project phase."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data - Capstone Project with these activities:
Review Basic Statistics
Brush up on the foundational statistical concepts such as descriptive statistics, probability theory, and statistical distributions to strengthen your background for this course.
  • Review your class notes or textbooks on descriptive statistics, probability theory, and statistical distributions.
  • Practice solving basic statistical problems using online resources or textbooks.
  • Conduct a self-assessment to identify areas where you need further reinforcement.
Data Analytics Practice Exercises
Enhance your analytical skills by engaging in regular practice exercises. This will reinforce your understanding of data manipulation techniques, statistical analysis, and machine learning algorithms.
  • Find online platforms or textbooks that offer data analytics practice exercises.
  • Regularly solve these exercises, focusing on accuracy and efficiency.
  • Analyze your performance and identify areas where you need further improvement.
Guided Tutorials on Big Data Tools
Enhance your proficiency in big data tools and techniques used in this course by following guided tutorials. This will provide hands-on experience and strengthen your understanding of the concepts.
  • Identify a reputable online platform or course that offers guided tutorials on KNIME, Spark, and Gephi.
  • Follow the tutorials step-by-step, completing the exercises and assignments.
  • Apply what you learn to small-scale projects or datasets to reinforce your understanding.
Big Data Workshop: Industry Trends and Applications
Attend industry workshops to learn about emerging trends, innovative applications, and best practices in big data. Engage with experts and professionals to gain valuable insights.
  • Research and identify upcoming big data workshops hosted by industry leaders or reputable organizations.
  • Register for the workshop and actively participate in the sessions.
  • Engage with speakers, ask questions, and network with other attendees.
Online Study Group
Join an online study group with peers to engage in discussions, share insights, and collaborate on assignments. This will enhance your understanding, identify areas for improvement, and provide support from fellow learners.
  • Reach out to fellow students through online forums or social media groups.
  • Establish a regular meeting schedule and platform for discussions.
  • Actively participate in discussions, sharing your perspectives and seeking clarification from others.
  • Collaborate on assignments, offering support and feedback to each other.
Advanced Big Data Techniques Tutorial
Explore advanced big data techniques to expand your knowledge and skills. This will prepare you for more complex challenges and applications in the field.
  • Identify a reputable online platform or course that offers advanced big data techniques tutorials.
  • Follow the tutorials, completing the exercises and assignments.
  • Implement the techniques learned on real-world datasets or projects.
Contribute to Big Data Open-Source Projects
Engage with the open-source community to gain practical experience and contribute to the advancement of big data technologies.
  • Identify open-source big data projects that align with your interests and skill level.
  • Review the project documentation and codebase to understand its functionality.
  • Identify areas where you can contribute, such as bug fixes or feature enhancements.
  • Submit your contributions to the project repository and engage with the community for feedback.
Capstone Project: Big Data Analytics Solution
Apply your knowledge and skills to develop an end-to-end big data analytics solution for a real-world problem. This will showcase your proficiency in the course material and prepare you for industry challenges.
  • Identify a suitable problem statement or dataset for your project.
  • Develop a research plan, including data collection, analysis techniques, and evaluation metrics.
  • Implement your analytics solution using appropriate tools and techniques.
  • Present your findings and insights in a comprehensive report and presentation.

Career center

Learners who complete Big Data - Capstone Project will develop knowledge and skills that may be useful to these careers:
Machine Learning Engineer
**Machine Learning Engineers** build, deploy, and maintain machine learning models. The Capstone Project provides hands-on experience with advanced analytics tools such as KNIME, Spark's MLlib, and Gephi, which are used in machine learning model development. By completing this course, you can gain valuable skills for this role.
Data Engineer
**Data Engineers** build and maintain data pipelines and infrastructure. The Capstone Project provides hands-on experience with big data management and analysis tools. By completing this course, you will gain valuable skills in data acquisition, exploration, preparation, analysis, and reporting, which are essential for this role.
Big Data Engineer
**Big Data Engineers** design, build, and maintain big data systems. The Capstone Project simulates big data generated by a large number of users of an online game, making it highly relevant to this role. Hands-on experience with tools such as Splunk, KNIME, Spark's MLlib, and Gephi will enable you to perform advanced analytics on big data.
Data Visualization Specialist
**Data Visualization Specialists** create visual representations of data to communicate insights and trends. This Capstone Project includes creating compelling reports and slide presentations based on big data analysis. The course provides hands-on experience with visualization tools, such as Splunk and Gephi, which will be valuable in this role.
Data Scientist
**Data Scientists** use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms. This Capstone Project may be helpful for Data Scientists by providing experience with data classification using KNIME, clustering with Spark, and graph analytics with Neo4j. It can help build a foundation for success in this role.
Data Architect
**Data Architects** design and manage data systems to meet business needs. The Capstone Project provides experience with big data ecosystem design using tools and methods from the specialization. Familiarity with the data acquisition, exploration, preparation, analysis, and reporting processes covered in the course will be valuable in this role.
Quantitative Analyst
**Quantitative Analysts** use mathematical and statistical models to analyze data and make predictions. The Capstone Project provides exposure to big data analysis and modeling techniques. By completing this course, you will gain valuable skills in data acquisition, exploration, preparation, analysis, and reporting, which will be useful in this role.
Risk Analyst
**Risk Analysts** identify and assess risks to businesses. The Capstone Project provides experience with big data analysis and risk modeling techniques. By completing this course, you will gain valuable skills in data acquisition, exploration, preparation, analysis, and reporting, which will be useful in this role.
Operations Research Analyst
**Operations Research Analysts** use analytical methods to solve complex business problems. The Capstone Project provides experience with big data analysis and optimization techniques. By completing this course, you will gain valuable skills in data acquisition, exploration, preparation, analysis, and reporting, which will be useful in this role.
Business Intelligence Analyst
**Business Intelligence Analysts** use data to improve business processes and make better decisions. The Capstone Project provides exposure to real-world business data through the simulated game data. It covers topics like data exploration, preparation, analysis, and reporting, which are essential skills for this role.
Database Administrator
**Database Administrators** manage and maintain databases. The Capstone Project provides experience with big data management techniques. You will learn about data acquisition, exploration, preparation, analysis, and reporting, which are essential skills for database administration.
Cloud Architect
**Cloud Architects** design and manage cloud computing systems. The Capstone Project provides exposure to big data analytics in the cloud. By working with tools like Splunk and Open Office, you can gain insights into cloud-based data management and analysis techniques, which will be valuable in this role.
Product Manager
**Product Managers** manage the development and launch of new products. The Capstone Project provides insights into big data analytics and its applications in product development. You will learn about data acquisition, exploration, preparation, analysis, and reporting, which can be valuable in understanding user needs and making data-driven product decisions.
Data Analyst
**Data Analysts** apply analytical methods and statistical techniques to analyze data and draw meaningful conclusions. This Capstone Project may be useful for Data Analysts by providing hands-on experience with acquiring, exploring, preparing, analyzing, and reporting big data. Familiarity with KNIME, Spark's MLlib, and Gephi will be particularly valuable in this role.
Software Engineer
**Software Engineers** design, develop, and maintain software systems. The Capstone Project provides experience with big data analytics tools and technologies. By completing this course, you can gain valuable skills in data acquisition, exploration, preparation, analysis, and reporting, which are transferable to software development roles.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data - Capstone Project.
This comprehensive textbook provides a thorough overview of machine learning algorithms and their applications in big data analytics. It would serve as a valuable reference for learners interested in exploring machine learning in greater depth.
Provides a comprehensive introduction to text processing using MapReduce. It would be valuable for learners who want to specialize in text mining and natural language processing.
This definitive guide to Hadoop covers the fundamentals of the platform and its use in big data processing. It would be a valuable reference for learners who want to specialize in Hadoop.
Provides a comprehensive introduction to big data analytics using the R programming language. It would be valuable for learners who want to use R for data analysis and visualization.
This practical guide introduces deep learning concepts and techniques using Fastai and PyTorch. It would be helpful for learners who want to explore deep learning without a strong background in mathematics or computer science.
Introduces the principles and techniques of data visualization, emphasizing the effective communication of data insights. It would be useful for learners who want to develop their data visualization skills.
This concise guide provides an overview of NoSQL databases and their role in big data analytics. It would be useful for learners who want to understand the different NoSQL options available.

© 2016 - 2024 OpenCourser