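This course focuses on the implementation, optimization, and application of advanced big data systems. It covers the principles, implementation, and policy optimization of systems such as distributed file systems, MapReduce/Spark, Storm/Spark Streaming, and Mahout.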
In recent years, AI technology has rapidly penetrated many areas of industry. Big data systems, the foundation of today's data-driven AI, have therefore become critically important. This course introduces students to the basic concepts of big data systems, covering how data is effectively stored, processed, and analyzed. We start from the general principles of distributed system design. We then provide frameworks for understanding how storage, computation, and network capabilities are scaled in big data systems. Finally, to make these design principles easy to follow, our case studies use real industrial systems to demonstrate how the principles are applied in practice and how the performance and limitations of such systems are analyzed.