Analyzing Big Data with Microsoft R
Heads up! This course may be archived and/or unavailable.
The open-source programming language R has long been popular, particularly in academia, for data processing and statistical analysis. Among R's strengths are that it is a succinct programming language and that it has an extensive repository of third-party libraries for performing all kinds of analyses. Together, these two features make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, one deficiency of R is that it traditionally uses a lot of memory, both because it needs to load a copy of the data in its entirety as a data.frame object and because processing the data often involves making further copies (a behavior sometimes referred to as copy-on-modify). This is one of the reasons R has been received more reluctantly by industry than by academia.
The main component of Microsoft R Server (MRS) is the RevoScaleR package, an R library that offers a set of functions for processing large datasets without having to load them into memory all at once. RevoScaleR also offers a rich set of distributed statistical and machine learning algorithms, which is expanded over time. Finally, RevoScaleR offers a mechanism by which we can take code developed on a laptop and deploy it to a remote server such as SQL Server or Spark (where the infrastructure is very different under the hood) with minimal effort.
In this course, we will show you how to use MRS to run an analysis on a large dataset and provide some examples of how to deploy it on a Spark cluster or a SQL Server database. Upon completion, you will know how to use R for big-data problems.
Since RevoScaleR is an R package, we assume that course participants are familiar with R. A solid understanding of R data structures (vectors, matrices, lists, data frames, environments) is required. Familiarity with third-party packages such as dplyr is also helpful.
edX offers financial assistance for learners who want to earn Verified Certificates but who may not be able to pay the fee. To apply for financial assistance, enroll in the course, then follow this link to complete an application for assistance.
Rating | Not enough ratings |
---|---|
Length | 4 weeks |
Effort | 2 - 4 hours per week |
Starts | Oct 1 (238 weeks ago) |
Cost | $99 |
From | Microsoft via edX |
Instructors | Seth Mottaghinejad, Jonathan Sanito |
Download Videos | On all desktop and mobile devices |
Language | English |
Subjects | Data Science |
Tags | Data Analysis & Statistics |
Careers
An overview of related careers and their average salaries in the US.
SQL Server Administrator $77k
IT Consultant - SQL Server $83k
SQL Server/Oracle DBA $95k
SQL Server Instructor $95k
PHP and SQL Server Developer $103k
SQL Server | BI SQL Server developer $108k
Microsoft SQL Server Developer $109k
Database Administrator SQL Server $115k
SQL Server DBA 5 $122k
SQL Server Architect $138k
SQL Server MVP 5 $156k
SQL Server MVP 3 $163k