Analyzing Big Data with Microsoft R

Heads up! This course may be archived and/or unavailable.

The open-source programming language R has long been popular, particularly in academia, for data processing and statistical analysis. Among R's strengths are its succinct syntax and an extensive repository of third-party libraries for performing all kinds of analyses. Together, these make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, R has traditionally been memory-hungry. It loads a copy of the entire dataset into memory as a data.frame object, and processing the data often creates further copies (a consequence of R's copy-on-modify semantics). This is one reason industry has embraced R more reluctantly than academia.

The main component of Microsoft R Server (MRS) is the RevoScaleR package, an R library with functions for processing large datasets without loading them into memory all at once. RevoScaleR also offers a rich and growing set of distributed statistical and machine learning algorithms. Finally, it provides a mechanism for taking code developed on a laptop and deploying it, with minimal effort, on a remote platform such as SQL Server or Spark, even though the infrastructure under the hood is very different.

In this course, we will show you how to use MRS to run an analysis on a large dataset, with examples of deploying it on a Spark cluster or in a SQL Server database. Upon completion, you will know how to use R for big-data problems.

Since RevoScaleR is an R package, we assume that participants are familiar with R. A solid understanding of R data structures (vectors, matrices, lists, data frames, environments) is required. Familiarity with third-party packages such as dplyr is also helpful.

edX offers financial assistance for learners who want to earn Verified Certificates but may not be able to pay the fee. To apply, enroll in the course and then complete an application for financial assistance.

Rating: Not enough ratings
Length: 4 weeks
Effort: 2-4 hours per week
Starts: Oct 1 (238 weeks ago)
Cost: $99
From: Microsoft via edX
Instructors: Seth Mottaghinejad, Jonathan Sanito
Download Videos: On all desktop and mobile devices
Language: English
Subjects: Data Science
Tags: Data Analysis & Statistics

Careers

An overview of related careers and their average salaries in the US.

SQL Server Administrator $77k

IT Consultant - SQL Server $83k

SQL Server/Oracle DBA $95k

SQL Server Instructor $95k

PHP and SQL Server Developer $103k

SQL Server | BI SQL Server developer $108k

Microsoft SQL Server Developer $109k

Database Administrator SQL Server $115k

SQL Server DBA 5 $122k

SQL Server Architect $138k

SQL Server MVP 5 $156k

SQL Server MVP 3 $163k
