Geoffrey Hubona, Ph.D.

More Data Mining with R presents a comprehensive overview of a myriad of contemporary data mining techniques. It is the logical follow-on to the preceding Udemy course Data Mining with R: Go from Beginner to Advanced, although it is not necessary to take the two courses in sequence. Both courses examine and explain a number of data mining methods and techniques, using concrete data mining modeling examples, extended case studies, and real data sets. The preceding course, Data Mining with R: Go from Beginner to Advanced, focuses on: (1) linear, logistic and local polynomial regression; (2) decision, classification and regression trees (CART); (3) random forests; and (4) cluster analysis techniques. This course, More Data Mining with R, presents detailed instruction and plentiful "hands-on" examples about: (1) association analysis (or market basket analysis) and creating, mining and interpreting association rules using several case examples; (2) network analysis, including the versatile iGraph visualization capabilities, as well as social network data mining analysis cases (marriage and power; friendship links); (3) text mining using Twitter data and word clouds; (4) text and string manipulation, including the use of 'regular expressions'; and (5) time series data mining and analysis, including an extended case study forecasting house price indices in Canberra, Australia.

What's inside

Learning objectives

  • Understand the conceptual foundations of association analysis and perform market basket analyses.
  • Be able to create visualizations of social (and other) networks using the igraph package.
  • Understand how to examine and mine social network data to understand all of the implicit relationships.
  • Mine text data to create word association visualizations, term documents with word frequency counts and associations, and create word clouds.
  • Learn how to process text and string data, including the use of 'regular expressions'.
  • Extract prototypical information about cycles from time series data.

Syllabus

Students are given a crash course on using R software and then introduced . . .
Welcome to More Data Mining with R!

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data Input and Output (part 1)
Data Input and Output (part 2)

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It is not owned by any one field, but rather finds interpretation across many (e.g. it is viewed as a modern branch of descriptive statistics by some, but also as a grounded theory development tool by others). It involves the creation and study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information".

A primary goal of data visualization is to communicate information clearly and efficiently to users via the statistical graphics, plots, information graphics, tables, and charts selected. Effective visualization helps users in analyzing and reasoning about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measure of a variable, while charts of various types are used to show patterns or relationships in the data for one or more variables.

More R Scripting and Visualizations (part 2)
More Input and Output (part 1)
More Input and Output (part 2)
Homework Exercise: Execute Second Set of Scripts on your Own
Know about and work with association analysis and market basket analysis

Affinity analysis, a form of association analysis, is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans.

Introduction to Association Analysis (part 2)

The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts—from the proportions of first-class passengers to the 'women and children first' policy, and the fact that that policy was not entirely successful in saving the women and children in the third class—are reflected in the survival rates for various classes of passenger.

These data were originally collected by the British Board of Trade in their investigation of the sinking. Note that there is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost.

Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements. In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
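
Two of the most common measures of interestingness for a rule are its support and its confidence. The course itself works in R; the sketch below uses plain Python only to illustrate the arithmetic on the onions-and-potatoes example. The five transactions and the helper names `support` and `confidence` are made up for illustration and do not come from the course materials.

```python
# Toy market-basket data (hypothetical): each transaction is the set of
# items bought together in one visit.
transactions = [
    {"onions", "potatoes", "hamburger"},
    {"onions", "potatoes", "hamburger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "hamburger"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)


# Rule: {onions, potatoes} => {hamburger}
rule_support = support({"onions", "potatoes", "hamburger"}, transactions)
rule_confidence = confidence({"onions", "potatoes"}, {"hamburger"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here two of the five baskets contain all three items, and two of the three baskets with onions and potatoes also contain hamburger meat, so the rule has support 0.40 and confidence of about 0.67.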

Rule Mining with Titanic Dataset (part 2)
Interpreting Rules

Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task of going through all the rules to discover the interesting ones. Sifting manually through large sets of rules is time consuming and strenuous. Visualization has a long history of making large amounts of data more accessible using techniques like selecting and zooming. However, most association rule visualization techniques still fall short when it comes to a large number of rules. In this paper we present a new interactive visualization technique which lets the user navigate through a hierarchy of groups of association rules. We demonstrate how this new visualization technique can be used to analyze large sets of association rules with examples from our implementation in the R-package arulesViz.

Visualizing Association Rules (part 2)
Know about association analysis and market basket analysis in specific cases of an online radio predictor system and for predicting income

In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. A targeting model is doing a good job if the response within the target is much better than the average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response.

For example, suppose a population has an average response rate of 5%, but a certain model (or rule) has identified a segment with a response rate of 20%. Then that segment would have a lift of 4.0 (20%/5%).
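
The ratio in this example can be written as a one-line function. This is a minimal Python sketch of the definition above, not code from the course (which uses R); the function name `lift` is chosen here for illustration.

```python
def lift(target_response_rate, average_response_rate):
    """Lift = response rate inside the targeted segment / population average."""
    if average_response_rate <= 0:
        raise ValueError("average response rate must be positive")
    return target_response_rate / average_response_rate


# The example from the text: a 20% segment against a 5% population average.
segment_lift = lift(0.20, 0.05)
print(round(segment_lift, 2))
```

A lift above 1 means the segment responds better than the population as a whole; a lift of exactly 1 means the model does no better than random targeting.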

Typically, the modeller seeks to divide the population into quantiles, and rank the quantiles by lift. Organizations can then consider each quantile, and by weighing the predicted response rate (and associated financial benefit) against the cost, they can decide whether to market to that quantile or not.
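
As a rough illustration of that decision, the Python sketch below ranks four equal-sized quantiles by lift and keeps those whose expected benefit per contact exceeds the contact cost. All figures (response rates, benefit per response, cost per contact) are hypothetical, and the simple "expected profit greater than zero" rule is an assumption made for this sketch rather than a method prescribed by the course.

```python
# Hypothetical response rates for four equal-sized quantiles of a scored population.
quantile_rates = {"Q1": 0.11, "Q2": 0.05, "Q3": 0.03, "Q4": 0.01}
benefit_per_response = 40.0  # hypothetical financial benefit of one response
cost_per_contact = 1.5       # hypothetical cost of marketing to one person

# With equal-sized quantiles, the population average is the mean of the rates.
average_rate = sum(quantile_rates.values()) / len(quantile_rates)

# Rank quantiles from highest to lowest lift.
ranked = sorted(quantile_rates.items(), key=lambda kv: kv[1], reverse=True)

decisions = {}
for name, rate in ranked:
    quantile_lift = rate / average_rate
    expected_profit = rate * benefit_per_response - cost_per_contact
    decisions[name] = expected_profit > 0
    print(f"{name}: lift={quantile_lift:.2f} "
          f"expected profit per contact={expected_profit:+.2f} "
          f"market={decisions[name]}")
```

With these made-up numbers only the top two quantiles are worth contacting, even though the second one has a lift of just 1.0.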

Lift is analogous to information retrieval's average precision metric, if one treats the precision (fraction of the positives that are true positives) as the target response probability.

The lift curve can also be considered a variation on the receiver operating characteristic (ROC) curve, and is also known in econometrics as the Lorenz or power curve.

The difference between the lifts observed on two different subgroups is called the uplift. The subtraction of two lift curves forms the uplift curve, which is a metric used in uplift modelling.

It is important to note that in general marketing practice the term Lift is also defined as the difference in response rate between the treatment and control groups, indicating the causal impact of a marketing program (versus not having it as in the control group). As a result, "no lift" often means there is no statistically significant effect of the program. On top of this, uplift modelling is a predictive modeling technique to improve (up) lift over control.

Association Rules Reviewed (part 2)
Online Radio Predictor Example (part 1)
Online Radio Predictor Example (part 2)
Predicting Income Example (part 1)
Predicting Income Example (part 2)
Predicting Income Example (part 3)
Know about and work with social networks, especially using iGraph visualizations

igraph is a library collection for creating and manipulating graphs and analyzing networks. It is written in C/C++ and also exists as Python and R packages. The software is widely used in academic research in network science and related fields.

iGraph Visualization Examples (part 1)
iGraph Visualization Examples (part 2)
iGraph Measurement Examples (part 3)
iGraph Measurement Examples (part 4)
iGraph Visualization Examples (part 5)
iGraph Visualization Examples (part 6)
iGraph Visualization Examples (part 7)
Know about and work with social networks

The term 'social network' is increasingly used in the mainstream where it is inextricably tied to notions of influence. Mark Granovetter's articles on "The Strength of Weak Ties" (Granovetter 1973) and "Threshold Models of Collective Behavior" (Granovetter 1978) were probably the first to ignite public fascination with social networks and the spread of ideas, but Malcolm Gladwell's (2000) best-selling The Tipping Point is surely responsible for the most recent public fascination with social networks and the spread of social phenomena. Gladwell writes that change occurs when sociological phenomena (ideas, products, behaviors) reach critical mass; in other words, these phenomena spread through society like diseases. This idea has proven so attractive that people now use the expression "that video went viral" to describe popular YouTube clips. In Gladwell's "framework," the success or failure of any social epidemic depends on the configuration of the network of social ties, which are analogous to disease vectors. He argues that a relatively small number of people, known as "connectors, mavens, and salesmen," hold the keys to spreading a good idea to a large enough number of people so it 'sticks.' The implication is that with the right combination of these few people on your side, you wield major social influence.

Visual Network: Marriage and Power in 15th Century Florence (part 1)
Visual Network: Marriage and Power in 15th Century Florence (part 2)
Example: Friendship Network (part 1)
Example: Friendship Network (part 2)
Example: Friendship Network (part 3)
Learn about a specific extended case of analyzing Twitter tweets as an example of text mining.

The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage. Putting it in a general scenario of social networks, the terms can be taken as people and the tweets as groups on LinkedIn, and the term-document matrix can then be taken as the group membership of people. We will build a network of terms based on their co-occurrence in the same tweets, which is similar to a network of people based on their group memberships.
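
The step from a term-document matrix to a term network can be sketched as follows. This is an illustrative Python example, not the course's R script: the four terms, the tiny binary matrix, and the helper name `cooccurrence_edges` are all invented here, and the real data in “termDocMatrix.rdata” is not reproduced.

```python
from itertools import combinations

# Hypothetical binary term-document matrix: one row per term, one column per
# tweet; a 1 means the term appears in that tweet.
terms = ["r", "mining", "data", "network"]
term_doc = [
    [1, 1, 0, 1],  # r
    [1, 1, 1, 0],  # mining
    [1, 0, 1, 0],  # data
    [0, 0, 0, 1],  # network
]


def cooccurrence_edges(terms, term_doc):
    """Return {(term_a, term_b): number of tweets containing both terms}."""
    edges = {}
    for i, j in combinations(range(len(terms)), 2):
        # Dot product of two binary rows counts the tweets they share.
        weight = sum(a * b for a, b in zip(term_doc[i], term_doc[j]))
        if weight > 0:
            edges[(terms[i], terms[j])] = weight
    return edges


edges = cooccurrence_edges(terms, term_doc)
for (a, b), weight in edges.items():
    print(f"{a} -- {b}: {weight}")
```

Each key is an edge between two terms and each value is the edge weight, which is the kind of weighted edge list a graph package such as igraph can take as input.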

Transforming Twitter Data
Stemming and Frequency Counts
Building a Text Term Document
Frequent Terms and Associations
Word Cloud and Word Clustering
K-Means and K-Medoids Clustering
Using Lists for Text Processing (part 1)
Using Lists for Text Processing (part 2)
Using Lists for Text Processing (part 3)
Learn more about manipulating string data (variables) and using regular expressions in R.
Introduction to String Manipulation (slides, part 1)
Introduction to String Manipulation (slides, part 2)
Text and String Manipulation Script Demos (part 1)
Text and String Manipulation Demos (part 2)
Text and String Manipulation Demos (part 3)
Text and String Manipulation Demos (part 4)

A 'regular expression' is a pattern that describes a set of strings. Two types of regular expressions are used in R: extended regular expressions (the default) and Perl-like regular expressions, selected with perl = TRUE. There is also fixed = TRUE, which can be considered to use a literal regular expression.
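
The difference between matching a pattern and matching a literal string can be seen in a small example. The sketch below uses Python's `re` module, whose syntax is Perl-style, as a stand-in for R; using `re.escape` to get literal matching is only an analogy to R's fixed = TRUE, and the sample strings are made up.

```python
import re

strings = ["a.c", "abc", "a-c", "xyz"]

# As a regular expression, "." matches any single character,
# so the pattern a.c matches "a.c", "abc" and "a-c".
pattern_matches = [s for s in strings if re.search(r"a.c", s)]

# Escaping the pattern makes every character literal,
# so only the exact text "a.c" matches.
literal_matches = [s for s in strings if re.search(re.escape("a.c"), s)]

print(pattern_matches)
print(literal_matches)
```

Treating a search string as a regular expression when it was meant literally is a common source of unexpected matches, which is why a literal mode exists.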

More Advanced Regular Expression Capabilities (slides and script)
Know about and work with time series data to extract time series cycle components

Time series decomposition breaks a time series down into trend, seasonal, cyclical and irregular components.
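
A minimal sketch of a classical additive decomposition is shown below, in Python rather than the R used in the course. It makes simplifying assumptions: the trend is estimated with a centered moving average, any cyclical movement is left inside the trend rather than separated out, and the series is synthetic (a straight line plus a repeating four-period pattern). The function names are hypothetical.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the averaging window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split series into trend + seasonal + remainder (irregular) parts."""
    trend = centered_moving_average(series, period)
    detrended = [x - m if m is not None else None for x, m in zip(series, trend)]
    # Average the detrended values at each position within the cycle.
    means = []
    for k in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == k]
        means.append(sum(vals) / len(vals))
    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    seasonal = [pattern[i % period] for i in range(len(series))]
    remainder = [x - m - s if m is not None else None
                 for x, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


# Synthetic series: linear trend plus a repeating pattern of period 4.
true_pattern = [2.0, -1.0, -2.0, 1.0]
series = [float(t) + true_pattern[t % 4] for t in range(12)]

trend, seasonal, remainder = decompose_additive(series, period=4)
print(trend)
print(seasonal[:4])
```

Because the synthetic series has no noise, the recovered seasonal pattern matches the one used to build it and the remainder is zero wherever the trend is defined.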

Maine Unemployment Data (part 2)
Airline Travel Example
Electric Consumption in Australia (part 1)
Electric Consumption in Australia (part 2)
Time Series Clustering (part 1)
Time Series Clustering (part 2)
Time Series Classification
How to apply the time series data mining procedures in a specific case of house price changes over time in Canberra.
Forecasting House Prices: Exploring the Data (part 1)
Forecasting House Prices: Exploring the Data (part 2)
Forecast House Prices: Use Trend and Seasonal Components

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops knowledge of association analysis and market basket analysis, which are core skills for data scientists
Teaches how to manipulate string data (variables) and use regular expressions in R, which are useful tools for data manipulation
Examines social networks, which is highly relevant to diverse domains such as social science and marketing
Taught by Geoffrey Hubona, Ph.D. who is recognized for their work in data mining and machine learning

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in More Data Mining with R with these activities:
Revisit basic R syntax and functions
Reviewing core concepts will better prepare you to understand the more advanced topics in this course.
  • Go through your notes or study materials from a previous R course or tutorial.
  • Complete a few practice exercises to test your understanding of basic R syntax and functions.
Mentor junior data miners or students
By mentoring others, you can reinforce your own understanding of data mining concepts while fostering the growth of the next generation of data miners.
  • Identify opportunities to mentor junior data miners or students, such as volunteering at a local university or joining a mentorship program.
  • Share your knowledge and experiences, providing guidance and support to your mentees.
  • Encourage your mentees to ask questions and explore different aspects of data mining.
Join a study group or online forum for data mining
Engaging with peers will provide you with opportunities to discuss concepts, share insights, and learn from others' experiences.
  • Search for study groups or online forums dedicated to data mining.
  • Join a group that aligns with your interests and level of expertise.
  • Participate in discussions, ask questions, and share your own knowledge.
Practice data manipulation in R
Hands-on practice with data manipulation will solidify your understanding of the concepts and techniques covered in this course.
  • Find a dataset online or use one provided by the course instructor.
  • Load the dataset into R and explore its structure using functions like `head()`, `str()`, and `summary()`.
  • Practice data cleaning and transformation tasks, such as removing duplicates, handling missing values, and creating new variables.
Contribute to open-source data mining projects
Contributing to open-source projects will provide you with hands-on experience, expand your network, and enhance your understanding of the latest data mining techniques.
  • Identify open-source data mining projects that align with your interests.
  • Familiarize yourself with the project's codebase and documentation.
  • Identify areas where you can contribute, such as bug fixes, feature enhancements, or documentation improvements.
Attend data mining workshops or conferences
Attending workshops or conferences will expose you to the latest advancements in data mining and provide opportunities to network with experts.
  • Research upcoming data mining workshops or conferences.
  • Identify events that cover topics relevant to your interests and career goals.
  • Register for the event and actively participate in sessions, presentations, and networking opportunities.
Explore advanced data mining techniques in iGraph
Engaging with tutorials will provide you with additional insights and practical examples of advanced data mining techniques using iGraph.
  • Search for tutorials on iGraph's website or other online resources.
  • Follow the tutorials step-by-step, experimenting with different parameters and datasets.
  • Apply the techniques you learn to your own research or projects.
Create a data mining portfolio
Developing a portfolio will provide you with a tangible showcase of your data mining skills and enhance your employability.
  • Gather your best data mining projects, assignments, or research work.
  • Create a website or online platform to showcase your portfolio.
  • Include clear descriptions, code snippets, and visualizations to demonstrate your abilities.

Career center

Learners who complete More Data Mining with R will develop knowledge and skills that may be useful to these careers:
Quantitative Analyst
Quantitative Analysts use mathematical and statistical modeling to analyze and interpret financial data. More Data Mining with R focuses on advanced data mining techniques, using concrete data mining modeling examples, extended case studies, and real data sets. The skills learned in this course could be of great value to a Quantitative Analyst.
Statistician
Statisticians collect, analyze, interpret, and present data. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. This course may be useful for Statisticians looking to expand their knowledge of data mining techniques.
Data Scientist
As a Data Scientist, you will apply scientific methods and algorithms to extract knowledge and insights from data. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis, network analysis, text mining, text and string manipulation, and time series data mining and analysis. This course may provide a helpful foundation for your career as a Data Scientist.
Machine Learning Engineer
Machine Learning Engineers develop and maintain machine learning models. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. While not specifically focused on machine learning, the data mining techniques explored in this course could be useful for Machine Learning Engineers.
Operations Research Analyst
Operations Research Analysts use advanced analytical techniques to solve complex problems in business and industry. More Data Mining with R provides a comprehensive overview of contemporary data mining techniques, with a focus on association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. The data mining techniques explored in this course could be useful for an Operations Research Analyst.
Risk Analyst
Risk Analysts assess and manage risks to businesses and organizations. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. The skills learned in this course could be useful for Risk Analysts seeking to expand their knowledge of the data mining process and its application in risk management.
Product Manager
Product Managers oversee the development and marketing of products. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. The data mining techniques explored in this course could be useful for Product Managers seeking to improve their understanding of customer behavior and market trends.
Market Research Analyst
Market Research Analysts study market conditions, identify trends, and analyze marketing strategies to determine potential opportunities for businesses. More Data Mining with R provides detailed instruction and plentiful "hands-on" examples about association analysis. This course may be useful for Market Research Analysts to better understand the utility of association analysis.
Business Analyst
Business Analysts use data and analytical techniques to identify and solve business problems. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. This course may be useful for Business Analysts looking for a comprehensive overview of contemporary data mining techniques.
Financial Analyst
Financial Analysts make investment recommendations and provide advice to clients. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis. This course may be useful for Financial Analysts looking to learn more about association analysis for investment research.
Data Engineer
Data Engineers build and maintain the infrastructure that stores and processes data. More Data Mining with R is a course that provides a comprehensive overview of contemporary data mining techniques. While not specifically focused on data engineering, the skills learned in this course may be useful for Data Engineers.
Actuary
Actuaries use mathematical and statistical models to assess risk and uncertainty. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis). While not specifically focused on actuarial science, the data mining techniques explored in this course could be useful for Actuaries.
Epidemiologist
Epidemiologists investigate the causes and patterns of health and disease in populations. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. While not specifically focused on epidemiology, the skills learned in this course may be useful for Epidemiologists.
Data Analyst
In the role of a Data Analyst, you will gather, clean, and interpret data to make data-driven decisions. More Data Mining with R is a course that provides a comprehensive overview of a myriad of contemporary data mining techniques. This could be useful for a Data Analyst seeking to expand their knowledge of the data mining process.
Software Engineer
Software Engineers design, develop, and maintain software systems. More Data Mining with R uses the R software, which is a popular programming language for data analysis and data mining. This course may be useful for Software Engineers interested in learning more about the R programming language and how to use it for data mining tasks.



© 2016 - 2024 OpenCourser