More Data Mining with R from Udemy

What's inside

Learning objectives

Understand the conceptual foundations of association analysis and perform market basket analyses.
Be able to create visualizations of social (and other) networks using the igraph package.
Understand how to examine and mine social network data to understand all of the implicit relationships.

Mine text data to create word association visualizations, term documents with word frequency counts and associations, and create word clouds.
Learn how to process text and string data, including the use of 'regular expressions'.
Extract prototypical information about cycles from time series data.

Syllabus

Students are given a crash course on using R software and then introduced . . .

Welcome to More Data Mining with R!

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It is not owned by any one field, but rather finds interpretation across many (e.g. it is viewed as a modern branch of descriptive statistics by some, but also as a grounded theory development tool by others). It involves the creation and study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information".

A primary goal of data visualization is to communicate information clearly and efficiently to users via the statistical graphics, plots, information graphics, tables, and charts selected. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measure of a variable, while charts of various types are used to show patterns or relationships in the data for one or more variables.

Affinity analysis, a form of association analysis, is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans.

The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts—from the proportions of first-class passengers to the 'women and children first' policy, and the fact that that policy was not entirely successful in saving the women and children in the third class—are reflected in the survival rates for various classes of passenger.

These data were originally collected by the British Board of Trade in their investigation of the sinking. Note that there is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost.

Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements. In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
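
As a rough illustration of how such a rule can be scored, the sketch below computes support and confidence, two commonly used interestingness measures, for the onions-and-potatoes rule in base R. The six transactions and the `support()` helper are made up for this example; they are not course data and not part of any package.

```r
# Hypothetical toy transactions (illustration only, not course data).
transactions <- list(
  c("onions", "potatoes", "burger"),
  c("onions", "potatoes", "burger", "beer"),
  c("onions", "potatoes"),
  c("milk", "bread"),
  c("burger", "bread"),
  c("milk", "potatoes")
)

# Hypothetical helper: fraction of transactions containing every item in `items`.
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

lhs <- c("onions", "potatoes")   # "if" side of the rule
rhs <- "burger"                  # "then" side of the rule

# Support: how often the whole rule (lhs and rhs together) occurs.
rule_support <- support(c(lhs, rhs), transactions)

# Confidence: among transactions containing lhs, the share that also contain rhs.
rule_confidence <- rule_support / support(lhs, transactions)

print(c(support = rule_support, confidence = rule_confidence))
```

With real transaction data, a dedicated package would normally do the mining; this sketch only shows what the reported numbers mean.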

Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task of going through all the rules to discover the interesting ones. Sifting manually through large sets of rules is time-consuming and strenuous. Visualization has a long history of making large amounts of data more accessible using techniques like selecting and zooming. However, most association rule visualization techniques still fall short when it comes to a large number of rules. In this paper we present a new interactive visualization technique which lets the user navigate through a hierarchy of groups of association rules. We demonstrate how this new visualization technique can be used to analyze large sets of association rules with examples from our implementation in the R-package arulesViz.

In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. A targeting model is doing a good job if the response within the target is much better than the average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response.

For example, suppose a population has an average response rate of 5%, but a certain model (or rule) has identified a segment with a response rate of 20%. Then that segment would have a lift of 4.0 (20%/5%).
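
The arithmetic above can be reproduced in a few lines of base R. This is a minimal sketch on made-up data: the `responded` and `in_segment` vectors are hypothetical and were constructed so that the rates match the 5% and 20% figures in the example.

```r
# Hypothetical population of 100 people: 1 = responded, 0 = did not respond.
responded <- c(rep(1, 5), rep(0, 95))          # 5 responders overall (5%)

# Hypothetical segment chosen by a model: 20 people, 4 of them responders.
in_segment <- c(rep(TRUE, 4), FALSE,           # 4 of the 5 responders are targeted
                rep(TRUE, 16), rep(FALSE, 79)) # plus 16 of the 95 non-responders

average_response <- mean(responded)              # response rate in the population
target_response  <- mean(responded[in_segment])  # response rate inside the segment

# Lift: target response divided by average response.
lift <- target_response / average_response
print(lift)
```

A lift above 1 means the segment responds better than the population as a whole; a lift of 1 means the model does no better than random targeting.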

Typically, the modeller seeks to divide the population into quantiles, and rank the quantiles by lift. Organizations can then consider each quantile, and by weighing the predicted response rate (and associated financial benefit) against the cost, they can decide whether to market to that quantile or not.

Lift is analogous to information retrieval's average precision metric, if one treats the precision (fraction of the positives that are true positives) as the target response probability.

The lift curve can also be considered a variation on the receiver operating characteristic (ROC) curve, and is also known in econometrics as the Lorenz or power curve.

The difference between the lifts observed on two different subgroups is called the uplift. The subtraction of two lift curves forms the uplift curve, which is a metric used in uplift modelling.

It is important to note that in general marketing practice the term Lift is also defined as the difference in response rate between the treatment and control groups, indicating the causal impact of a marketing program (versus not having it as in the control group). As a result, "no lift" often means there is no statistically significant effect of the program. On top of this, uplift modelling is a predictive modeling technique to improve (up) lift over control.

igraph is a library collection for creating and manipulating graphs and analyzing networks. It is written in C/C++ and also exists as Python and R packages. The software is widely used in academic research in network science and related fields.

The term 'social network' is increasingly used in the mainstream where it is inextricably tied to notions of influence. Mark Granovetter's articles on "The Strength of Weak Ties" (Granovetter 1973) and "Threshold Models of Collective Behavior" (Granovetter 1978) were probably the first to ignite public fascination with social networks and the spread of ideas, but Malcolm Gladwell's (2000) best-selling The Tipping Point is surely responsible for the most recent public fascination with social networks and the spread of social phenomena. Gladwell writes that change occurs when sociological phenomena (ideas, products, behaviors) reach critical mass; in other words, these phenomena spread through society like diseases. This idea has proven so attractive that people now use the expression "that video went viral" to describe popular YouTube clips. In Gladwell's "framework," the success or failure of any social epidemic depends on the configuration of the network of social ties, which are analogous to disease vectors. He argues that a relatively small number of people, known as "connectors, mavens, and salesmen," hold the keys to spreading a good idea to a large enough number of people so it 'sticks.' The implication is that with the right combination of these few people on your side, you wield major social influence.

The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage. Putting it in a general scenario of social networks, the terms can be taken as people and the tweets as groups on LinkedIn, and the term-document matrix can then be taken as the group membership of people. We will build a network of terms based on their co-occurrence in the same tweets, which is similar to a network of people based on their group memberships.
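
The step from a term-document matrix to a term network can be sketched in base R as follows. The tiny matrix below is invented for illustration and stands in for the real “termDocMatrix.rdata” file, which is not reproduced here; the term and tweet names are likewise hypothetical.

```r
# Hypothetical term-document matrix: rows are terms, columns are tweets,
# entries are how often the term appears in the tweet.
tdm <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    2, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("mining", "network", "r"),
                  Docs  = paste0("tweet", 1:4))
)

# Reduce counts to presence/absence so each shared tweet counts once.
tdm[tdm >= 1] <- 1

# Term-by-term matrix: entry [i, j] is the number of tweets containing both terms.
cooccur <- tdm %*% t(tdm)

# A term trivially co-occurs with itself, so drop the diagonal.
diag(cooccur) <- 0

print(cooccur)
```

The result is a symmetric, weighted adjacency matrix, which is the kind of input that graph-building functions such as igraph's `graph_from_adjacency_matrix()` accept.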

A 'regular expression' is a pattern that describes a set of strings. Two types of regular expressions are used in R: extended regular expressions (the default) and Perl-like regular expressions, used by perl = TRUE. There is also fixed = TRUE, which can be considered to use a literal regular expression.
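
The difference between the three modes is easiest to see on a small example. The sketch below uses base R's `grepl()`; the input strings are made up for illustration.

```r
x <- c("a.b", "axb", "ab")

# Default (extended regular expression): "." matches any single character.
default_match <- grepl("a.b", x)

# fixed = TRUE: the pattern is taken literally, so "." matches only a period.
fixed_match <- grepl("a.b", x, fixed = TRUE)

# perl = TRUE: enables Perl-only syntax such as lookahead.
# "a(?=\\.)" matches an "a" that is immediately followed by a literal period.
perl_match <- grepl("a(?=\\.)", x, perl = TRUE)

print(default_match)  # TRUE  TRUE FALSE
print(fixed_match)    # TRUE FALSE FALSE
print(perl_match)     # TRUE FALSE FALSE
```

The same `fixed` and `perl` arguments are available on related base functions such as `sub()`, `gsub()`, and `regexpr()`.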

Time series decomposition splits a time series into trend, seasonal, cyclical, and irregular components.
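
As a minimal sketch of the idea, base R's `decompose()` can be applied to the built-in `AirPassengers` series. The choice of dataset is an assumption made for illustration. Note that `decompose()` uses moving averages and returns only trend, seasonal, and random (irregular) components; it does not estimate a separate cyclical component, so any longer cycle ends up in the trend or the remainder.

```r
# AirPassengers ships with R (datasets package): monthly totals, 1949 to 1960.
parts <- decompose(AirPassengers)   # additive model by default

# The seasonal component repeats every 12 months.
print(round(parts$figure, 1))

# The trend is NA at both ends because of the moving-average window.
print(range(parts$trend, na.rm = TRUE))

# In the additive model the components sum back to the observed series.
rebuilt <- parts$trend + parts$seasonal + parts$random
```

Calling `plot(parts)` draws the observed series and the three components in stacked panels, which is usually the quickest way to inspect the result.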

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers

Develops knowledge of association analysis and market basket analysis, which are core skills for data scientists

Teaches how to manipulate string data (variables) and use regular expressions in R, which are useful tools for data manipulation

Examines social networks, which is highly relevant to diverse domains such as social science and marketing

Taught by Geoffrey Hubona, Ph.D., who is recognized for their work in data mining and machine learning

Reviews summary

Advanced R data mining techniques

According to learners, this course is a highly practical and valuable continuation for those looking to deepen their R-based data mining skills, particularly in association rules, network analysis, text mining, and time series. Students consistently highlight the instructor's clear explanations and strong examples, especially appreciating the hands-on demonstrations. While the course assumes some R familiarity, many found it well-structured and easy to follow, though some topics like regular expressions could benefit from more detailed explanation. The use of real-world data and case studies is frequently praised as a major strength, making the concepts tangible and applicable.

Beneficial for those with some R background.

"Having taken the previous 'Data Mining with R' course was definitely a huge advantage here."

"I came in with some R knowledge, and that made the course much smoother to follow."

"It's great for intermediate R users, but beginners might need extra preparation."

Well-organized and easy to follow progression.

"The course flow was logical, building up from basic R concepts to more advanced mining techniques."

"I found the progression through association, network, text, and time series mining very well-structured."

"It was easy to follow along with the lectures and assignments due to the clear organization."

Instructor simplifies complex topics effectively.

"The instructor explains complex topics like association rules and social network analysis very clearly."

"I really appreciated how the instructor broke down difficult concepts into understandable parts."

"His explanations were straightforward and made even the advanced topics accessible."

Strong emphasis on practical coding and demos.

"The hands-on examples using R were incredibly helpful for solidifying my understanding."

"I loved the script demos for text and string manipulation; they were very practical."

"Learning by doing with the provided R scripts made the concepts much clearer."

Excellent course for real-world application of data mining.

"This course provided me with very relevant data mining techniques using R that I can apply directly."

"I found the content highly practical, especially the market basket analysis and text mining examples."

"The extended case studies with real datasets made the learning highly applicable to professional scenarios."

Needs more depth and clarity for new learners.

"The section on regular expressions felt a bit rushed; I needed more detailed explanations and examples."

"I struggled a bit with the regular expressions part and wished there were more exercises to practice."

"For someone new to regex, that module could be expanded for better understanding."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in More Data Mining with R with these activities:

Revisit basic R syntax and functions

Reviewing core concepts will better prepare you to understand the more advanced topics in this course.

Go through your notes or study materials from a previous R course or tutorial.
Complete a few practice exercises to test your understanding of basic R syntax and functions.

Mentor junior data miners or students

By mentoring others, you can reinforce your own understanding of data mining concepts while fostering the growth of the next generation of data miners.

Identify opportunities to mentor junior data miners or students, such as volunteering at a local university or joining a mentorship program.
Share your knowledge and experiences, providing guidance and support to your mentees.
Encourage your mentees to ask questions and explore different aspects of data mining.

Join a study group or online forum for data mining

Engaging with peers will provide you with opportunities to discuss concepts, share insights, and learn from others' experiences.

Search for study groups or online forums dedicated to data mining.
Join a group that aligns with your interests and level of expertise.
Participate in discussions, ask questions, and share your own knowledge.

Practice data manipulation in R

Hands-on practice with data manipulation will solidify your understanding of the concepts and techniques covered in this course.

Find a dataset online or use one provided by the course instructor.
Load the dataset into R and explore its structure using functions like `head()`, `str()`, and `summary()`.
Practice data cleaning and transformation tasks, such as removing duplicates, handling missing values, and creating new variables.

Contribute to open-source data mining projects

Contributing to open-source projects will provide you with hands-on experience, expand your network, and enhance your understanding of the latest data mining techniques.

Identify open-source data mining projects that align with your interests.
Familiarize yourself with the project's codebase and documentation.
Identify areas where you can contribute, such as bug fixes, feature enhancements, or documentation improvements.

Attend data mining workshops or conferences

Attending workshops or conferences will expose you to the latest advancements in data mining and provide opportunities to network with experts.

Research upcoming data mining workshops or conferences.
Identify events that cover topics relevant to your interests and career goals.
Register for the event and actively participate in sessions, presentations, and networking opportunities.

Explore advanced data mining techniques in igraph

Engaging with tutorials will provide you with additional insights and practical examples of advanced data mining techniques using igraph.

Search for tutorials on the igraph website or other online resources.
Follow the tutorials step-by-step, experimenting with different parameters and datasets.
Apply the techniques you learn to your own research or projects.

Create a data mining portfolio

Developing a portfolio will provide you with a tangible showcase of your data mining skills and enhance your employability.

Gather your best data mining projects, assignments, or research work.
Create a website or online platform to showcase your portfolio.
Include clear descriptions, code snippets, and visualizations to demonstrate your abilities.

Career center

Learners who complete More Data Mining with R will develop knowledge and skills that may be useful to these careers:

Quantitative Analyst

Quantitative Analysts use mathematical and statistical modeling to analyze and interpret financial data. More Data Mining with R focuses on advanced data mining techniques, using concrete data mining modeling examples, extended case studies, and real data sets. The skills learned in this course could be of great value to a Quantitative Analyst.

Statistician

Statisticians collect, analyze, interpret, and present data. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. This course may be useful for Statisticians looking to expand their knowledge of data mining techniques.

Data Scientist

As a Data Scientist, you will apply scientific methods and algorithms to extract knowledge and insights from data. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis, network analysis, text mining, text and string manipulation, and time series data mining and analysis. This course may provide a helpful foundation for your career as a Data Scientist.

Machine Learning Engineer

Machine Learning Engineers develop and maintain machine learning models. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. While not specifically focused on machine learning, the data mining techniques explored in this course could be useful for Machine Learning Engineers.

Risk Analyst

Risk Analysts assess and manage risks to businesses and organizations. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. The skills learned in this course could be useful for Risk Analysts seeking to expand their knowledge of the data mining process and its application in risk management.

Operations Research Analyst

Operations Research Analysts use advanced analytical techniques to solve complex problems in business and industry. More Data Mining with R provides a comprehensive overview of contemporary data mining techniques, with a focus on association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. The data mining techniques explored in this course could be useful for an Operations Research Analyst.

Product Manager

Product Managers oversee the development and marketing of products. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. The data mining techniques explored in this course could be useful for Product Managers seeking to improve their understanding of customer behavior and market trends.

Data Engineer

Data Engineers build and maintain the infrastructure that stores and processes data. More Data Mining with R is a course that provides a comprehensive overview of contemporary data mining techniques. While not specifically focused on data engineering, the skills learned in this course may be useful for Data Engineers.

Market Research Analyst

Market Research Analysts study market conditions, identify trends, and analyze marketing strategies to determine potential opportunities for businesses. More Data Mining with R provides detailed instruction and plentiful "hands-on" examples about association analysis. This course may be useful for Market Research Analysts to better understand the utility of association analysis.

Business Analyst

Business Analysts use data and analytical techniques to identify and solve business problems. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis), network analysis, text mining, and time series data mining and analysis. This course may be useful for Business Analysts looking for a comprehensive overview of contemporary data mining techniques.

Financial Analyst

Financial Analysts make investment recommendations and provide advice to clients. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis. This course may be useful for Financial Analysts looking to learn more about association analysis for investment research.

Epidemiologist

Epidemiologists investigate the causes and patterns of health and disease in populations. More Data Mining with R provides a comprehensive overview of a myriad of contemporary data mining techniques. While not specifically focused on epidemiology, the skills learned in this course may be useful for Epidemiologists.

Actuary

Actuaries use mathematical and statistical models to assess risk and uncertainty. More Data Mining with R presents detailed instruction and plentiful "hands-on" examples about association analysis (or market basket analysis). While not specifically focused on actuarial science, the data mining techniques explored in this course could be useful for Actuaries.

Data Analyst

In the role of a Data Analyst, you will gather, clean, and interpret data to make data-driven decisions. More Data Mining with R is a course that provides a comprehensive overview of a myriad of contemporary data mining techniques. This could be useful for a Data Analyst seeking to expand their knowledge of the data mining process.

Software Engineer

Software Engineers design, develop, and maintain software systems. More Data Mining with R uses the R software, which is a popular programming language for data analysis and data mining. This course may be useful for Software Engineers interested in learning more about the R programming language and how to use it for data mining tasks.
