Attreya Bhatt

In early 2008, Scrapy was released, and it soon became the #1 web scraping tool for beginners. Why? Because it's simple enough for beginners yet advanced enough for the pros. Here are some of the use cases -


Ecommerce (Amazon) - Scrape product names, pricing, and reviews

Data - Gather huge collections of data/images for machine learning

Email addresses - Big companies scrape them and use them for lead generation

Come learn with me and I'll show you how you can bend Scrapy to your will. This course is great for beginners in Python at any age and any level of computer literacy.

The goal is simple: learn Scrapy by working on real projects step-by-step while we explain every concept along the way. For the duration of this course we will take you on a journey and you're going to learn how to:

  • Scrape data from nearly any website

  • Build your own spiders from scratch for all types of web scraping purposes

  • Transfer the data that you have scraped into JSON, CSV, and XML

  • Store the data in databases - SQLite3, MySQL, and MongoDB

  • Create web crawlers and follow links on any web page

  • Log in to websites

  • Bypass restrictions & bans by using user-agents and proxies

  • Internalize the concepts by completely scraping Amazon and get ready to scrape more advanced websites


What's inside

Learning objectives

  • 3.5+ hours of full HD video material divided into 28 downloadable lectures
  • Scraping single or multiple websites with Scrapy
  • Building powerful crawlers and spiders
  • Creating a web crawler for Amazon from scratch
  • Bypassing restrictions using user-agents and proxies
  • Logging into websites with Scrapy
  • Storing data extracted by Scrapy into SQLite3, MySQL, and MongoDB databases
  • Exporting data extracted by Scrapy into CSV, XML, or JSON files
  • Understanding XPath and CSS selectors to extract data
  • Access to our private Facebook group, available only to students of this Scrapy course

Syllabus

Introduction to Scrapy and Web Scraping

In this video we understand the terms Python web scraping, spiders, and web crawling. We also see an example of Amazon being scraped using Scrapy.


In this video we look behind the scenes of web scraping a website and see how our Scrapy Python program goes to a website to extract data.

In this video we look at a "secret" file called robots.txt and at how Scrapy treats that file to make sure you are following the policies of web scraping public websites. We also learn how to bypass those rules.
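For reference, the switch Scrapy uses for this lives in the project's settings.py. A minimal sketch of the standard setting (not code shown in the course):

```python
# settings.py -- generated for you by "scrapy startproject".
# When True (the default in new projects), Scrapy downloads robots.txt
# first and skips any URL that the file disallows.
ROBOTSTXT_OBEY = True

# Setting it to False makes Scrapy ignore robots.txt entirely -- the
# "bypass" mentioned above. Use it responsibly.
# ROBOTSTXT_OBEY = False
```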

Installation Guide for Scrapy

In this video we learn how to install Scrapy using my favourite IDE, PyCharm.

In this video we install Scrapy using the terminal, so you can use it with Sublime Text, VS Code, or any other IDE.
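If you want to confirm the terminal install worked, here is a quick sanity check of my own (not part of the course material):

```python
# Installed from the terminal with: pip install scrapy
# A quick sanity check that the install worked:
import scrapy

print(scrapy.__version__)  # prints the installed version, e.g. "2.11.2"
```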

Creating your first Spider

In this video we understand the project structure of Scrapy and go into the different files, like Items, Pipelines, and Settings.

In this video we will create our very first spider/crawler using Scrapy!

In this video we will run our very first spider/crawler and finally scrape a website using Scrapy.
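As a rough sketch of what such a first spider can look like (assuming the quotes.toscrape.com practice site the course uses later; the names are illustrative):

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"                                # used by "scrapy crawl quotes"
    start_urls = ["http://quotes.toscrape.com"]    # the first page to fetch

    def parse(self, response):
        # parse() is called automatically with every downloaded page
        self.log(f"Visited {response.url} (status {response.status})")
```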

Extracting data with Scrapy

In this video we will scrape quotes from a website and select the elements to be scraped using CSS selectors. We will also learn about a tool called SelectorGadget that is going to make your life so much easier!

There are two types of selectors: CSS selectors and XPath selectors. One of the main uses of XPath selectors is getting the value of HTML tags.
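To make the comparison concrete, here is a minimal sketch of both selector styles extracting the same text, assuming the quotes.toscrape.com markup:

```python
def parse(self, response):
    # CSS selector: tag.class, with ::text to pull the tag's inner text
    quotes_css = response.css("span.text::text").getall()

    # XPath selector: the same elements, addressed by path and attribute
    quotes_xpath = response.xpath("//span[@class='text']/text()").getall()

    # Two notations, one result
    assert quotes_css == quotes_xpath
```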

In this video we will be scraping quotes and authors from our website using the concepts we have learned in the previous Python web scraping videos.

Storing the scraped data

In this video we are going to learn how to put the extracted data into containers called items.

Now why exactly do we need to put the data in containers when we have already extracted it? Can't we just put it straight into some kind of database? The answer is yes, you can. But there might be a few problems when you store the data directly in the database, especially when you are working on big or multiple projects.

Scrapy spiders can return the extracted data as Python dictionaries, which is what we have been doing with our quotes project. But the problem with Python dictionaries is that they lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.

So it's always a good idea to move the scraped data into a temporary location first and then store it in the database. These temporary containers are called items.
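A minimal item definition might look like this; the field names are my assumption, and the course defines its own in items.py:

```python
# items.py
import scrapy

class QuoteItem(scrapy.Item):
    # Declaring fields up front catches the typos a plain dict would accept
    title = scrapy.Field()
    author = scrapy.Field()
    tag = scrapy.Field()
```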

Now that we have successfully scraped data from the quotes website and stored it in these temporary containers, we can finally go to the next step and learn how to store the scraped data in some kind of database or file system.


So in this video we are going to be learning how to store this extracted data in JSON, XML, and CSV files.
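Scrapy's built-in feed exports cover all three formats. A sketch of the two usual routes (the -o flag, and the FEEDS setting available from Scrapy 2.1):

```python
# From the terminal, the output format follows the file extension:
#   scrapy crawl quotes -o quotes.json
#   scrapy crawl quotes -o quotes.csv
#   scrapy crawl quotes -o quotes.xml

# Or, equivalently, in settings.py (the FEEDS setting, Scrapy 2.1+):
FEEDS = {
    "quotes.json": {"format": "json"},
}
```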

Now before we go on to learn about storing the scraped data in our database, we have to learn about pipelines.

So if we discuss the flow of our scraped data, it looks somewhat like this: the data first gets scraped by our spider, then it is stored inside the temporary containers called items, and from there you can store it inside a JSON file. But if we want to send this data to a database, we have to add one more step to this flow. After storing the data inside item containers, we send it to a pipeline, where the process_item method is automatically called and the item variable contains our scraped data.
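In code, the flow described above comes down to a pipeline class plus one settings entry. A bare-bones sketch (the project name is a placeholder):

```python
# pipelines.py -- process_item() is called for every item the spider yields
class QuotesPipeline:
    def process_item(self, item, spider):
        # clean, validate, or store the item here, then pass it along
        return item

# settings.py -- activate the pipeline (lower number = runs earlier):
ITEM_PIPELINES = {
    "myproject.pipelines.QuotesPipeline": 300,  # "myproject" is a placeholder
}
```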

Extracting data to Databases: SQLite3, MySQL & MongoDB

In this video we are going to learn about the basics of SQLite3 so that we can store the scraped data in a database.

In this video we will be integrating Scrapy with SQLite3 and finally storing the data inside a database using pipelines.
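A condensed sketch of such a pipeline, assuming the title/author/tag fields from the item sketch earlier (the table and column names are mine):

```python
# pipelines.py -- sqlite3 ships with Python, so there is nothing to install
import sqlite3

class SQLitePipeline:
    def open_spider(self, spider):
        self.conn = sqlite3.connect("myquotes.db")
        self.curr = self.conn.cursor()
        self.curr.execute(
            "CREATE TABLE IF NOT EXISTS quotes_tb (title TEXT, author TEXT, tag TEXT)"
        )

    def process_item(self, item, spider):
        self.curr.execute(
            "INSERT INTO quotes_tb VALUES (?, ?, ?)",
            (item["title"], item["author"], item["tag"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```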

In this video we are going to learn how to store our scraped data inside a MySQL database. Before watching this video, make sure that you have watched the previous two videos, in which we cover how to store data inside an SQLite database, because a lot of the concepts taught there are going to be used in this video and I don't want to go over them again.

Now the first thing we need to do is install MySQL on our computer. You can go to this link if you are on Windows to install MySQL, and if you are using Linux you can check this link to install and use MySQL. I am going to cover only the Windows part of the installation because the Linux installation is pretty easy.

Now just click on this link to start the installation. I am going through this installation pretty quickly because it's simple. While installing, make sure that you choose the Developer Default option, because we want everything installed on our computer, including connectors, routers, servers, and MySQL Workbench, which is a GUI tool for creating and handling connections.

Also, when you are asked to choose the root password, you can choose whatever you want, but make sure that you remember it, because we are going to be using the same password everywhere. If you forget this password, it's going to be difficult to reset it.

Steps -

1) Install MySQL: https://dev.mysql.com/downloads/installer/

Linux: https://support.rackspace.com/how-to/installing-mysql-server-on-ubuntu/

- Make sure you go with the default options.

- Remember the root password.

2) Install mysql-connector-python

3) Create a new connection using MySQL Workbench

4) Create a new database, myquotes, using MySQL Workbench

5) Write the code in Pipelines (see the sketch after this list)

6) Scrape the data

7) View the data in MySQL Workbench
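The pipeline code from step 5 might look roughly like this, using the mysql-connector-python driver from step 2 (credentials, database, and table names are placeholders):

```python
# pipelines.py
import mysql.connector

class MySQLPipeline:
    def open_spider(self, spider):
        self.conn = mysql.connector.connect(
            host="localhost",
            user="root",
            password="your_root_password",  # the root password you chose at install
            database="myquotes",            # the database created in step 4
        )
        self.curr = self.conn.cursor()

    def process_item(self, item, spider):
        self.curr.execute(
            "INSERT INTO quotes_tb (title, author, tag) VALUES (%s, %s, %s)",
            (item["title"], item["author"], item["tag"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```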

In this video we will be learning how to store the scraped data inside a MongoDB database using Python.

Instructions -

1) Install MongoDB: https://docs.mongodb.com/manual/administration/install-community/

Make sure you install everything, including MongoDB Compass: https://www.mongodb.com/products/compass

2) Create a folder /data/db

3) Run mongod.exe once

4) Install pymongo in PyCharm

5) Make sure your pipeline is activated

6) Write the MongoDB pipeline code (see the sketch after this list)

7) See the saved data in MongoDB Compass
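The pipeline from step 6 might be sketched like this with pymongo (database and collection names are placeholders):

```python
# pipelines.py
import pymongo

class MongoDBPipeline:
    def open_spider(self, spider):
        self.client = pymongo.MongoClient("mongodb://localhost:27017")
        self.db = self.client["myquotes"]

    def process_item(self, item, spider):
        # MongoDB stores documents, so a plain dict is all it needs
        self.db["quotes"].insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```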

Web Crawling and Pagination

In this web crawling video we will learn how to follow links given on any webpage and also how to scrape multiple pages using Scrapy and Python.

In this web scraping video we learn how to scrape multiple pages using URLs on websites with pagination.
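The heart of both videos is the follow-the-next-link pattern. A sketch against quotes.toscrape.com's "Next" button (the selectors assume that site's markup):

```python
def parse(self, response):
    for quote in response.css("div.quote"):
        yield {
            "title": quote.css("span.text::text").get(),
            "author": quote.css("small.author::text").get(),
        }

    # Pagination: keep following the "Next" link until there isn't one
    next_page = response.css("li.next a::attr(href)").get()
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)
```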

Logging into websites using Scrapy

In this video we are going to learn to log in to websites using Scrapy, and we will be using the quotes.toscrape.com website to learn that. As you can see on the left, there is a login button, and clicking on it takes us to a form which contains the username and the password.

Now why exactly are we learning to log in? A lot of websites restrict the content that you might want to scrape behind a login page, so to scrape that restricted data, it's always a good idea to learn how to log in to websites using Scrapy.
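Scrapy's FormRequest.from_response does the heavy lifting here. A sketch against the quotes.toscrape.com login form (the credentials are placeholders; that practice site accepts any pair):

```python
import scrapy
from scrapy.http import FormRequest

class LoginSpider(scrapy.Spider):
    name = "login_demo"
    start_urls = ["http://quotes.toscrape.com/login"]

    def parse(self, response):
        # from_response() copies hidden form fields (like the CSRF token) for us
        yield FormRequest.from_response(
            response,
            formdata={"username": "user", "password": "pass"},  # placeholders
            callback=self.after_login,
        )

    def after_login(self, response):
        # from here on, requests carry the session cookie automatically
        self.log(f"Logged in, now at {response.url}")
```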

Scraping Amazon.com & Bypassing Restrictions

So by this video you already have a very good understanding of Scrapy. Now, just to internalize the concepts we have learned, we will be working on a complete real-life project: scraping amazon.com.

We will be scraping the Books department of Amazon, more specifically the collection of books that were released in the last 30 days. If you are following along, you don't have to choose books; you can choose any department on Amazon.

I have already created the project 'AmazonTutorial' in PyCharm and installed Scrapy. If you don't remember how to install Scrapy, you can always go back to my installing-Scrapy video.

Now before we run our spider, I just want to tell you that our program might not work. If you have scraped Amazon before, it's not going to work, but if this is your first time, the above code will work. The reason it might not work is that Amazon puts on restrictions when you try to scrape a lot of its data. We are going to bypass those restrictions by using something known as user-agents. But before we get into that, let's actually run our program.

In the last video we scraped the book section of Amazon, and we used something known as a user-agent to bypass the restrictions. So what exactly is this user-agent, and how is it able to bypass the restrictions placed by Amazon?

Whenever a browser like Chrome or Firefox visits a website, that website asks for the identity of your browser. That identity of your browser is known as a user-agent. And if we keep giving the same identity to a website like Amazon, it places restrictions and sometimes bans the computer from visiting Amazon.

So there are two ways to trick Amazon. The first is to use user-agents that are allowed by Amazon. For example, Amazon has to allow Google to crawl its website if it wants its products to be shown in Google Search. So we can basically replace our user-agent with Google's user-agent, which is known as Googlebot, and trick Amazon into thinking that Google is crawling the website and not us. This is exactly what we did in the last video: we found Google's user-agent name by looking it up in Google Search, and then we replaced our user-agent with Google's.

The other way is to keep rotating our user-agents. If Amazon identifies our computer by our user-agent, then we can use fake user-agents in rotation and trick Amazon into thinking that a lot of different browsers are visiting the website instead of just one. This is what we will be learning in this video.
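Both tricks come down to settings and a small downloader middleware. A sketch (the user-agent strings and the middleware name are illustrative, not from the course):

```python
# settings.py -- trick 1: borrow an allowed identity such as Googlebot
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# middlewares.py -- trick 2: rotate through several fake identities
import random

USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]

class RotateUserAgentMiddleware:
    def process_request(self, request, spider):
        # every outgoing request gets a random identity from the list
        request.headers["User-Agent"] = random.choice(USER_AGENT_LIST)

# settings.py -- enable the middleware:
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RotateUserAgentMiddleware": 400}
```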

In the last video we bypassed the scraping restrictions by using user-agents, and in this video we will be learning how to bypass them by using something known as proxies.

Before we go into proxies, you need to understand what an IP address is. An IP address is basically the address of your computer. You can find your own IP address by going to Google and typing 'What is my IP'.

Whenever you connect to a website, you are automatically telling it your IP address. A website like Amazon can recognize your IP address and ban you if you try to scrape a lot of its data. But what if we used another IP address instead of our own? Even better, we can use a lot of IP addresses that are not our own and put them in rotation, so that every time we send a request to Amazon, it goes out with a different IP address.

When you use an IP address that is not your own, that other IP address is known as a proxy. If we look up the definition of proxy on Google, it says 'the authority to represent someone else'. So basically we are hiding our address and using someone else's.
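In Scrapy, a proxy is attached to each request through its meta dictionary, which the built-in HttpProxyMiddleware reads. A minimal sketch (the proxy address is a placeholder from the documentation-reserved range):

```python
import scrapy

class ProxySpider(scrapy.Spider):
    name = "proxy_demo"

    def start_requests(self):
        yield scrapy.Request(
            "https://www.amazon.com",
            # the built-in HttpProxyMiddleware reads this meta key
            meta={"proxy": "http://203.0.113.10:8080"},  # placeholder address
        )

    def parse(self, response):
        self.log(f"Fetched {response.url} through a proxy")
```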

In this last video we will scrape the rest of the pages of Amazon.

Thank you for joining me in this video series :)

BONUS: Classes, Objects and Inheritance

In this video we go into Object-Oriented Programming (OOP) and how to use it to create classes and objects. We also discover the difference between an instance and an object. And at the end we cover class variables and instance variables.

In this video we are going to learn about inheritance and how one class can inherit properties like methods and attributes from another class by creating a subclass. We are also going to cover nested inheritance. Let's get started.
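A tiny sketch of the vocabulary these two videos cover: class variables vs. instance variables, and subclasses inheriting from a parent (the example classes are mine):

```python
class Animal:
    kingdom = "Animalia"          # class variable: shared by every instance

    def __init__(self, name):
        self.name = name          # instance variable: unique to each object

    def describe(self):
        return f"{self.name} belongs to {self.kingdom}"

class Dog(Animal):                # Dog inherits Animal's methods and attributes
    def speak(self):
        return f"{self.name} says woof"

class Puppy(Dog):                 # "nested" (multilevel) inheritance
    pass

print(Puppy("Rex").describe())    # describe() is inherited two levels up
print(Puppy("Rex").speak())       # speak() is inherited from Dog
```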

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops skills and knowledge in web scraping and handling extracted data, which are core skills for data engineering, data science, data journalism, computer science, and software engineering
Taught by Attreya Bhatt, who is recognized for their work in web scraping
Develops Python programming, which is a highly relevant tool in industry and academia
Examines data extraction in the vast e-commerce landscape, which is highly relevant to e-commerce professionals and researchers
Suitable for beginners, as it provides a foundational understanding of web scraping
Requires learners to come in with some experience with Python, serving as a possible barrier to entry for absolute beginners


Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Scrapy : Python Web Scraping & Crawling for Beginners with these activities:
Participate in Web Scraping Study Groups
Participate in web scraping study groups to connect with other learners and benefit from peer support.
  • Find or create a web scraping study group
  • Set regular meeting times
  • Discuss web scraping topics and share knowledge
Follow Guided Web Scraping Tutorials
Follow guided web scraping tutorials to learn specific techniques and best practices.
  • Find reputable web scraping tutorials
  • Follow the tutorials step-by-step
  • Experiment with the techniques covered in the tutorials
Practice Web Scraping Exercises
Practice web scraping exercises to reinforce your understanding of the underlying concepts and techniques.
  • Find online web scraping exercises
  • Solve the exercises using Scrapy
  • Review your solutions and identify areas for improvement
Practice Basic Web Scraping with Scrapy
Practicing with basic web scraping tasks will help you solidify your understanding of the fundamental techniques covered in the course.
  • Set up a Scrapy project.
  • Write a spider to extract data from a simple website.
  • Parse the extracted data using CSS selectors or XPath.
Create a Web Scraping Project
Create a web scraping project to solidify your understanding of web scraping concepts and techniques covered in the course.
  • Choose a website to scrape
  • Identify the data you want to extract
  • Write a Scrapy spider to extract the data
  • Store the extracted data in a database or file
  • Visualize the extracted data
Contribute to a Web Scraping Open Source Project
Contribute to an open source web scraping project to gain practical experience and learn from others.
  • Find a suitable open source web scraping project
  • Identify ways to contribute to the project
  • Submit a pull request with your contributions
  • Review feedback and make necessary changes
Build a Simple Web Crawler
Building a web crawler will challenge you to apply your knowledge of web scraping and programming to a more complex task.
  • Design the architecture of your web crawler.
  • Implement the functionality to crawl and scrape websites.
  • Handle errors and exceptions that may occur during the crawling process.

Career center

Learners who complete Scrapy : Python Web Scraping & Crawling for Beginners will develop knowledge and skills that may be useful to these careers:
Web Scraper
A Web Scraper is responsible for extracting data from websites. The course is designed to help people learn how to scrape data from websites, which is an essential skill for a Web Scraper to have.
Business Analyst
A Business Analyst is responsible for analyzing business processes and identifying opportunities for improvement. The course may be useful for someone who wants to become a Business Analyst, as it provides a strong foundation in data extraction and web scraping. These skills can help a Business Analyst to understand how data is stored and how to extract it from websites.
Product Manager
A Product Manager is responsible for the development and launch of new products. The course may be useful for someone who wants to become a Product Manager, as it provides a strong foundation in data extraction and web scraping. These skills can help a Product Manager to understand how data is stored and how to extract it from websites.
Database Administrator
A Database Administrator is responsible for the management and maintenance of databases. The course may be useful for someone who wants to become a Database Administrator, as it provides a strong foundation in data extraction and web scraping. These skills can help a Database Administrator to understand how data is stored and how to extract it from websites.
Web Developer
A Web Developer specializes in the design and development of websites. They are responsible for coding, maintenance, and web design. This course is a great foundation for the skills that a Web Developer needs to succeed. It covers foundational concepts to the advanced concepts of web scraping and crawling, which are important for a Web Developer to master.
Information Security Analyst
An Information Security Analyst is responsible for protecting an organization's data and systems from cyber threats. The course may be useful for someone who wants to become an Information Security Analyst, as it provides a strong foundation in web scraping, crawling, and data extraction.
Market Researcher
A Market Researcher is responsible for conducting research and gathering data on consumer preferences. The course may be useful for someone who wants to become a Market Researcher, as it provides a strong foundation in data extraction and web scraping. These skills can help a Market Researcher to understand how data is stored and how to extract it from websites.
Data Miner
A Data Miner is responsible for extracting knowledge from data. The course may be useful for someone who wants to become a Data Miner, as it provides a strong foundation in data extraction and web scraping. These skills can help a Data Miner to understand how data is stored and how to extract it from websites.
Data Engineer
A Data Engineer is responsible for designing, building, and maintaining data pipelines. The course may be useful for someone who wants to become a Data Engineer, as it provides a strong foundation in data extraction and web scraping. These are essential skills for a Data Engineer to have.
Technical Writer
A Technical Writer is responsible for creating documentation for software products. The course may be useful for someone who wants to become a Technical Writer, as it provides a strong foundation in web scraping and crawling. These skills can help a Technical Writer to understand how websites are structured and how to write documentation for them.
Data Analyst
A Data Analyst is responsible for collecting, analyzing, and interpreting data. The course may be useful for someone who wants to become a Data Analyst, as it provides a strong foundation in data extraction and web scraping. These are essential skills for a Data Analyst to have.
Software Engineer
A Software Engineer designs, develops, and maintains software systems. The course may be useful for someone who wants to become a Software Engineer, as it provides a strong foundation in web scraping and crawling, which are essential skills for a Software Engineer to have.
Web Designer
A Web Designer is responsible for the visual design of websites. The course may be useful for someone who wants to become a Web Designer, as it provides a strong foundation in web scraping and crawling. These skills can be useful for a Web Designer to have, as they can help them to understand how websites are structured and how to design them in a way that is both visually appealing and functional.
Software Tester
A Software Tester is responsible for testing software applications for bugs and errors. The course may be useful for someone who wants to become a Software Tester, as it provides a strong foundation in web scraping and crawling. These skills can help a Software Tester to understand how websites are structured and how to test them for errors.
Data Scientist
A Data Scientist is someone who has a strong understanding of data and is able to use that data to solve problems. This course may be useful for someone who wants to become a Data Scientist, as it provides a strong foundation in data extraction and web scraping. These are important skills for a Data Scientist to have.

Reading list

We've selected 12 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Scrapy : Python Web Scraping & Crawling for Beginners.
Comprehensive guide to deep learning. It covers everything from basic usage to advanced techniques, such as convolutional neural networks and recurrent neural networks. This book would be a valuable resource for anyone who wants to learn more about deep learning or for those who want to learn more about web scraping with Python.
Comprehensive guide to using Python for predictive analytics. It covers everything from basic usage to advanced techniques, such as regression analysis and decision trees. This book would be a valuable resource for anyone who wants to learn more about predictive analytics or for those who want to learn more about web scraping with Python.
Provides a step-by-step guide to web scraping using Python, covering topics such as HTTP requests, parsing HTML, and handling different types of data. Useful as a practical companion to the course, offering additional examples and exercises.
Comprehensive guide to using Python for data analysis. It covers everything from basic usage to advanced techniques, such as data manipulation and visualization. This book would be a valuable resource for anyone who wants to learn more about data analysis or for those who want to learn more about web scraping with Python.
Comprehensive guide to using Python for machine learning. It covers everything from basic usage to advanced techniques, such as supervised and unsupervised learning. This book would be a valuable resource for anyone who wants to learn more about machine learning or for those who want to learn more about web scraping with Python.
Comprehensive guide to using Python for natural language processing. It covers everything from basic usage to advanced techniques, such as text classification and sentiment analysis. This book would be a valuable resource for anyone who wants to learn more about natural language processing or for those who want to learn more about web scraping with Python.
Comprehensive guide to using Python for data mining. It covers everything from basic usage to advanced techniques, such as data clustering and association rule mining. This book would be a valuable resource for anyone who wants to learn more about data mining or for those who want to learn more about web scraping with Python.
Comprehensive guide to web scraping with R. It covers all the essential concepts and techniques, and it is a great resource for anyone who wants to learn how to scrape data from the web.
Provides a comprehensive overview of Python programming, including chapters on web scraping and data manipulation. Suitable for beginners looking to build a strong foundation in Python before delving into web scraping.
Comprehensive guide to web scraping with R and rvest. It covers all the essential concepts and techniques, and it is a great resource for anyone who wants to learn how to scrape data from the web.


Similar courses

Here are nine courses similar to Scrapy : Python Web Scraping & Crawling for Beginners.
  • Extracting Structured Data from the Web Using Scrapy
  • Scrapy: Powerful Web Scraping & Crawling with Python
  • Web Scraping with Python
  • Web Scraping 101 with Python3 using REQUESTS, LXML &...
  • Scraping Your First Web Page with Python
  • Web Scraping: Python Data Playbook
  • Advanced Web Scraping Tactics: Python 3 Playbook
  • Scraping Media from the Web with R
  • Advanced Web Scraping Tactics: R Playbook
Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Find this site helpful? Tell a friend about us.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser