Web Scraping Using Python

Manoj Damor
4 min read · Jul 30, 2021

To perform data collection by web scraping using Python:

Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web. Whether you are a data scientist, an engineer, or anybody who analyzes large datasets, the ability to scrape data from the web is a useful skill to have.

WHAT IS WEB SCRAPING?

Web scraping is an automated method for obtaining large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications. There are many different ways to perform web scraping to obtain data from websites.

WHAT IS WEB SCRAPING USED FOR?

  • Search engine bots crawling a site, analyzing its content and then ranking it.
  • Price comparison sites deploying bots to auto-fetch prices and product descriptions from allied seller websites.
  • Market research companies using scrapers to pull data from forums and social media (e.g., for sentiment analysis).

To extract data using web scraping with Python, you need to follow these basic steps:

  1. Find the URL that you want to scrape.
  2. Inspect the page.
  3. Find the data you want to extract.
  4. Write the code.
  5. Run the code and extract the data.
  6. Store the data in the required format.

Python Libraries Used For Web Scraping:

There are many different libraries available in Python for web scraping, but here we have used Requests, BeautifulSoup, and Pandas.

  1. Requests: It allows you to send HTTP/1.1 requests with ease, without having to manually add query strings to your URLs or form-encode your POST data.
  2. BeautifulSoup: Used for web scraping to pull data out of HTML and XML files. It creates a parse tree from the page source that can be used to extract data in a hierarchical and more readable manner.
  3. Pandas: Mainly used for data analysis. Pandas can import data from various file formats such as comma-separated values, JSON, SQL, and Microsoft Excel, and supports data-manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.

Web scraping involves a few basic steps:

1. Find the URL: Select the website from which you want the data. For example, here I have used the IMDb website to get data on the top 50 highest-rated web series.

2. Inspect the page: The data is usually nested in tags, so we inspect the page to see under which tag the data we want to scrape is nested. To inspect the page, just right-click on the element and click “Inspect”.

3. Find the data to be extracted: In this example I’m going to extract the name of each web series, its year of release, genre, and IMDb rating, which are under nested “div” tags.

4. Write the code: To do this, you can use any Python IDE. Here I have used Jupyter Notebook.

Import the required python libraries:
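A minimal version of the imports might look like this (assuming Requests, BeautifulSoup, and Pandas are installed, e.g. via `pip install requests beautifulsoup4 pandas`):

```python
import requests                  # sends the HTTP request for the page
from bs4 import BeautifulSoup    # parses the returned HTML
import pandas as pd              # tabulates and exports the scraped data
```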

Create Empty variables to store scraped data:
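A simple sketch of this step is one empty list per field we plan to scrape; each pass over the page will append one value to each list (the list names here are illustrative, not necessarily those in the original notebook):

```python
# One empty list per field to be scraped
names = []    # web-series titles
years = []    # years of release
genres = []   # genre strings
ratings = []  # IMDb ratings
```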

Now enter the URL from which you want the data. The Requests library is used to make HTTP requests to the server.
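A minimal sketch of the request step is below. Since the article's original IMDb link is not reproduced here, `https://example.com` stands in for the target page:

```python
import requests

# "https://example.com" is a stand-in for the page you actually want to scrape
url = "https://example.com"
response = requests.get(url)
response.raise_for_status()      # fail loudly on a bad HTTP status
page_source = response.text      # raw HTML to hand to BeautifulSoup
```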

Using the find and find_all methods in BeautifulSoup, extract the data from the required tags and store it in the variables declared earlier.
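A self-contained sketch of this step follows. The inline HTML snippet and its tag/class names are invented to mimic a listing page; the real IMDb markup differs, so you would adapt the selectors after inspecting the page:

```python
from bs4 import BeautifulSoup

# A small inline snippet standing in for the real page source
html = """
<div class="lister-item-content">
  <h3><a>Breaking Bad</a> <span class="year">(2008)</span></h3>
  <span class="genre">Crime, Drama</span>
  <span class="rating">9.5</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

names, years, genres, ratings = [], [], [], []
# find_all returns every matching tag; find returns the first match inside it
for item in soup.find_all("div", class_="lister-item-content"):
    names.append(item.find("a").text)
    years.append(item.find("span", class_="year").text.strip("()"))
    genres.append(item.find("span", class_="genre").text)
    ratings.append(item.find("span", class_="rating").text)
```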

Now, using the Pandas library, create a DataFrame in which the data is stored in a structured way so that you can export it to the desired file format. Here I have exported the data in .csv format.
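The export step can be sketched like this. The example lists and the output filename are assumptions; in the real script the lists are filled by the scraping loop:

```python
import pandas as pd

# Example scraped values standing in for the real results
names = ["Breaking Bad", "Chernobyl"]
years = ["2008", "2019"]
genres = ["Crime, Drama", "Drama, History"]
ratings = [9.5, 9.4]

# One column per scraped field
df = pd.DataFrame(
    {"Name": names, "Year": years, "Genre": genres, "Rating": ratings}
)
df.to_csv("web_series.csv", index=False)  # index=False drops the row numbers
```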

After running this code, a CSV file containing the scraped data is generated.

This is a basic web scraping program. By working through it, you learn how to scrape data from the internet and format it for further analysis.

GitHub Link: https://github.com/manoj0221/Web-Scrapping
