How to scrape a website with and without programming easily


This article explains how to scrape a website and collect data for analysis.


Data is a crucial factor for a business because it shapes market strategy on major issues and solves a huge number of problems through simple mathematical procedures.


Generally, web scraping can be described as a method of creating your own API (application programming interface) for a site that does not offer one.

Live stock prices and live sports scores are prime examples where analysis of scraped data plays a vital role in making decisions for the future.


Is web scraping new?

Definitely not. Web scraping has been around for years, but since the boom in data analysis, data scraping has become a featured skill.

Where did it get started? Its roots go back to early operating-system design; in Windows, for instance, everyone has come across DLL files. What do these files provide? Shared data and functionality between different applications.

These DLL files were later complemented by APIs, which many companies use to exchange data between themselves. In general, though, most websites don't offer an API. An API is not an alien concept; it is simply data-sharing support. Widgets on mobile phones are a familiar example of the same idea.



Web scraping with programming:

Python is at the forefront of data science programming. Scraping is the first stage of the overall data analytics pipeline; scraping can also be called data collection.

Python uses two major libraries in this process:

bs4 (Beautiful Soup) - a library that provides data-structuring support, meaning it converts raw HTML into readable and reusable data.

requests - a library that downloads data from a URL.

A parser is also needed; the examples below use Python's built-in 'html.parser' (html5lib is an alternative).

To install Beautiful Soup -> pip install beautifulsoup4
To install requests -> pip install requests

The first step is getting the data from the URL:

import requests

page = requests.get("https://cricbuzz.com/api/html/live-score-board/202299")
page.content

These lines of code fetch the page at that URL and store its contents. Then we can parse it:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <title>
   A simple example page
  </title>
 </head>
 <body>
  <p>
   Here is some simple content for this page.
  </p>
 </body>
</html>

In this way, we can get the HTML code of the website, and we can even remove the HTML tags using the get_text() method (the "text command") from the Beautiful Soup library.
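As a minimal sketch of tag removal, assuming the simple example page shown above is stored in a string (so no network call is needed):

```python
from bs4 import BeautifulSoup

# The same example page as above, kept local for a self-contained demo
html = """<!DOCTYPE html>
<html>
<head><title> A simple example page </title></head>
<body><p> Here is some simple content for this page. </p></body>
</html>"""

soup = BeautifulSoup(html, 'html.parser')

# get_text() strips every HTML tag and returns only the text content
print(soup.get_text())

# The same method works on a single tag
print(soup.find('p').get_text().strip())
```

The same calls work on `soup` built from `page.content` in the earlier example; only the source of the HTML changes.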

Web scraping without programming:

This one is very common and almost everyone knows it: Microsoft Excel and Google Sheets can be used to
get data from a URL.

In Excel, open the Data tab, click From Web, enter the URL, and click OK.
The webpage data is now loaded into the Excel sheet.

In a similar way, in Google Sheets, the IMPORTHTML and IMPORTDATA functions help you get the values into the cells. For example, =IMPORTHTML("https://example.com", "table", 1) imports the first table on a page (the URL here is just a stand-in).

After this, in Excel you can set the refresh interval in the connection properties (for example, to 1 minute), and the data will be updated in the cells automatically.
