Simplest Web Scraping

Soumyabrata Roy
3 min read · Aug 8, 2022

Do you want to learn web scraping techniques? Do you want to apply them as quickly as possible? Do you need something as simple as watching a video?

Well, in that case you are in the right place.

In this quick and simple tutorial I will show you how to scrape a website for free. (Check the website's policies section first to confirm that you are allowed to scrape it.)

Introducing web scraping with Google Sheets. Among its many benefits, Google Sheets has a really cool feature that scrapes web content and saves it straight into your sheet. Let’s see that in practice.

Step 1: Find the web content. Since you have read this far, chances are you already know which content you would like to scrape. If not, here is an example.

Here I will use the Government of India website: https://www.india.gov.in/. All the content on this website is free to use, so we can scrape it.

Step 2: Identify the section or content. In this example I will scrape the copyright policy of the website. You can usually find it at the bottom of most websites: https://www.mygov.in/simple-page/website-policies/

Step 3: Create a new, empty Google Sheet and paste in the URL of the page you want to scrape.

Step 4: Open the URL in Google Chrome, right-click the content, and select Inspect.

This opens the page's HTML in Chrome's developer tools, with the element you clicked highlighted. Right-click that element in the HTML, choose Copy, and then Copy XPath.

Step 5: Paste the copied XPath into the Google Sheet.

Step 6: Use the IMPORTXML formula to get the content.

IMPORTXML takes two arguments, in the form =IMPORTXML(url, xpath_query): the first is the URL of the website you want the content from, and the second is the XPath. Once you supply both and press Enter, you will get the content.
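
As a minimal sketch of Step 6, assume the URL from Step 3 sits in cell A1 and the XPath from Step 5 sits in cell A2. Both cell positions are assumptions for illustration; use whichever cells you actually pasted into. The formula could then look like this:

```
=IMPORTXML(A1, A2)
```

If you prefer to type both arguments directly into the formula, keep in mind that Chrome copies XPaths with double quotes around attribute values, which clash with the double quotes around the formula's text arguments. Switching the inner quotes to single quotes avoids that. The XPath below is a hypothetical placeholder, not the real path to the copyright policy:

```
=IMPORTXML("https://www.mygov.in/simple-page/website-policies/", "//*[@id='content']/p[1]")
```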

That’s it. It is that simple. I hope you enjoyed it. Let me know your thoughts in the comments below.

Soumyabrata Roy

Data Scientist Cognizant | Answering what, why, and how of different business scenarios through machine learning and deep learning.