How do you scrape data from a website using python?
How to scrape websites?
First, let's install the required modules:
pip install -U selenium
pip install webdriver-manager
pip install beautifulsoup4
You might be asking: what are these modules?
Here's the rundown.
selenium lets us automate browsers. It opens a browser on your device that your Python script can control.
webdriver-manager is a helper module for selenium. Selenium needs a "driver" binary to interface with your chosen browser, and the Selenium docs link to pages where you can download these drivers manually. webdriver-manager sidesteps that chore by downloading and caching the right driver automatically.
And finally, beautifulsoup4 is our chosen HTML parser.
Next, create a new file called main.py.
You can do this manually or by running the command below (Linux/macOS only):
touch main.py
Now add these lines to the file:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service as FirefoxService
from webdriver_manager.firefox import GeckoDriverManager
driver = webdriver.Firefox(service=FirefoxService(GeckoDriverManager().install()))
Choose your URL
First, let's pick a site to scrape. The example here is czone.com.pk, a Pakistani computer retailer.
URL = "https://czone.com.pk/laptops-pakistan-ppt.74.aspx"
driver.get(URL)
We can get the page source from the driver and pass it to our HTML parser (note the import; the 'lxml' backend also needs a separate pip install lxml, or you can use the built-in 'html.parser' instead):
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'lxml')
products = soup.find_all(
    'div', {'class': 'product'}
)
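If you want to try the parsing step without launching a browser, here is a minimal offline sketch run on a small hand-written HTML snippet. The snippet mimics the product markup the tutorial assumes (div.product containing an h4 and a div.price); the actual live site's markup may differ.

```python
# Offline demo of the BeautifulSoup extraction step.
# The HTML below is made up for illustration; it only mirrors the
# class names ('product', 'price') used in this tutorial.
from bs4 import BeautifulSoup

html = """
<div class="product"><h4>Laptop A</h4><div class="price">Rs. 100,000</div></div>
<div class="product"><h4>Laptop B</h4><div class="price">Rs. 150,000</div></div>
"""

# 'html.parser' ships with Python, so no extra install is needed here
soup = BeautifulSoup(html, "html.parser")
products = soup.find_all("div", {"class": "product"})

print(len(products))                                  # 2
print(products[0].find("h4").get_text().strip())      # Laptop A
```

This is the same find_all call as above, just fed a static string instead of driver.page_source.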
Next, let's pull the name and price out of each product and collect them in a list:
records = []
for product in products:
    product_name = product.find('h4').get_text().strip()
    product_price = product.find('div', {'class': 'price'}).get_text().strip()
    records.append((product_name, product_price))
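Once you have the records list, you'll probably want to save it somewhere. One simple option is a CSV file via Python's standard library; the records below are sample data standing in for whatever your scrape actually returned, and the filename laptops.csv is just a placeholder.

```python
# Sketch: write the (name, price) tuples to a CSV file.
import csv

# Stand-in data; in the real script this is the list built above.
records = [("Sample Laptop 1", "Rs. 100,000"), ("Sample Laptop 2", "Rs. 150,000")]

with open("laptops.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price"])  # header row
    writer.writerows(records)           # one row per product
```

The newline="" argument matters on Windows, where omitting it produces blank lines between rows.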
