How to create a web scraper with Python (Selenium)

By HackTheGeek | Hack The Geek | 31 Aug 2023


There are many ways to create a web scraper with Python, and the one I recommend is Selenium. Selenium lets you open a real browser from Python and perform tasks in it, such as pressing keys or scraping part of a page. The best thing about Selenium is that it drives an actual browser much like a human being would, so it is usually harder to detect than scrapers that only send plain HTTP requests.

Selenium is not only used to scrape data. It can be used for many other things, like automated buy orders or …

Today we will create a basic Python program that opens Google in Microsoft Edge, selects the Google search bar (currently its class name is “gLFyf”), types “Hi mom !”, presses ENTER to run the search, and writes the source code of the resulting page to a text file called “page_source_of_google_after_typing_hi_mom.txt” in the folder the program is run from.

This program only demonstrates how the work is done. Once you have the source code of the new page, you can do anything with it. There are many projects like this on freelancing sites, so be creative and play around with the code. One idea: create a web page and put the scraped information in it, so that it becomes user friendly.
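As a taste of that idea, here is a minimal sketch that uses only Python's standard library to pull the links out of a page source and list them on a small web page of their own. The names LinkCollector, extract_links and build_report are made up for this example, and the sample string is a stand-in for real scraped HTML.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html_content):
    """Return all link targets found in html_content, in document order."""
    collector = LinkCollector()
    collector.feed(html_content)
    collector.close()
    return collector.links


def build_report(links):
    """Return a small HTML page that lists the given links."""
    items = "\n".join(
        f'<li><a href="{escape(link)}">{escape(link)}</a></li>' for link in links
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


# Stand-in for the scraped page source (made-up sample, not real Google output)
sample = '<p><a href="https://www.python.org/">Python</a> <a>no href</a></p>'
links = extract_links(sample)
print(links)
print(build_report(links))
```

In a real run you would pass the scraped page source to extract_links instead of the sample string, and write the result of build_report to an .html file that you can open in a browser.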

The first thing you should do is download Python from the official website:

https://www.python.org/

In this tutorial we will be using Windows.

After installing Python, you need to install selenium and webdriver-manager. You can do that by typing the commands below in CMD or PowerShell. (The program in this article only imports selenium, so webdriver-manager is optional here.)

pip install webdriver-manager

pip install selenium
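If you want to confirm that both packages were installed, a small check like the one below can help. It is only a sketch that uses the standard library's importlib.metadata; the function name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the state of the two packages used in this tutorial
for name in ("selenium", "webdriver-manager"):
    found = installed_version(name)
    print(name, "->", found if found else "not installed")
```

If a package shows up as not installed, run its pip command again in the same CMD or PowerShell window.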

Now you are ready to go. I wrote the program and put it on GitHub, so you can download it and play with it:

www.github.com/sinas12/blue_scrape

Now I want to briefly explain what every line does.

First we create a function called “scrape(url)” and pass the “url” variable to it. Inside the function we first open the Edge browser with driver = webdriver.Edge() (you can use driver = webdriver.Firefox() to open Firefox instead). Then we open the URL with driver.get(url). You can try running the program at this stage: it only opens Edge and goes to the URL.

Next we select the Google textarea with the class name “gLFyf” using driver.find_element(By.CLASS_NAME, "gLFyf"), then type the text and press ENTER with element.send_keys('Hi mom !' + Keys.RETURN).

The line html_content = driver.page_source gets the source code of the page and puts it in the html_content variable.

The final lines of the function write the html_content variable to a file called page_source_of_google_after_typing_hi_mom.txt.

The code:

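from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def scrape(url):
    # Open the Edge browser and load the URL
    driver = webdriver.Edge()
    try:
        driver.get(url)

        # Get the element with the class name gLFyf
        # (on Google it was the class name of the search textarea)
        element = driver.find_element(By.CLASS_NAME, "gLFyf")

        # Type "Hi mom !" in the selected element and press ENTER
        element.send_keys('Hi mom !' + Keys.RETURN)

        # Wait up to 10 seconds for the old search box to go stale,
        # which signals that the browser has left the start page
        WebDriverWait(driver, 10).until(EC.staleness_of(element))

        # Get the page source
        html_content = driver.page_source
    finally:
        # Close the browser even if one of the steps above fails
        driver.quit()

    # Write the page source to a text file; "w" replaces the file from a
    # previous run and UTF-8 avoids encoding errors on Windows
    with open("page_source_of_google_after_typing_hi_mom.txt", "w", encoding="utf-8") as f:
        f.write(html_content)

    # Here you can do anything else, because you have the page source
    # of the page after typing the text. Be creative, my friend!
    return html_content


# Example usage
url = "https://www.google.com/"
html_content = scrape(url)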
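One detail about the file-writing step: a plain file name is resolved against the folder you run the program from, which is not always the folder the script lives in. The sketch below shows one way to pin the output next to the script with pathlib. The helper names output_path and save_page_source are made up for this example, and the demo writes into a temporary folder with a fake script path instead of touching your real files.

```python
import tempfile
from pathlib import Path

FILE_NAME = "page_source_of_google_after_typing_hi_mom.txt"


def output_path(script_file, file_name=FILE_NAME):
    """Return file_name located in the same folder as script_file."""
    return Path(script_file).resolve().parent / file_name


def save_page_source(html_content, target):
    """Write html_content to target as UTF-8, replacing an existing file."""
    target = Path(target)
    target.write_text(html_content, encoding="utf-8")
    return target


# Demonstration in a temporary folder; in the real script you would call
# output_path(__file__) instead of building a fake script path.
with tempfile.TemporaryDirectory() as folder:
    fake_script = Path(folder) / "scrape.py"
    saved = save_page_source("<html>Hi mom !</html>", output_path(fake_script))
    print(saved.name)
```

Inside the scraper itself this would look like save_page_source(html_content, output_path(__file__)).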






My Website : https://alphageek.ir/

Read.cash : read.cash/@alphageek

Steemit : steemit.com/@alphageekir

Hive.blog : hive.blog/@alfageek
