Extracting Search Engine Links – Python

August 15, 2016 | February 13, 2017 | PH

Many data science projects need to extract the links that appear in search engine results. This can be done in several ways, including Google's APIs and web scrapers.

One simple way is to use Beautiful Soup. The Python code below extracts the links shown on a Google results page for a user-supplied keyword.

Requirements:
– Python 3.5
– Beautiful Soup 4
– Internet connection
[Note: the code might need some syntactic changes if other versions are used]
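
As a quick check that Beautiful Soup is installed and working, the parsing step can be tried offline first. This is only a minimal sketch: the HTML sample is hand-written for illustration (it is not real search engine output), and the helper name extract_cite_texts is hypothetical.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup, assumed for illustration only
SAMPLE_HTML = """
<html><body>
  <div><cite>www.example.com/page-one</cite></div>
  <div><cite>docs.example.org/guide</cite></div>
  <p>No cite tag here.</p>
</body></html>
"""

def extract_cite_texts(html):
    """Return the text of every <cite> element in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [cite.get_text(strip=True) for cite in soup.find_all("cite")]

links = extract_cite_texts(SAMPLE_HTML)
print(links)
```

If this prints the two sample addresses, the same find_all('cite') call should work on a downloaded page, provided the page still marks its result links with cite tags.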

The full script below is also available on GitHub.

from urllib.parse import quote_plus
import urllib.request

from bs4 import BeautifulSoup

# Collect the relevant urls from the search engine for the
# user supplied keyword
def collect_urls():
    # Define and set an agent
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    # Accept a keyword from the user
    key = input("Enter the keyword: ").strip()

    # Prepare the search engine query
    # We are going to visit the google search engine
    # The keyword is URL-encoded, so spaces and special characters are safe
    # Our result will be from the first page of the search engine only
    url = "http://www.google.com/search?q=" + quote_plus(key)

    # Open the page and parse through the BeautifulSoup
    # Parser is set to html. Any other suitable could be used as well
    page = opener.open(url)
    soup = BeautifulSoup(page, "html.parser")

    # Open a file and write all the links found
    # The with-block closes the file even if writing fails
    with open("links.txt", "w", encoding="utf-8") as out_file:
        for cite in soup.find_all('cite'):
            out_file.write(cite.text + "\n")

collect_urls()
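
The script reads only the first results page. The sketch below shows one way the query URL could be built for later pages as well. It assumes that Google's start query parameter is a zero-based result offset and that a page holds 10 results; both are assumptions about the search engine's URL scheme, not something the script above confirms. The function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

def build_search_url(keyword, page=0, results_per_page=10):
    """Build a search URL for the given keyword and zero-based page number.

    Assumption: 'start' is a result offset, so page 1 begins at offset 10.
    """
    params = {"q": keyword, "start": page * results_per_page}
    # urlencode escapes spaces and special characters in the keyword
    return "http://www.google.com/search?" + urlencode(params)

first_page = build_search_url("data science")
second_page = build_search_url("data science", page=1)
print(first_page)
print(second_page)
```

Each URL could then be passed to opener.open() in a loop, ideally with a pause between requests.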
