How To Extract Google Ad Results Using Python

There are two kinds of ad results, each with a different layout: shopping ads and standard website ads. Both are covered below.

Logic:

  • Import the libraries to work with.
  • Add a user-agent to imitate a real user's visit.
  • Enter the search query.
  • Get the HTML response.
  • Parse the HTML code.
  • Find and specify the elements to extract data from.
  • Iterate over them until nothing is left.

Google might block the requests if:

  • It recognizes the script as a script, e.g. by the default python-requests user-agent (see the snippet after this list).
  • There are too many requests from a single IP address.
  • The traffic does not behave like a human's. Essentially, everything above.
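
You can see for yourself what gives the script away; the default user-agent names the library and immediately identifies the request as scripted:

import requests

# Prints something like 'python-requests/2.28.1' (version depends on your install)
print(requests.utils.default_headers()['User-Agent'])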

There are many ways to work around Google blocking your scripts:

  • Use a referrer or python-requests Session objects.
  • Use custom headers, i.e. a user-agent, ideally rotated from a list of different user agents, as sketched after this list.
  • Use headless browsers or browser-automation frameworks such as Pyppeteer or Selenium.
  • Use proxies and rotate them.
  • Use CAPTCHA-solving services.
  • Slow down by adding delays between requests.
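
To illustrate the first two points, here is a minimal sketch that reuses a Session, rotates user agents, and waits between requests. The user-agent list and the delay range are arbitrary example values, not recommendations.

import random, time
import requests

# A small pool of user-agent strings to rotate through (example values)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
]

# A Session keeps cookies and reuses the underlying connection between requests
session = requests.Session()

for query in ['coffee buy', 'tea buy']:
    # Pick a random user-agent for each request
    headers = {'User-Agent': random.choice(user_agents)}
    html = session.get('https://www.google.com/search',
                       params={'q': query},
                       headers=headers).text
    # Wait 2-6 seconds before the next request (arbitrary range)
    time.sleep(random.uniform(2, 6))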

Shopping Ads


import requests, lxml, urllib.parse
from bs4 import BeautifulSoup

# Adding User-agent (default user-agent from requests library is 'python-requests')
# https://github.com/psf/requests/blob/589c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query
params = {'q': 'coffee buy'}

# Getting HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text

# Getting HTML code from BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the container that has all the necessary data
for container in soup.find_all('div', class_='RnJeZd top pla-unit-title'):
    # Scraping the title
    title = container.text

    # Beginning of the link to join afterwards
    start_of_link = 'https://www.googleadservices.com/pagead'
    # Scraping the end of the link to join afterwards
    end_of_link = container.find('a')['href']
    # Combining (joining) the relative and absolute URLs
    ad_link = urllib.parse.urljoin(start_of_link, end_of_link)

    # Printing each title and link on a new line
    print(f'{title}\n{ad_link}\n')


# Output
''' 
Jot Ultra Coffee Triple | Ultra Concentrated
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABABGgJ5bQ&sig=AOD64_0x-PlrWek-JFlDTSo7E9Z7YhUOjg&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCED4&adurl=
MUD\WTR | A Healthier Coffee Alternative, 30 servings
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAJGgJ5bQ&sig=AOD64_3gltZJ6kPrxic5o8yUO5cuJrHXnw&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEEg&adurl=
Jot Ultra Coffee Double | 2 bottles = 28 cups
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAHGgJ5bQ&sig=AOD64_3hD0JWZSLr8NUgoTW5K0HMzdFvng&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEE4&adurl=
'''
Note: Sometimes there will be zero results because Google showed no ads at script runtime. Just run the script again.
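
To "iterate until nothing is left", as in the logic outline above, you can page through results with Google's start parameter, reusing the imports and headers from the script above. A minimal sketch, assuming Google's default of 10 results per page; ads often appear only on the first page, so the loop may stop after one iteration:

# Paging through results: 'start' offsets them by 0, 10, 20, ...
page = 0
while True:
    params = {'q': 'coffee buy', 'start': page * 10}
    html = requests.get('https://www.google.com/search',
                        headers=headers,
                        params=params).text
    soup = BeautifulSoup(html, 'lxml')
    containers = soup.find_all('div', class_='RnJeZd top pla-unit-title')
    # Stop when a page has no more ad containers
    if not containers:
        break
    for container in containers:
        print(container.text)
    page += 1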


Standard Website Ads


import requests, lxml, urllib.parse
from bs4 import BeautifulSoup

# Adding user-agent to fake real user visit
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query
params = {'q': 'coffee buy'}

# HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text
# HTML code from BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the container that has the needed data and iterating over it
for container in soup.find_all('span', class_='Zu0yb LWAWHf qzEoUe'):
    # Using .text since the 'span' contains nothing but the displayed link
    ad_link = container.text
    # Printing links
    print(ad_link)

# Output
'''
https://www.coffeeam.com/
https://www.sfbaycoffee.com/
https://www.onyxcoffeelab.com/
https://www.enjoybettercoffee.com/
https://www.klatchroasting.com/
https://www.pachamamacoffee.com/
Home
'''
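
As the output shows, the span text is not always a URL; "Home" is a sitelink label rather than a link. If you only want actual links, a simple prefix filter works:

for container in soup.find_all('span', class_='Zu0yb LWAWHf qzEoUe'):
    ad_link = container.text
    # Skip sitelink labels such as 'Home' and keep only real URLs
    if ad_link.startswith('http'):
        print(ad_link)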

Use Google Ads Results API

Alternatively, you can do the same thing with the Google Ads Results API from X-Byte. The difference is that you don't have to deal with solving CAPTCHAs when you send too many requests or with finding and rotating proxies; it reduces development complexity and makes the extracted data easy to work with.

This is a paid API.

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
  "engine": "google",
  "q": "kitchen table",
  "api_key": os.getenv("API_KEY"),
  "no_cache":"true" # add this param if it throws an error
}

search = GoogleSearch(params)
results = search.get_dict()

for ad in results['ads']: # shopping ads -> ['shopping_results']
  shopping_ad = ad['tracking_link'] # shopping ads -> ['link']
  print(shopping_ad)

# Output for regular ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAPGgJxdQ&ae=2&sig=AOD64_2ZH32FlwxW1XqO9V49i2L8J5qy2A&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAMGgJxdQ&ae=2&sig=AOD64_2l1PVJAqbVmrcu8UpkGPVk-VK3UA&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAQGgJxdQ&sig=AOD64_2DDuyRZUcFi04jfneAzwnOQBuLtw&q&adurl
'''
# Output for shopping ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAEGgJ5bQ&ae=2&sig=AOD64_2zCyytR6tDeB3BjdOX5sFQQKwOAA&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARA8&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAFGgJ5bQ&ae=2&sig=AOD64_2HeGVTNF91vkSHjg-wRDtC1ouATw&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBI&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAGGgJ5bQ&ae=2&sig=AOD64_1n4ztvwQxiSMInwgntgY-WyVc2eQ&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBY&adurl=
'''
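
The inline comments in the code above point at the shopping-ads variant; written out, with the same search object, it looks like this:

# Shopping ads sit under a different key and expose 'link' instead of 'tracking_link'
for shopping_ad in results['shopping_results']:
    print(shopping_ad['link'])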

If you have any questions, if anything isn't working properly, or if you need other code written, feel free to contact X-Byte Enterprise Crawling or ask for a free quote!
