Scraping Zomato menu data is one of the most common applications of scraping restaurant listings from websites. The data can be used to extract prices, build an aggregator, or deliver a better UX on top of existing hotel booking websites.
Here, we will use BeautifulSoup to scrape data from Zomato. Below is a simple BeautifulSoup script that uses CSS selectors to query the page for the data we need.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        #raise e
        print('')
Here, we also send a User-Agent header so that the request looks like it comes from a browser.
Now, let us look for the Zomato data we want to scrape.
When we inspect the web page, we can see that the HTML of each item is enclosed in a tag with the class search-result.
We can use the script below to split the HTML into parts, each containing the data for an individual item:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        #raise e
        print('')
Once you execute the code:
python3 scrapeZomato.py
Here, the code isolates the HTML of each card:
On inspection, you will find that the restaurant's name always has the class result-title. Let us use it to pull out the name.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        #print(item)
        print(item.select('.result-title')[0].get_text())
    except Exception as e:
        #raise e
        print('')
The code above prints the name of each restaurant:
Now, let us retrieve the other pieces of data.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        #print(item)
        print(item.select('.result-title')[0].get_text().strip())
        print(item.select('.search_result_subzone')[0].get_text().strip())
        print(item.select('.res-rating-nf')[0].get_text().strip())
        print(item.select('[class*=rating-votes-div]')[0].get_text().strip())
        print(item.select('.res-timing')[0].get_text().strip())
        print(item.select('.res-cost')[0].get_text().strip())
    except Exception as e:
        #raise e
        print('')
When executed:
The output shows each restaurant's name, locality, rating, number of votes, opening hours, and cost.
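One thing to watch in the script above: if a card lacks any one of the selected elements, the [0] lookup raises an error and the except branch silently skips the remaining fields for that card. The following is a minimal sketch of one way around this, assuming the same CSS selectors as the script. The helper names (first_text, extract_fields), the dictionary keys, and the sample HTML are illustrative and not part of the original script or of Zomato's real markup. It uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Selectors copied from the script above; the dictionary keys are illustrative names.
FIELD_SELECTORS = {
    "name": ".result-title",
    "locality": ".search_result_subzone",
    "rating": ".res-rating-nf",
    "votes": "[class*=rating-votes-div]",
    "timing": ".res-timing",
    "cost": ".res-cost",
}


def first_text(item, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    match = item.select_one(selector)
    return match.get_text().strip() if match else None


def extract_fields(item):
    """Build one record per result card; fields that are missing become None."""
    return {key: first_text(item, sel) for key, sel in FIELD_SELECTORS.items()}


# Made-up sample markup standing in for one result card (not real Zomato HTML).
SAMPLE_HTML = """
<div class="search-result">
  <a class="result-title"> Sample Pizza Place </a>
  <a class="search_result_subzone">Sample Locality</a>
  <div class="res-rating-nf">4.1</div>
  <div class="res-cost">Cost for two: 500</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [extract_fields(card) for card in soup.select(".search-result")]
print(records)
```

In the sample card the votes and timing elements are absent, so those two keys come back as None while the other fields are still captured. A list of dictionaries like this is also easier to write to CSV or JSON than printed lines.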
Rotating the User-Agent string makes successive requests look to Zomato as though they come from different browsers. Even so, Zomato can still block your IP address, whatever other tricks you use.
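As a rough illustration of User-Agent rotation, the sketch below cycles through a small list of strings and returns a fresh headers dictionary on each call. The first string is the one used in the scripts above. The other two are example values chosen for illustration, and next_headers is a hypothetical helper name.

```python
import itertools

# The first entry is the User-Agent from the scripts above; the other two are
# illustrative examples that you would replace with current browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 "
    "(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# cycle() repeats the list endlessly, so every call gets the next string in turn.
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers():
    """Return a headers dict carrying the next User-Agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


for _ in range(3):
    print(next_headers()["User-Agent"])
```

Each request would then pass headers=next_headers() to requests.get in place of the fixed headers dictionary.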
Conquering IP Blocks
Investing in a private rotating proxy service such as Proxies API can make the difference between a successful, hassle-free Zomato menu data scraping project and one that never completes the job.
Its rotating proxy servers sit behind a simple API that addresses IP blocking in several ways:
- Using millions of high-speed proxies located around the world
- Using our automated IP rotation
- Using our automatic User-Agent string rotation, which simulates requests from various valid browsers and browser versions
- Using our automatic CAPTCHA-solving system
Several customers have resolved their IP-blocking problems using this simple API.
The whole service can be accessed through a simple API call from any programming language:
curl "https://www.xbyte.io/?key=API_KEY&url=https://example.com"
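For readers working in Python, the same request URL can be assembled as sketched below. The endpoint and the key and url parameter names are taken from the curl command above. The helper name build_api_url is hypothetical, and the sketch assumes the service accepts a percent-encoded target URL, which is the usual convention for query strings.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the curl command above.
API_ENDPOINT = "https://www.xbyte.io/"


def build_api_url(api_key, target_url):
    """Build the request URL, percent-encoding the target so that any
    '&' or '?' inside it is not mistaken for part of the outer query."""
    query = urlencode({"key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


request_url = build_api_url("API_KEY", "https://example.com")
print(request_url)
```

The resulting string can be passed to requests.get in place of the Zomato URL. Encoding matters once the target URL carries its own query parameters, since an unencoded & would otherwise be read as a separator in the outer request.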
We at X-Byte Enterprise Crawling are always happy to answer your queries. Drop us a message if you have any questions!