How Python and BeautifulSoup Will Help You Scrape Listings From Airbnb


Scraping listings from travel websites such as Airbnb is among the most popular web scraping applications, and Python with BeautifulSoup is a convenient way to do it. Typical reasons include monitoring rates, building an aggregator, or improving the user experience of existing hotel booking services.

This can be accomplished with a short script. We will use BeautifulSoup to extract data from Airbnb.com.

To begin with, we need some code that fetches an Airbnb.com search page and sets up BeautifulSoup so that we can query the page for useful data using CSS selectors.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent header with the request
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

To avoid getting blocked, we also pass a User-Agent header so that the request looks like it comes from a browser.
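The search URL in the snippet above is typed by hand, with raw spaces and brackets in the query string. As a minimal sketch, the same kind of URL could be assembled from a dictionary with the standard library's urllib.parse.urlencode, which percent-encodes each value. The helper name build_search_url is hypothetical, and only a subset of the original parameters is included.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes'


def build_search_url(checkin, checkout, adults=4, children=1):
    """Assemble a search URL; build_search_url is a hypothetical helper."""
    params = {
        'query': 'New York, NY, United States',
        'checkin': checkin,
        'checkout': checkout,
        'adults': adults,
        'children': children,
    }
    # urlencode escapes spaces, commas and other unsafe characters
    return BASE_URL + '?' + urlencode(params)


url = build_search_url('2020-03-12', '2020-03-19')
print(url)
```

Changing the dates or guest counts then only means changing the function arguments, not editing a long string.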

Now, let us look at how Airbnb presents the search results for a particular destination.

When we inspect the page, we notice that the HTML for each listing is enclosed in a tag that has the attribute itemprop with the value itemListElement.

We can simply divide the HTML document into these cards, each of which contains the data for an individual listing, as shown below.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each search result card carries itemprop="itemListElement"
for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        print(item)
        print('----------------------------------------')
    except Exception as e:
        #raise e
        print('')

Once you run the code:

python3 scrapeAirbnb.py

You can see that the code isolates the HTML cards.
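To see the attribute selector at work without making a network request, here is a small sketch that runs against an inline HTML string. The markup is hypothetical and far simpler than a real Airbnb page, and it uses Python's built-in html.parser instead of lxml so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; real Airbnb pages are far larger
SAMPLE_HTML = """
<div>
  <div itemprop="itemListElement"><a aria-label="Loft in SoHo" href="/rooms/1">Loft</a></div>
  <div itemprop="itemListElement"><a aria-label="Studio in Brooklyn" href="/rooms/2">Studio</a></div>
  <div class="footer">Not a listing</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# The attribute selector matches any tag whose itemprop equals itemListElement
cards = soup.select('[itemprop="itemListElement"]')
print(len(cards))
```

Only the two tags that carry the itemprop attribute are selected; the footer div is ignored.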

On closer inspection, the name of each listing is always stored in the aria-label attribute of the card's link. So let's see if we can retrieve it.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        # The listing name is the aria-label of the first link in the card
        print(item.select('a')[0]['aria-label'])
        print('----------------------------------------')
    except (IndexError, KeyError):
        # Card has no link or no aria-label; print a blank line and move on
        print('')

This prints the name of each listing.
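Indexing with ['aria-label'] raises an error when a card has no link or the link has no such attribute. As a hedged alternative, the sketch below returns None in those cases instead. The helper listing_name and the sample markup are hypothetical and only serve to show the idea.

```python
from bs4 import BeautifulSoup

# Hypothetical card markup: the second card has no aria-label on its link
SAMPLE_HTML = """
<div itemprop="itemListElement"><a aria-label="Loft in SoHo" href="/rooms/1"></a></div>
<div itemprop="itemListElement"><a href="/rooms/2"></a></div>
"""


def listing_name(card):
    """Return the aria-label of the card's first link, or None if absent."""
    link = card.select_one('a')
    if link is None:
        return None
    # Tag.get returns None instead of raising KeyError for a missing attribute
    return link.get('aria-label')


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
names = [listing_name(card) for card in soup.select('[itemprop="itemListElement"]')]
print(names)
```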

Now let us extract the other pieces of information.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        # Name and link come from the first anchor tag in the card
        print(item.select('a')[0]['aria-label'])
        print(item.select('a')[0]['href'])
        # The remaining fields are located by their CSS class names
        print(item.select('._krjbj')[0].get_text())
        print(item.select('._krjbj')[1].get_text())
        print(item.select('._16shi2n')[0].get_text())
        print(item.select('._zkkcbwd')[0].get_text())
        print('----------------------------------------')
    except (IndexError, KeyError):
        # A field is missing from this card; print a blank line and move on
        print('')

When you run the code, it displays all the needed data, including reviews, ratings, links, and discounted prices.
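In the loop above, one missing element makes the whole card fall through to the except branch. A possible refinement, sketched below on hypothetical markup, is to read each field separately and store None when it is absent. The helpers text_at and parse_card are illustrative names, and the labels given to each class (rating, reviews, price) are assumptions, since the class names are auto-generated and may change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are copied from the snippet above
SAMPLE_HTML = """
<div itemprop="itemListElement">
  <a aria-label="Loft in SoHo" href="/rooms/1"></a>
  <span class="_krjbj">Rating 4.8 out of 5</span>
  <span class="_krjbj">120 reviews</span>
</div>
"""


def text_at(card, selector, index=0):
    """Return the text of the index-th match, or None when it is missing."""
    matches = card.select(selector)
    return matches[index].get_text(strip=True) if index < len(matches) else None


def parse_card(card):
    """Collect one card into a dict; the field labels are assumptions."""
    link = card.select_one('a')
    return {
        'name': link.get('aria-label') if link else None,
        'url': link.get('href') if link else None,
        'rating': text_at(card, '._krjbj', 0),
        'reviews': text_at(card, '._krjbj', 1),
        'price': text_at(card, '._zkkcbwd'),  # absent in the sample markup
    }


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
record = parse_card(soup.select_one('[itemprop="itemListElement"]'))
print(record)
```

A list of such dictionaries is straightforward to write out as CSV or JSON later.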

In more advanced setups, you must rotate the User-Agent string so that Airbnb cannot tell that every request comes from the same browser. If you go further, you will find that Airbnb blocks your IP address regardless of all these efforts. This is disappointing, because it is where the majority of web crawling programs fall short.
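Rotating the User-Agent string can be sketched as follows. This is a minimal illustration rather than a complete solution: the list of browser strings is an example (only the first appears in the snippets above), and next_headers is a hypothetical helper name.

```python
import itertools

# Example User-Agent strings; the first is the one used in the snippets above
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0',
]

# cycle() yields the strings in order and starts over after the last one
_agent_cycle = itertools.cycle(USER_AGENTS)


def next_headers():
    """Return request headers carrying the next User-Agent in the rotation."""
    return {'User-Agent': next(_agent_cycle)}


# Each request would then use requests.get(url, headers=next_headers())
first, second = next_headers(), next_headers()
print(first['User-Agent'] != second['User-Agent'])
```

This only varies the browser signature; it does not change the IP address the requests come from.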

Overcoming IP Blocks

Investing in a private rotating proxy service such as Proxies API can often make the difference between a pain-free web scraping operation that consistently gets the job done and one that does not.

Plus, with the current offer of 1000 free API requests, there is almost nothing to lose by trying our rotating proxy and comparing notes. Integration takes a single line, so it is barely noticeable in your code.

Our rotating proxy server, Proxies API, is a simple API that solves IP blocking issues instantly. It offers:

Thousands of high-speed rotating proxies spread across the globe.
Automatic IP rotation, so you can rest assured that your IP address will change.
Automatic rotation of the User-Agent string, which mimics requests from various valid web browsers and browser versions.
Automatic CAPTCHA-solving technology.

Thousands of our clients have used a simple API to solve the problem of IP restrictions.

The entire system can be accessed from any programming language through a simple API call like the one below.

curl "https://xbyte.io/?key=API_KEY&url=https://example.com"
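For a Python script, the same request URL could be built with urllib.parse.urlencode, as in the sketch below. The endpoint and the key and url parameter names are taken from the curl example; the helper name proxied_url is hypothetical, and percent-encoding the target URL is an assumption about what the service accepts.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the curl example above
API_ENDPOINT = 'https://xbyte.io/'


def proxied_url(target_url, api_key):
    """Build the request URL; proxied_url is a hypothetical helper name."""
    # urlencode escapes the target URL so that its own '?' and '&'
    # are not mistaken for parameters of the proxy request
    return API_ENDPOINT + '?' + urlencode({'key': api_key, 'url': target_url})


request_url = proxied_url('https://example.com', 'API_KEY')
print(request_url)
# response = requests.get(request_url)  # network call, left commented out
```

Encoding matters most for target URLs that carry their own query string, such as the Airbnb search URL used earlier.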

For any further assistance, please contact X-Byte Enterprise Crawling.

✯ Alpesh Khunt ✯

Alpesh Khunt, CEO and Founder of X-Byte Enterprise Crawling, created the data scraping company in 2012 to boost business growth using real-time data. With a vision for scalable solutions, he developed a trusted web scraping platform that empowers businesses with accurate insights for smarter decision-making.