How To Scrape Or Extract Movie Details From Myvue Website

Scrape Cinema Listings Data from the Myvue and IMDb Websites Using Python

Web scraping, data crawling, data extraction – these terms are closely related and all very effective in the world of data. Web scraping helps you dig out any information or data from the web and lets you boost your business with a Data-as-a-Service (DaaS) model. It is the most efficient way to extract data from the web to a client's requirements.

Since this article is about extracting movie details, we would like to show you how to scrape them using Python and the Scrapy framework (whose selectors are built on lxml). Web scraping helps you extract data about movies, timings, seats and their availability, and more from movie websites.

Imagine all the movie data that you can gather on a daily basis. You could scrape the data for a particular actor, director, scriptwriter or genre and use the information to analyze ongoing movie trends.

This article is about scraping movie details for a given location and date from www.myvue.com, a movie booking website that gives you all the details you might want: movie overviews, star cast, and current show timings.

In this tutorial, we will build a simple web scraper for extracting data from myvue.com using Python's Scrapy framework.

Tools we are going to use:
  • Python 2.7
  • Python pip
  • Scrapy
  • Scrapy-splash

To install the Scrapy framework and the other modules on your system, open a terminal and run this command.

pip install scrapy scrapy-splash
Strategy:

As you can see in the given image, on the Myvue website we first have to select a locality in order to see the movies currently showing there. Here, we are going to list the movies showing at "Aberdeen Myvue". You can find all the available localities on Myvue's website.

Selecting a locality produces a URL of the form https://www.myvue.com/cinema/<locality>/whats-on.

For example, searching for "Aberdeen" gives https://www.myvue.com/cinema/aberdeen/whats-on.
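The URL pattern can be sketched as a small helper; the function name is hypothetical, and the lower-case/hyphen slug format is an assumption based on the URL above:

```python
def myvue_whats_on_url(locality):
    # Assumed slug format: lower-case, with spaces replaced by hyphens
    slug = locality.strip().lower().replace(" ", "-")
    return "https://www.myvue.com/cinema/%s/whats-on" % slug

print(myvue_whats_on_url("Aberdeen"))
# https://www.myvue.com/cinema/aberdeen/whats-on
```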

The details we are going to scrape from the website are shown in the following image.

The data on Myvue.com is loaded via JavaScript, so we will not be able to fetch the details directly from the raw HTML. For that, we have to install and configure our system with scrapy-splash. You can find the details for scrapy-splash here.
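Splash itself runs as a separate HTTP rendering service. The usual way to start it, per the scrapy-splash documentation, is via Docker; port 8050 is Splash's default, though your host and port may differ:

```shell
# Pull and run the Splash rendering service (listens on port 8050 by default)
docker pull scrapinghub/splash
docker run -p 8050:8050 scrapinghub/splash
```

Whatever address the container ends up on is what goes into the SPLASH_URL setting shown later.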

Also, we are going to extract the data from the website with the help of XPath. You can learn more about XPath here.
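As a quick illustration of the XPath idea, here is a standard-library sketch against hypothetical film-list markup (not Myvue's real HTML); note that ElementTree only supports a limited XPath subset, while Scrapy's selectors support full XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the film-list structure we target
html = (
    '<ul>'
    '<li class="filmlist__item"><span>Dumbo</span></li>'
    '<li class="filmlist__item"><span>Shazam!</span></li>'
    '</ul>'
)

root = ET.fromstring(html)
# Select the <span> text of every list item with the matching class
titles = [span.text for span in root.findall('.//li[@class="filmlist__item"]/span')]
print(titles)
# ['Dumbo', 'Shazam!']
```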

To get the XPath of an element, we are going to use the Chrome browser's Developer Tools, which you can open by pressing F12 in a Chrome window.

Following are the steps to get started with Scrapy:

Step 1: Open terminal

Step 2: Execute the following commands.

  • scrapy startproject <project_name>

e.g. scrapy startproject myvue

This will generate the Scrapy project, and you will find a directory named "myvue".

  • cd myvue

Change the current directory to the root directory of the project.

  • scrapy genspider <spider_name> <domain>

e.g. scrapy genspider myvueSpider myvue.com

This generates the main spider file, in which we will write the code for extracting data. You will now find the following directory structure.

myvue
├── scrapy.cfg
└── myvue
    ├── __init__.py
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders
        ├── __init__.py
        └── myvueSpider.py

Step 3: Open spiders/myvueSpider.py and add the following code.

# -*- coding: utf-8 -*-
import scrapy
from scrapy_splash import SplashRequest
from myvue.items import MyvueItem


class MyvuespiderSpider(scrapy.Spider):
    name = 'myvueSpider'
    allowed_domains = ['myvue.com']

    def start_requests(self):
        locality = "Aberdeen"
        url = "https://www.myvue.com/cinema/%s/whats-on" % str(locality)
        # Render the page through Splash so the JavaScript-loaded listings are present
        yield SplashRequest(url=url, callback=self.parse)

    def parse(self, response):
        divList = response.xpath('//*[@class="filmlist__item"]/div')
        for div in divList:
            item = MyvueItem()
            item['title'] = div.xpath("./div[2]/a/span/text()").extract_first()
            item['shorDescription'] = div.xpath("./div[2]/p/text()").extract_first()
            item['starCast'] = div.xpath("./div[2]/div[1]/dl[1]/dd/text()").extract_first()
            item['runningTime'] = div.xpath("./div[2]/div[1]/dl[2]/dd/text()").extract_first()
            item['showTiming'] = "|".join(div.xpath("./div[3]/div/div/div/ul/li/a/@title").extract())
            item['image_or_videoURL'] = div.xpath("./div[1]/a/@data-videourl").extract_first()
            yield item
Step 4: Open items.py and define the fields we are going to extract.

import scrapy


class MyvueItem(scrapy.Item):
    title = scrapy.Field()
    shorDescription = scrapy.Field()
    starCast = scrapy.Field()
    runningTime = scrapy.Field()
    showTiming = scrapy.Field()
    image_or_videoURL = scrapy.Field()
Step 5: Add the following scrapy-splash configuration to settings.py.

SPLASH_URL = 'http://192.168.99.100:32768'  # <---- Your Splash URL here.

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

Step 6: Go to the spiders folder, open a terminal there, and run the following command.

scrapy crawl myvueSpider -o MovieDetails.csv

This will run our crawler and scrape the data from the website, giving us the output in CSV format.

You can find the output csv file in the same “spiders” folder.
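Since the spider pipe-joined the show timings into a single field, a minimal sketch of working with the output might look like this (the sample row is hypothetical; real rows depend on what is showing at the selected cinema):

```python
import csv
import io

# Hypothetical sample of what MovieDetails.csv might contain
sample = io.StringIO(
    "title,showTiming\n"
    "Dumbo,12:00|15:30|18:45\n"
)

for row in csv.DictReader(sample):
    # The spider joined show times with "|"; split them back into a list
    timings = row["showTiming"].split("|")
    print(row["title"], timings)
```

To read the real file, replace the StringIO sample with open("MovieDetails.csv", newline="").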

This scraper should be able to scrape the details of movies currently showing on movie booking websites. To go further into web scraping, you could build a more complex scraper that also collects the available seats, ratings, votes, and crew for each movie. If you would like to scrape the details of thousands of pages at very short intervals, we can help you out with our Web Scraping service.

✯ Alpesh Khunt ✯
Alpesh Khunt, CEO and Founder of X-Byte Enterprise Crawling created data scraping company in 2012 to boost business growth using real-time data. With a vision for scalable solutions, he developed a trusted web scraping platform that empowers businesses with accurate insights for smarter decision-making.
