Data: The Oil of the Future
The enormous amount of cryptocurrency data available online is a valuable resource for cryptocurrency investment and research. Being able to manage and leverage this data gives us more control over our crypto investments.
Web scraping (also called web extraction) is the process of downloading pages from websites and extracting the relevant data from them. For our purposes, we are interested in extracting cryptocurrency data.
Why Use Web Scraping?
With so many websites providing free tools, why would anyone want to collect their own data? Most users rely on websites such as CoinGecko and CoinMarketCap to get data and create watch lists. Isn't that more convenient?
We suggest using both options: moving on from being newcomers (using the standard features of popular crypto websites) to doing our own data analysis (scraping data and building our own smart dataset).
In our experience, we have found the following benefits:
Maintain Focus and Control: We stay more focused and in control, knowing that the list and analysis we build in our own spreadsheet is the master working version for our investment objectives. We do not need to depend on other people's data. Jumping from one website to another also distracts us and pulls us away from our main task.
Filling the Gaps: Not all coins are listed on the major websites, and you will always find inconsistencies and gaps in their coin lists. If we own the data, we can fill those gaps ourselves.
Advanced Analysis: With the data in a spreadsheet, you can perform advanced analysis and filtering to find niche coins that websites might not surface.
Personal Comments and Notes: You can add columns to your spreadsheet for extra comments and investment insights. We also record which exchange we plan to use and how much capital we are allocating to each coin.
For example, once the data is in a spreadsheet, we can search for coins that belong to both the gaming category and the Polkadot ecosystem:
How Do We Find Coins That Are in Both Gaming and Polkadot?
By comparison, most websites support only one level of filtering. For instance, CoinMarketCap can list all coins in the Polkadot ecosystem.
CoinMarketCap can also list all gaming tokens, but not the tokens that are both gaming and Polkadot.
Generally, these websites cannot go beyond a single level of filtering to two- or three-level filtering, e.g. listing all the gaming coins associated with Polkadot.
Advanced filtering might not look like a big problem on the surface. However, with thousands of coins on the market, being able to automate this work for your investment objectives while staying focused is key to success.
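Once the data lives in your own table, multi-level filtering becomes a simple set check. The following is a minimal sketch, assuming each coin row carries a set of category and ecosystem tags; the coin names and tags are hypothetical placeholders rather than real market data, and the helper `filter_by_tags` is an illustrative name, not part of any library.

```python
# Hypothetical rows standing in for your own spreadsheet.
# Coin names and tags are placeholders, NOT real market data.
coins = [
    {"name": "CoinA", "tags": {"gaming", "polkadot-ecosystem"}},
    {"name": "CoinB", "tags": {"gaming", "solana-ecosystem"}},
    {"name": "CoinC", "tags": {"defi", "polkadot-ecosystem"}},
    {"name": "CoinD", "tags": {"gaming", "polkadot-ecosystem", "nft"}},
]


def filter_by_tags(rows, required):
    """Return the names of rows whose tag set contains every required tag."""
    required = set(required)
    # `<=` on sets is the subset test, so any number of filter levels works.
    return [row["name"] for row in rows if required <= row["tags"]]


print(filter_by_tags(coins, ["gaming", "polkadot-ecosystem"]))
```

Because the filter is a subset test, adding a third or fourth level (for example an "nft" tag) is just one more entry in the `required` list.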
Concept
We will use two Python libraries:
BeautifulSoup is a Python library for pulling data out of HTML, XML, and other markup languages.
Requests is used to fetch the HTML of a web page. If you already have the data in an HTML file, you won't need the Requests library.
We use Jupyter Notebook on Google Cloud Platform to run the code; however, the Python code below runs on any platform.
Depending on your Python setup, you might need to run pip install beautifulsoup4 (and pip install requests if it is not already installed).
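To illustrate the point that Requests is optional when you already have the HTML, here is a minimal sketch that parses markup held in a string. The HTML snippet is a hypothetical stand-in for a saved page; BeautifulSoup also accepts an open file handle, as the comment shows.

```python
from bs4 import BeautifulSoup

# Hypothetical saved page content. To read a real file instead, you could use:
#   with open("bnb.html", encoding="utf-8") as f:
#       soup = BeautifulSoup(f, "html.parser")
saved_html = """
<html><body>
  <h2>What Is Example Coin?</h2>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""

# No network request is needed: BeautifulSoup parses the string directly.
soup = BeautifulSoup(saved_html, "html.parser")
print(soup.h2.get_text(strip=True))
for p in soup.find_all("p"):
    print(p.get_text(strip=True))
```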
Example 1: 'Hello World' with Web Scraping
We will start with a 'Hello World' of web scraping: extracting the introductory text about What Is Binance Coin (BNB), shown in the green box here.
Visit the BNB coin page in Chrome, right-click on the page, and click Inspect to examine its elements:
Then click the small arrow icon (the element selector) in the developer tools panel and click the corresponding web elements, as shown below.
After inspecting the page, we observe that:
All the web elements sit inside a div with the class sc-2qtjgt-0 eApVPN
The title uses h2
Subtitles use h3
All the remaining text is in p elements
Here is the scraping code, which is quite short:
from bs4 import BeautifulSoup
import requests

# retrieve the web page and parse its contents
mainpage = requests.get('https://coinmarketcap.com/currencies/binance-coin/', timeout=30)
mainpage.raise_for_status()  # stop early on HTTP errors such as 403 or 404
soup = BeautifulSoup(mainpage.content, 'html.parser')

# the "What is BNB?" block; this class name was observed at the time of writing and may change
whatis = soup.find_all("div", {"class": "sc-2qtjgt-0 eApVPN"})

# extract the title and the paragraphs from the block
title = whatis[0].find_all("h2")
print(title[0].text.strip() + "\n")
for p in whatis[0].find_all('p'):
    print(p.text.strip() + "\n")
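The code above skips the h3 subtitles and fails with an IndexError if the class name ever stops matching. A slightly more defensive variant might look like the sketch below; `extract_what_is` is a hypothetical helper name, the default class is the one observed above (it may no longer match the live page), and the sample HTML only mimics the observed structure.

```python
from bs4 import BeautifulSoup


def extract_what_is(html, css_class="sc-2qtjgt-0 eApVPN"):
    """Return (tag name, text) pairs for the h2, h3 and p elements inside the
    first <div> whose class attribute equals css_class, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", {"class": css_class})
    if block is None:
        # Fail with a clear message instead of an IndexError on whatis[0].
        raise ValueError(f"no div with class {css_class!r}; the page layout may have changed")
    # Passing a list of names keeps titles, subtitles and paragraphs in document order.
    return [(tag.name, tag.get_text(strip=True))
            for tag in block.find_all(["h2", "h3", "p"])]


# Hypothetical snippet mimicking the observed structure (not real page content).
sample_html = """
<div class="sc-2qtjgt-0 eApVPN">
  <h2>What Is BNB?</h2>
  <p>First paragraph.</p>
  <h3>Subtitle</h3>
  <p>Second paragraph.</p>
</div>
"""

for name, text in extract_what_is(sample_html):
    print(f"{name}: {text}")
```

Against the live page you would pass `requests.get(url, timeout=30).text` instead of the sample string.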
Example 2: Web Extraction Coin Statistics
In this example, we will extract Binance Coin (BNB) statistics: Market Cap, Fully Diluted Market Cap, Volume (24h), Volume / Market Cap, and Circulating Supply.
On the same BNB coin page, go to the top of the page and inspect one of the statistics web elements. Observe that the whole block is named:
<div class="hide statsContainer">
and that each value sits in:
<div class="statsValue">
Therefore, we will find the <div class="hide statsContainer"> element and its child elements of type <div class="statsValue">:
# reuses the soup object created in Example 1
statsContainer = soup.find_all("div", {"class": "hide statsContainer"})
statsValues = statsContainer[0].find_all("div", {"class": "statsValue"})

statsValue_marketcap = statsValues[0].text.strip()
print(statsValue_marketcap)
statsValue_fully_diluted_marketcap = statsValues[1].text.strip()
print(statsValue_fully_diluted_marketcap)
statsValue_volume = statsValues[2].text.strip()
print(statsValue_volume)
statsValue_volume_per_marketcap = statsValues[3].text.strip()
print(statsValue_volume_per_marketcap)
statsValue_circulating_supply = statsValues[4].text.strip()
print(statsValue_circulating_supply)
This gives the following output (your results will differ, since prices change constantly):
$104,432,294,030
$104,432,294,030
$3,550,594,245
0.034
166,801,148.00 BNB
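The scraped values are strings with currency symbols, thousands separators and unit suffixes. To get them into a spreadsheet as numbers, one option is the sketch below. It assumes the five values always arrive in the order shown above (an assumption that breaks if the page layout changes); `to_number`, `stats_to_dict` and the field names are hypothetical, and the CSV is written to an in-memory buffer only for illustration.

```python
import csv
import io
import re

# Field names in the order the statsValue elements appeared in Example 2 (assumed stable).
STAT_NAMES = ["market_cap", "fully_diluted_market_cap", "volume_24h",
              "volume_per_market_cap", "circulating_supply"]


def to_number(text):
    """Extract the first number from text, e.g. '$3,550,594,245' -> 3550594245.0."""
    match = re.search(r"[-+]?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))


def stats_to_dict(values):
    """Pair the scraped strings with field names and convert them to floats."""
    if len(values) != len(STAT_NAMES):
        raise ValueError(f"expected {len(STAT_NAMES)} values, got {len(values)}")
    return {name: to_number(v) for name, v in zip(STAT_NAMES, values)}


# The sample output shown above.
scraped = ["$104,432,294,030", "$104,432,294,030", "$3,550,594,245",
           "0.034", "166,801,148.00 BNB"]
row = stats_to_dict(scraped)

# Write one spreadsheet-ready CSV row (to a string buffer here, a file in practice).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["coin", *STAT_NAMES])
writer.writeheader()
writer.writerow({"coin": "BNB", **row})
print(buffer.getvalue())
```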
Example 3: Exercise
In this exercise, use what you learned in the previous two examples to see whether you can extract the Max Supply and Total Supply for ADA (Cardano) and BNB.
BNB Total Supply and Max Supply
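As a hint for the exercise, one approach that does not rely on positional indexes like statsValues[4] is to look a value up by its visible label. The markup below is entirely hypothetical (the real page may nest these stats differently), the coin and numbers are dummies, and `value_for_label` is an illustrative helper, so treat this as a starting point rather than a solution.

```python
from bs4 import BeautifulSoup

# HYPOTHETICAL markup with dummy values; the real CoinMarketCap page may differ.
sample_html = """
<div class="supply">
  <div><span>Max Supply</span><span>1,000,000 EXC</span></div>
  <div><span>Total Supply</span><span>750,000 EXC</span></div>
</div>
"""


def value_for_label(soup, label):
    """Find the text node equal to `label` and return the text of the next sibling tag."""
    label_node = soup.find(string=lambda s: s is not None and s.strip() == label)
    if label_node is None:
        return None
    # find_next_sibling(True) matches the next Tag only, skipping whitespace strings.
    value_tag = label_node.parent.find_next_sibling(True)
    return value_tag.get_text(strip=True) if value_tag else None


soup = BeautifulSoup(sample_html, "html.parser")
for label in ("Max Supply", "Total Supply"):
    print(label, "->", value_for_label(soup, label))
```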
Appendix: Alternatives to BeautifulSoup
Other options include Selenium and Scrapy. We will cover these topics when we have time.
Selenium and Scrapy have a steeper learning curve than Requests (used to fetch HTML) and BeautifulSoup (used to parse HTML).
Scrapy is a complete web scraping framework that handles everything from fetching HTML to processing the data.
Selenium is a browser automation tool that can, for instance, let you navigate between different pages.
Web Scraping Challenges: Durability
The key challenge of web scraping is the durability of the code. Web developers at CoinMarketCap continually update the website, so old code might stop working after some time. Class names such as sc-2qtjgt-0 eApVPN look auto-generated and are especially likely to change.
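One way to notice breakage early is to check, before scraping, that every selector your code depends on still matches something. A minimal sketch, assuming the class names from Examples 1 and 2 rewritten as CSS selectors (used via BeautifulSoup's `select_one`); the selector names and the sample page are hypothetical.

```python
from bs4 import BeautifulSoup

# Selectors from Examples 1 and 2, written as CSS selectors (names are illustrative).
EXPECTED_SELECTORS = {
    "what_is_block": "div.sc-2qtjgt-0.eApVPN",
    "stats_container": "div.hide.statsContainer",
    "stats_value": "div.statsContainer div.statsValue",
}


def missing_selectors(html, selectors=EXPECTED_SELECTORS):
    """Return the names of selectors that match nothing in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    return [name for name, css in selectors.items() if soup.select_one(css) is None]


# Dummy page in which only the stats block still exists.
sample_html = '<div class="hide statsContainer"><div class="statsValue">$1</div></div>'
print(missing_selectors(sample_html))
```

Note that CSS class selectors match regardless of class order and tolerate extra classes, which is slightly more forgiving than matching the exact class string as the earlier examples do.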
A promising solution is to use the Application Programming Interfaces (APIs) offered by various platforms and websites, although free API tiers are restricted. The data format is also different: APIs typically return XML or JSON, whereas in normal web scraping you mostly deal with HTML.
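To show how the JSON format changes the parsing step, here is a sketch against CoinGecko's public `simple/price` endpoint, chosen only as an example. The URL and response shape (`{"coin_id": {"usd": price}}`) are as we understand them; availability, rate limits and API-key requirements may change, so check the provider's documentation. The network call is wrapped in `fetch_prices` and not executed here, and the sample payload contains dummy numbers. With the Requests library from the earlier examples, `requests.get(url, timeout=30).json()` would work equally well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the provider's current documentation.
API_URL = "https://api.coingecko.com/api/v3/simple/price"


def build_price_url(coin_ids, currency="usd"):
    """Build the query URL, e.g. ids=binancecoin&vs_currencies=usd."""
    query = urlencode({"ids": ",".join(coin_ids), "vs_currencies": currency})
    return f"{API_URL}?{query}"


def parse_prices(payload, currency="usd"):
    """Turn a JSON payload like {"binancecoin": {"usd": 1.5}} into {"binancecoin": 1.5}."""
    data = json.loads(payload)
    return {coin: quote[currency] for coin, quote in data.items()}


def fetch_prices(coin_ids, currency="usd"):
    """Network call (not executed in this sketch): fetch and parse live prices."""
    with urlopen(build_price_url(coin_ids, currency), timeout=30) as resp:
        return parse_prices(resp.read().decode("utf-8"), currency)


# Dummy payload: no parsing of HTML tags or class names is needed for JSON.
sample_payload = '{"binancecoin": {"usd": 1.5}, "cardano": {"usd": 0.25}}'
print(build_price_url(["binancecoin"]))
print(parse_prices(sample_payload))
```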
If you want to know more about scraping CoinMarketCap data, contact X-Byte Enterprise Crawling or ask for a free quote!