Spotify Data Analyzing And Visualization Using Python

Spotify turns out to provide a fantastic API for connecting to its massive collection of songs and their features. You may gain visual insights from your favorite songs, or incorporate a playback into your online application. There’s also a strong song search engine and a recommendation system to help you listen to more of your favorite music.

Initiate by creating an account on the official Spotify website, which is completely free and requires very little work.

Then, go to your application dashboard and choose “Create an app.” Fill in the blanks and get ready to explore.

Start your favorite Python IDE using your ClientID and Client Secret. It’s time to start coding.

Instead of hitting the endpoints directly, we’ll utilize SpotiPy, a wrapper over the Spotify API, to perform attractive one-line calls. Let’s get things set up.

!pip install spotipy

The spotipy is then initialized. The Spotify developer’s passwords are kept in the variables CLIENT_ID and CLIENT_SECRET of the Spotify object.

import spotipy
	from spotipy.oauth2 import SpotifyClientCredentials
	client_credentials_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
	sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Scrape Artists and Tracks

Data querying is the next stage. Note that you can only get information on 50 tracks at a time. You may search for specific objects using the q argument in the sp.search() function.

artist_name = []
	track_name = []
	track_popularity = []
	artist_id = []
	track_id = []
	for i in range(0,1000,50):
	    track_results = sp.search(q='year:2021', type='track', limit=50,offset=i)
	    for i, t in enumerate(track_results['tracks']['items']):
	        artist_name.append(t['artists'][0]['name'])
	        artist_id.append(t['artists'][0]['id'])
	        track_name.append(t['name'])
	        track_id.append(t['id'])
	        track_popularity.append(t['popularity'])

Place the queried information into the Pandas Dataframe

import pandas as pd
	track_df = pd.DataFrame({'artist_name' : artist_name, 'track_name' : track_name, 'track_id' : track_id, 'track_popularity' : track_popularity, 'artist_id' : artist_id})
	print(track_df.shape)
	track_df.head()
table-1
artist_popularity = []
	artist_genres = []
	artist_followers = []
	for a_id in track_df.artist_id:
	  artist = sp.artist(a_id)
	  artist_popularity.append(artist['popularity'])
	  artist_genres.append(artist['genres'])
	  artist_followers.append(artist['followers']['total'])

Add the track_df data frame

track_df = track_df.assign(artist_popularity=artist_popularity, artist_genres=artist_genres, artist_followers=artist_followers)
	track_df.head()
table-2

Numerical Features of Retrieval Tracks

You will then plunge into numerical analysis of music, but first, you will need to gather some information. Fortunately, Spotify will offer you with detailed information on 82 million songs, which is ideal for our needs.

To begin, go to the Spotify API reference page to learn about the features that contribute to a track’s profile.

Second, get the characteristics of the tracks and put them in the data frame.

track_features = []
	for t_id in track_df['track_id']:
	  af = sp.audio_features(t_id)
	  track_features.append(af)
	tf_df = pd.DataFrame(columns = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'])
	for item in track_features:
	  for feat in item:
	    tf_df = tf_df.append(feat, ignore_index=True)
	tf_df.head()

The features data frame for the songs looks like this:

table-3

Let’s remove some few unnecessary columns and examine the data frames’ structure:

cols_to_drop2 = ['key','mode','type', 'uri','track_href','analysis_url']
	tf_df = tf_df.drop(columns=cols_to_drop2)
	print(track_df.info())
	print(tf_df.info())
Reading the data into a data frame using Pandas
code-1

Column types inference is the last step before data exploration and visualization.

track_df['artist_name'] = track_df['artist_name'].astype("string")
	track_df['track_name'] = track_df['track_name'].astype("string")
	track_df['track_id'] = track_df['track_id'].astype("string")
	track_df['artist_id'] = track_df['artist_id'].astype("string")
	tf_df['duration_ms'] = pd.to_numeric(tf_df['duration_ms'])
	tf_df['instrumentalness'] = pd.to_numeric(tf_df['instrumentalness'])
	tf_df['time_signature'] = tf_df['time_signature'].astype("category")
	print(track_df.info())
	print(tf_df.info())

The data frames that result have the following structure:

code-2

Exploring 2021 Trends

Are you in search of popular tracks of 2021?

track_df.sort_values(by=['track_popularity'], ascending=False)[['track_name', 'artist_name']].head(20)
table-4

Who is Most Followed?

by_art_fol = pd.DataFrame(track_df.sort_values(by=['artist_followers'], ascending=False)[['artist_followers','artist_popularity', 'artist_name','artist_genres']])
	by_art_fol.astype(str).drop_duplicates().head(20)
table-5

Let’s examine how many genres the track_df data frame contains:

def to_1D(series):
	 return pd.Series([x for _list in series for x in _list])
	to_1D(track_df['artist_genres']).value_counts().head(20)
code-3

Visualizing the above results:

import matplotlib.pyplot as plt
	fig, ax = plt.subplots(figsize = (14,4))
	ax.bar(to_1D(track_df['artist_genres']).value_counts().index[:10],
	        to_1D(track_df['artist_genres']).value_counts().values[:10])
	ax.set_ylabel("Frequency", size = 12)
	ax.set_title("Top genres", size = 14)
graph-1

Searching for the top 10 artists filtered by the number of the followers

top_10_genres = list(to_1D(track_df['artist_genres']).value_counts().index[:20])
	top_artists_by_genre = []
	for genre in top_10_genres:
	  for index, row in by_art_fol.iterrows():
	    if genre in row['artist_genres']:
	      top_artists_by_genre.append({'artist_name':row['artist_name'], 'artist_genre':genre})
	      break
	pd.json_normalize(top_artists_by_genre)
table-6

Searching for the top 20 filtered by popularity for every top 10 genres:

by_track_pop = pd.DataFrame(track_df.sort_values(by=['track_popularity'], ascending=False)[['track_popularity','track_name', 'artist_name','artist_genres', 'track_id']])
	by_track_pop.astype(str).drop_duplicates().head(20)
	top_songs_by_genre = []
	for genre in top_10_genres:
	  for index, row in by_track_pop.iterrows():
	    if genre in row['artist_genres']:
	      top_songs_by_genre.append({'track_name':row['track_name'], 'track_popularity':row['track_popularity'],'artist_name':row['artist_name'], 'artist_genre':genre})
	      break
	pd.json_normalize(top_songs_by_genre)
table-7

Visualizing Features of the Tracks

import seaborn as sn
	sn.set(rc = {'figure.figsize':(12,10)})
	sn.heatmap(tf_df.corr(), annot=True)
	plt.show()
graph-2

A multivariate KDE might also be plotted for a specific pair of variables:

sn.set(rc = {'figure.figsize':(20,20)})
	sn.jointplot(data=tf_df, x="loudness", y="energy", kind="kde")
graph-3

What distinguishes the most popular tracks from the rest of the dataset? Here, we will learn by producing a feature portrait of the relevant sets based on the mean values of selected characteristics.

feat_cols = ['danceability', 'energy', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence']
	top_100_feat = pd.DataFrame(columns=feat_cols)
	for i, track in by_track_pop[:100].iterrows():
	  features = tf_df[tf_df['id'] == track['track_id']]
	  top_100_feat = top_100_feat.append(features, ignore_index=True)
	top_100_feat = top_100_feat[feat_cols]
	

	from sklearn import preprocessing
	mean_vals = pd.DataFrame(columns=feat_cols)
	mean_vals = mean_vals.append(top_100_feat.mean(), ignore_index=True)
	mean_vals = mean_vals.append(tf_df[feat_cols].mean(), ignore_index=True)
	print(mean_vals)
	

	import plotly.graph_objects as go
	import plotly.offline as pyo
	fig = go.Figure(
	    data=[
	        go.Scatterpolar(r=mean_vals.iloc[0], theta=feat_cols, fill='toself', name='Top 100'),
	        go.Scatterpolar(r=mean_vals.iloc[1], theta=feat_cols, fill='toself', name='All'),
	    ],
	    layout=go.Layout(
	        title=go.layout.Title(text='Feature comparison'),
	        polar={'radialaxis': {'visible': True}},
	        showlegend=True
	    )
	)
	

	#pyo.plot(fig)
	fig.show()

The most famous songs appear to be slightly more rhythmic and have more positivity. They’re also devoid of efficiency, effectiveness and vibrancy.

Get Suggestions

Our research concludes with music recommendations based on artist_id, genre, and track id. Spotify never runs out of content recommendations since the output is randomized.

rec = sp.recommendations(seed_artists=["3PhoLpVuITZKcymswpck5b"], seed_genres=["pop"], seed_tracks=["1r9xUipOqoNwggBpENDsvJ"], limit=100)
	for track in rec['tracks']:
	  print(track['artists'][0]['name'], track['name'])
code-4

This blog will help you understand the idea regarding Spotify Web APIs methods and how the fetched information will be visualized and analyzed.

For any web scraping services, contact X-Byte Enterprise Crawling today!

Request for a quote!

Send Message

    Send Message