How Web Scraping Is Used In Apple Music Streaming Data Analysis?

In this study, we analyze my personal music streaming statistics from Apple Music, the music and video streaming service from Apple Inc. The dataset used here is my own listening history on the platform.

We will cover the following topics:

Data requests and downloads
Cleaning and preparing data
Analyzing data and gaining interesting insights from it

Requesting and Downloading Data

Apple will provide a copy of your personal data if you ask for it. These are the steps to take:

Go to apple.com/privacy.
Sign in to your account.
Click Make a request for a copy of your information.
Make sure Apple Media Services Information is checked, then click Continue at the bottom.
Select the default size and click Finish Request.

Data Preparation and Cleaning

Import the required libraries.
Load the dataset (CSV file).
Examine the dataframe's shape and columns.
Look for any missing values.
Examine the basic statistics of the columns.
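
The inspection steps in this list are not shown in code later on, so here is a minimal sketch of what they could look like. It runs on a tiny made-up dataframe (`sample_df` is a hypothetical stand-in, not the real export) and uses only standard pandas calls.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the export; the values are made up for illustration.
sample_df = pd.DataFrame({
    "Artist Name": ["Bazzi", "Raftaar", None],
    "Play Duration Milliseconds": [3312.0, 131214.0, 0.0],
    "Original Title": [np.nan, np.nan, np.nan],
})

# Shape is (rows, columns)
print(sample_df.shape)
print(sample_df.columns.tolist())

# Number of missing values in each column
missing_per_column = sample_df.isnull().sum()
print(missing_per_column)

# Basic statistics for the numeric columns
print(sample_df.describe())
```

On the real data, the same calls would be made on `music_df` once it has been loaded.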

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
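import plotly.express as px

# Shorthand for Plotly's qualitative colour palettes, used by the charts below
colors = px.colors.qualitative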

pd.set_option('display.max_columns', None)

Load the Dataset

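# The file name is an assumption; use the path of the play activity CSV from your own export
music_df = pd.read_csv('Apple Music Play Activity.csv')
music_df.head()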

	Apple Id Number	Apple Music Subscription	Artist Name	Build Version	Client IP Address	Content Name	Content Provider	Content Specific Type	Device Identifier	End Position In Milliseconds	End Reason Type	Event End Timestamp	Event Reason Hint Type	Event Received Timestamp	Event Start Timestamp	Event Type	Feature Name	Genre	Item Type	Media Duration In Milliseconds	Media Type	Metrics Bucket Id	Metrics Client Id	Milliseconds Since Play	Offline	Original Title	Play Duration Milliseconds	Provided Audio Bit Depth	Provided Audio Channel	Provided Audio Sample Rate	Provided Bit Rate	Provided Codec	Provided Playback Format	Source Type	Start Position In Milliseconds	Store Country Name	Targeted Audio Bit Depth	Targeted Audio Channel	Targeted Audio Sample Rate	Targeted Bit Rate	Targeted Codec	Targeted Playback Format	User’s Audio Quality	User’s Playback Format	UTC Offset In Seconds
0	11569060994	True	Bazzi	Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b…	106.66.247.0	Paradise	The Warner Music Group	Song	f625ff5caca143772ec5bb7962ef7f5f9267f36a	3312.0	MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM	2019-06-28T16:46:33.382Z	NOT_SPECIFIED	2019-06-28T17:21:01.944Z	2019-06-28T16:46:30.070Z	PLAY_END	library / downloaded_music / songs	Pop	ITUNES_STORE_CONTENT	169087.0	AUDIO	7044.0	3z44Gmyhz4lXz4xCzBqJzr9hlFsSr	2068562	True	NaN	3312.0	NaN	NaN	NaN	NaN	NaN	NaN	ORIGINATING_DEVICE	0	India	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19800
1	11569060994	True	Raftaar	Music/1.0 macOS/10.15 build/19A582a model/MacB…	117.206.166.3	Aage Chal	Hungama Digital Media Entertainment Pvt.	Song	1C36BB164346	131214.0	MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM	2020-06-18T15:29:01.975Z	NOT_SPECIFIED	2020-06-21T04:13:37.975Z	2020-06-18T15:26:50.761Z	PLAY_END	library	Indian Pop	ITUNES_STORE_CONTENT	229800.0	AUDIO	3331.0	3z4uZCHNzEVrz4vkzAsJzGurr2k8E	218676000	False	NaN	131214.0	NaN	NaN	NaN	NaN	NaN	NaN	ORIGINATING_DEVICE	0	India	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19800
2	11569060994	True	Master Rakesh, Dr Zeus	Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b…	117.228.170.82	Kangna (feat. Deepti & Shortie)	The Orchard Enterprises Inc.	Song	f625ff5caca143772ec5bb7962ef7f5f9267f36a	0.0	MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM	2019-08-27T13:22:00.883Z	NOT_SPECIFIED	2019-08-27T13:22:05.551Z	2019-08-27T13:22:00.883Z	PLAY_END	library / album_detail	Asia	ITUNES_STORE_CONTENT	209118.0	AUDIO	7044.0	3z44Gmyhz4lXz4xCzBqJzr9hlFsSr	4668	False	NaN	0.0	NaN	NaN	NaN	NaN	NaN	NaN	ORIGINATING_DEVICE	0	India	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19800
3	11569060994	True	The Weeknd	Music/3.1 iOS/11.3 model/iPhone9,3 hwp/t8010 b…	106.77.1.155	Can’t Feel My Face	Universal Music International	Song	f625ff5caca143772ec5bb7962ef7f5f9267f36a	31027.0	SCRUB_BEGIN	2018-04-03T18:15:01.395Z	NOT_SPECIFIED	2018-04-03T18:15:14.359Z	2018-04-03T18:15:00.362Z	PLAY_END	library / downloaded_music / songs	R&B/Soul	ITUNES_STORE_CONTENT	213577.0	AUDIO	4877.0	3z4pGutFz1mxz4yYz9qazYSewotZt	12964	False	NaN	1033.0	NaN	NaN	NaN	NaN	NaN	NaN	ORIGINATING_DEVICE	29994	India	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19800
4	11569060994	True	Nucleya, DIVINE	Music/3.1 iOS/11.2 model/iPhone9,3 hwp/t8010 b…	42.106.57.192	Paintra (From “Mukkabaaz”)	Eros International USA Inc	Song	f625ff5caca143772ec5bb7962ef7f5f9267f36a	119958.0	MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM	2017-12-20T12:42:47.599Z	NOT_SPECIFIED	2017-12-20T12:42:47.771Z	2017-12-20T12:41:25.722Z	PLAY_END	library / album_detail	Bollywood	ITUNES_STORE_CONTENT	232222.0	AUDIO	4877.0	3z4pGutFz1mxz4yYz9qazYSewotZt	172	False	NaN	81877.0	NaN	NaN	NaN	NaN	NaN	NaN	ORIGINATING_DEVICE	38081	India	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19800

The dataset contains 21,147 streaming records with 45 columns. Before looking for insights, we first remove the columns we don't need. Several columns contain only null values, so we drop those first.

nans = [col for col in music_df.columns if music_df[col].isnull().all()]
print(nans)

['Original Title',
'Provided Audio Bit Depth',
'Provided Audio Channel',
'Provided Audio Sample Rate',
'Provided Bit Rate',
'Provided Codec',
'Provided Playback Format',
'Targeted Audio Bit Depth',
'Targeted Audio Channel',
'Targeted Audio Sample Rate',
'Targeted Bit Rate',
'Targeted Codec',
'Targeted Playback Format',
'User’s Audio Quality',
'User’s Playback Format']

# drop the above columns from the dataframe
music_df.drop(nans, axis=1, inplace=True)
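
The list comprehension above collects the all-null columns by hand. As an alternative sketch, pandas can drop such columns in one call with `dropna(axis=1, how="all")`. The small dataframe below is a made-up example, not the real export.

```python
import numpy as np
import pandas as pd

# Made-up sample: one column is entirely empty, one is only partly empty
sample_df = pd.DataFrame({
    "Content Name": ["Paradise", "Aage Chal"],
    "Original Title": [np.nan, np.nan],
    "Genre": ["Pop", np.nan],
})

# how="all" drops a column only when every value in it is missing,
# so the partly empty "Genre" column is kept
cleaned_df = sample_df.dropna(axis=1, how="all")
print(cleaned_df.columns.tolist())
```

Building the explicit list first, as the article does, has the advantage that you can print and review the column names before removing them.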

A few more columns, such as "Apple Id Number" and "Build Version", are not useful for this analysis, so we remove those as well.

to_delete = ['Apple Id Number', 'Build Version', 'Client IP Address', 'Device Identifier', 'Metrics Bucket Id', 'Metrics Client Id', 'UTC Offset In Seconds', 'Store Country Name']
music_df.drop(to_delete, axis=1, inplace=True)

From the original 45 columns, we now have 22 columns in our dataframe. The last step is to convert the timestamp columns, which are stored as strings (object dtype), to proper datetime values.

# The raw strings look like 2019-06-28T16:46:33.382Z, so the format includes fractional seconds and the trailing Z
timestamp_format = '%Y-%m-%dT%H:%M:%S.%fZ'
music_df['Event End Timestamp'] = pd.to_datetime(music_df['Event End Timestamp'], format=timestamp_format)
music_df['Event Received Timestamp'] = pd.to_datetime(music_df['Event Received Timestamp'], format=timestamp_format)
music_df['Event Start Timestamp'] = pd.to_datetime(music_df['Event Start Timestamp'], format=timestamp_format)

Questions and Answers

1. Who are the Top 10 Favorite Artists?

fig = px.bar(top_10_artist, title="Top 10 favourite artists", labels={"index":"Artists", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()
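
The series plotted here (`top_10_artist`) is not defined in the text. A plausible way to build it, assuming it is simply a count of plays per artist, is `value_counts()` on the `Artist Name` column. The helper name `top_n` and the sample rows below are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

def top_n(df, column, n):
    """Return the n most frequent values in `column` with their play counts."""
    return df[column].value_counts().head(n)

# Made-up sample rows, one row per play event
sample_df = pd.DataFrame({
    "Artist Name": ["Bazzi", "The Weeknd", "Bazzi", "Raftaar", "Bazzi", "The Weeknd"],
    "Content Name": ["Paradise", "Can't Feel My Face", "Paradise",
                     "Aage Chal", "Mine", "Can't Feel My Face"],
})

top_artists = top_n(sample_df, "Artist Name", 2)
print(top_artists)
```

On the real data this would be `top_n(music_df, 'Artist Name', 10)`. Under the same assumption, calling the helper with `'Content Name'` and 20, or with `'Content Provider'` and 10, would give the series used in the next two questions.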

2. Which are the Top 20 Songs Played? (Favorite Songs)

fig = px.bar(top_20_songs, title="Top 20 favourite songs", labels={"index":"Songs", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_xaxes(tickangle=22)
fig.show()

3. Who are the Top 10 Favorite Content providers?

fig = px.bar(top_10_labels, title="Top 10 favourite labels", labels={"index":"Music Labels", 'value':"No. of times song label played"}, color_discrete_sequence=px.colors.qualitative.Pastel)
fig.update_xaxes(tickangle=25)
fig.show()

To check the top tracks from a specific music label, we create a small helper function.

def top_10_song_of_label(label):
    """
    Show the ten most-played songs from a particular label.
    """
    # keep only the rows for this label, then count plays per song
    # (value_counts sorts in descending order)
    label_df = music_df[music_df['Content Provider'] == label]
    top_10_song = label_df['Content Name'].value_counts()[:10]
    print(top_10_song)
    fig = px.bar(top_10_song, labels={"index": "Song Names", "value": "No. of times song played", "variable":"Song name"}, title=f"Top songs from {label}")
    fig.show()

For example, these are the top songs from The Warner Music Group:

top_10_song_of_label('The Warner Music Group')

Hola (feat. Maluma)                                     82
I Don't Care                                            69
Thinking Out Loud                                       63
Attention                                               62
Perfect                                                 60
1, 2, 3 (feat. Jason Derulo & De La Ghetto)             59
Dirty Sexy Money (feat. Charli XCX & French Montana)    52
Hymn for the Weekend                                    51
Crown                                                   50
10,000 Hours                                            48
Name: Content Name, dtype: int64

Top Songs from T-Series

top_10_song_of_label('Super Cassettes Industries Pvt Limited a.k.a. T-Series')

Ishq Tera              66
Chota Sa Fasana        60
Maahi Ve               59
High Rated Gabru       50
Tu Chale               45
Tera Yaar Hoon Main    45
Befikra                41
Zindagi Do Pal Ki      40
Duniyaa                40
Chalte Chalte          40
Name: Content Name, dtype: int64

4. Which are the Top 10 Songs According to Playtime?

fig = px.bar(top_longest_played[:10], labels={"Content Name": "Song Names", "value": "Play Time (in mins)", "variable":"Duration"}, color_discrete_sequence=colors.G10_r)
fig.show()
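
The text does not show how `top_longest_played` is built. One way to get it, assuming it is the total play time per song in minutes, is to group by `Content Name` and sum `Play Duration Milliseconds`. The sample values below are made up.

```python
import pandas as pd

# Made-up sample: "Paradise" was played twice
sample_df = pd.DataFrame({
    "Content Name": ["Paradise", "Aage Chal", "Paradise", "Perfect"],
    "Play Duration Milliseconds": [120000.0, 60000.0, 180000.0, 30000.0],
})

play_minutes = (
    sample_df.groupby("Content Name")["Play Duration Milliseconds"]
    .sum()                          # total milliseconds per song
    .div(60000)                     # milliseconds -> minutes
    .sort_values(ascending=False)   # longest-played songs first
)
print(play_minutes)
```

Ranking by play time instead of play count favours songs that were listened to in full over songs that were started often but skipped early.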

5. What Is the Usual Reason a Song Ends?
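
The text gives no code for this question. A minimal sketch, assuming the answer comes from the `End Reason Type` column, is to compute the share of each end reason. The two reason codes below appear in the sample rows shown earlier; their proportions here are made up.

```python
import pandas as pd

# Made-up sample using end reason codes that occur in the export
sample_df = pd.DataFrame({
    "End Reason Type": [
        "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
        "SCRUB_BEGIN",
        "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
        "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
    ],
})

# normalize=True returns fractions instead of counts; multiply by 100 for percentages
end_reason_share = sample_df["End Reason Type"].value_counts(normalize=True) * 100
print(end_reason_share.round(1))
```

The resulting series could be passed to `px.bar` or `px.pie` in the same way as the other charts in this section.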

6. What Is Your Favorite Genre?

fig = px.bar(top_genre, color_discrete_sequence=colors.T10_r)
fig.show()

7. Which Media Type Do You Prefer Most on Apple Music?

fig = px.pie(music_df, names='Media Type', color_discrete_sequence=colors.Dark2, title="Most preferable Media Type (eg. Audio/Video)")
fig.show()

8. Do You Prefer Listening to Music Online or Offline?

fig = px.pie(music_df, names="Offline", title="Do you prefer listening to music Offline?")
fig.show()

9. At What Time of Day Do You Prefer to Listen to Music?

fig = px.bar(hours, title="Most active hours (24hr)", labels={"value": "count", "Event Start Timestamp":"Timings (hours)"}, color_discrete_sequence=colors.Prism)
fig.update_xaxes(dtick=1)
fig.show()
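
The `hours` series above, and the `months` and `years` series used in the next two questions, are not defined in the text. Assuming they are play counts derived from `Event Start Timestamp`, they could be built with the `.dt` accessor as sketched below. The timestamps are made up.

```python
import pandas as pd

# Made-up sample of already-parsed start timestamps
sample_df = pd.DataFrame({
    "Event Start Timestamp": pd.to_datetime([
        "2019-06-28 16:46:30",
        "2020-06-18 15:26:50",
        "2019-08-27 13:22:00",
        "2019-06-01 16:05:00",
    ]),
})

start = sample_df["Event Start Timestamp"]

# Count plays per hour, month and year; sort_index keeps the axis in calendar order
hours = start.dt.hour.value_counts().sort_index()
months = start.dt.month.value_counts().sort_index()
years = start.dt.year.value_counts().sort_index()
print(hours, months, years, sep="\n\n")
```

Note that the raw timestamps end in `Z`, which denotes UTC, so the hours counted this way are UTC hours rather than local time unless an offset is applied first.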

10. In Which Month Have You Listened to the Most Songs?

m = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov','Dec']
fig = px.bar(months, title="Most active Months", text=m, labels={"value": "count", "Event Start Timestamp":"Months"}, color_discrete_sequence=colors.Light24)
fig.update_xaxes(dtick=1)
fig.show()

11. In Which Year Have You Listened to the Most Songs on Apple Music?

fig = px.bar(years, title="Most active year", labels={"value": "count", "Event Start Timestamp":"Year"}, color_discrete_sequence=colors.Prism_r)
fig.update_xaxes(dtick=1)
fig.show()

12. Total Time Spent Listening to Music

total_mins = total_time/60000
print("Total minutes spent: {:.2f} mins".format(total_mins))
total_hours = total_mins/60
print("Total hours spent: {:.2f} hours".format(total_hours))
Total minutes spent: 24568.91 mins
Total hours spent: 409.48 hours
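
The variable `total_time` is not defined in the text. Since it is divided by 60000 to get minutes, it is presumably the sum of `Play Duration Milliseconds`. A minimal sketch under that assumption, using the play durations from the five sample rows shown earlier:

```python
import pandas as pd

# Play durations (in milliseconds) of the five sample rows shown earlier
sample_df = pd.DataFrame({
    "Play Duration Milliseconds": [3312.0, 131214.0, 0.0, 1033.0, 81877.0],
})

# sum() skips missing values by default
total_time = sample_df["Play Duration Milliseconds"].sum()

# 60,000 milliseconds per minute
total_mins = total_time / 60000
print("Total minutes spent: {:.2f} mins".format(total_mins))
```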

From the start of the listening history to its end, the maximum possible listening time is:

total_possible_hours = total_possible_time * 24
print("Total possible hours from start to end: {} hours".format(total_possible_hours))
Total possible hours from start to end: 31632 hours
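
The variable `total_possible_time` is not defined in the text. Because it is multiplied by 24 to get hours, it is presumably a number of days, and the reported 31632 hours correspond to 31632 / 24 = 1318 days. One way to compute it, assuming it spans the first to the last recorded play, is sketched below with made-up timestamps.

```python
import pandas as pd

# Made-up sample of already-parsed start timestamps
sample_df = pd.DataFrame({
    "Event Start Timestamp": pd.to_datetime([
        "2017-12-20 12:41:25",
        "2019-06-28 16:46:30",
        "2020-06-18 15:26:50",
    ]),
})

start = sample_df["Event Start Timestamp"]

# Whole days between the first and the last recorded play
total_possible_time = (start.max() - start.min()).days
total_possible_hours = total_possible_time * 24
print("Total possible hours from start to end: {} hours".format(total_possible_hours))
```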

The important question now is how much of my total available time was spent listening to music.

hours_spent_list = np.array([total_hours, total_possible_hours])
hours_spent_list_labels = ["Actual Hours Spent", "Possible Hours"]

fig, ax = plt.subplots(figsize=(12,6))
ax.pie(hours_spent_list, labels=hours_spent_list_labels, autopct='%1.1f%%', explode=[0.2,0.2], startangle=180, shadow=True)
plt.title("Hours Spent Percentage")
plt.show()
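
As a quick numeric check alongside the chart, the share can also be computed directly from the two figures printed earlier in this section. This is a small arithmetic sketch, not part of the original notebook.

```python
# Figures printed earlier in this section
total_hours = 409.48
total_possible_hours = 31632

# Listening time as a percentage of all available hours
share = total_hours / total_possible_hours * 100
print("Share of available time spent listening: {:.2f}%".format(share))
```

This comes to roughly 1.3% of the available time.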

13. Daily Average Songs Played

total_songs = music_df.shape[0]
print("Daily average of songs played: {:.2f} songs".format(total_songs/total_possible_time))
Daily average of songs played: 16.04 songs


✯ Alpesh Khunt ✯

Alpesh Khunt, CEO and founder of X-Byte Enterprise Crawling, started the data scraping company in 2012 to boost business growth using real-time data. With a vision for scalable solutions, he developed a trusted web scraping platform that empowers businesses with accurate insights for smarter decision-making.