
This post covers the following topics:
- Requesting and downloading your data
- Cleaning and preparing the data
- Analyzing the data and drawing insights from it
Requesting and Downloading Data
Apple will send you a copy of your personal data if you request it. Here are the steps:
- Go to apple.com/privacy.
- Sign in with your Apple ID.
- Click Make a request for a copy of your information.
- Make sure Apple Media Services Information is checked, then click Continue at the bottom.
- Keep the default file size and click Finish Request.



Data Preparation and Cleaning
- Import the required libraries.
- Load the dataset (CSV file).
- Examine the dataframe’s shape and columns.
- Look for missing values.
- Examine basic statistics for each column.
Importing Libraries
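import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# qualitative colour palettes used by the charts below (colors.G10_r, colors.Dark2, ...)
colors = px.colors.qualitative

pd.set_option('display.max_columns', None)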
Load the Dataset
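# File name assumed; point this at the play-activity CSV from your Apple data export
music_df = pd.read_csv('Apple Music Play Activity.csv')
music_df.head()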
| | Apple Id Number | Apple Music Subscription | Artist Name | Build Version | Client IP Address | Content Name | Content Provider | Content Specific Type | Device Identifier | End Position In Milliseconds | End Reason Type | Event End Timestamp | Event Reason Hint Type | Event Received Timestamp | Event Start Timestamp | Event Type | Feature Name | Genre | Item Type | Media Duration In Milliseconds | Media Type | Metrics Bucket Id | Metrics Client Id | Milliseconds Since Play | Offline | Original Title | Play Duration Milliseconds | Provided Audio Bit Depth | Provided Audio Channel | Provided Audio Sample Rate | Provided Bit Rate | Provided Codec | Provided Playback Format | Source Type | Start Position In Milliseconds | Store Country Name | Targeted Audio Bit Depth | Targeted Audio Channel | Targeted Audio Sample Rate | Targeted Bit Rate | Targeted Codec | Targeted Playback Format | User’s Audio Quality | User’s Playback Format | UTC Offset In Seconds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 11569060994 | True | Bazzi | Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b… | 106.66.247.0 | Paradise | The Warner Music Group | Song | f625ff5caca143772ec5bb7962ef7f5f9267f36a | 3312.0 | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | 2019-06-28T16:46:33.382Z | NOT_SPECIFIED | 2019-06-28T17:21:01.944Z | 2019-06-28T16:46:30.070Z | PLAY_END | library / downloaded_music / songs | Pop | ITUNES_STORE_CONTENT | 169087.0 | AUDIO | 7044.0 | 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr | 2068562 | True | NaN | 3312.0 | NaN | NaN | NaN | NaN | NaN | NaN | ORIGINATING_DEVICE | 0 | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19800 |
1 | 11569060994 | True | Raftaar | Music/1.0 macOS/10.15 build/19A582a model/MacB… | 117.206.166.3 | Aage Chal | Hungama Digital Media Entertainment Pvt. | Song | 1C36BB164346 | 131214.0 | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | 2020-06-18T15:29:01.975Z | NOT_SPECIFIED | 2020-06-21T04:13:37.975Z | 2020-06-18T15:26:50.761Z | PLAY_END | library | Indian Pop | ITUNES_STORE_CONTENT | 229800.0 | AUDIO | 3331.0 | 3z4uZCHNzEVrz4vkzAsJzGurr2k8E | 218676000 | False | NaN | 131214.0 | NaN | NaN | NaN | NaN | NaN | NaN | ORIGINATING_DEVICE | 0 | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19800 |
2 | 11569060994 | True | Master Rakesh, Dr Zeus | Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b… | 117.228.170.82 | Kangna (feat. Deepti & Shortie) | The Orchard Enterprises Inc. | Song | f625ff5caca143772ec5bb7962ef7f5f9267f36a | 0.0 | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | 2019-08-27T13:22:00.883Z | NOT_SPECIFIED | 2019-08-27T13:22:05.551Z | 2019-08-27T13:22:00.883Z | PLAY_END | library / album_detail | Asia | ITUNES_STORE_CONTENT | 209118.0 | AUDIO | 7044.0 | 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr | 4668 | False | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | ORIGINATING_DEVICE | 0 | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19800 |
3 | 11569060994 | True | The Weeknd | Music/3.1 iOS/11.3 model/iPhone9,3 hwp/t8010 b… | 106.77.1.155 | Can’t Feel My Face | Universal Music International | Song | f625ff5caca143772ec5bb7962ef7f5f9267f36a | 31027.0 | SCRUB_BEGIN | 2018-04-03T18:15:01.395Z | NOT_SPECIFIED | 2018-04-03T18:15:14.359Z | 2018-04-03T18:15:00.362Z | PLAY_END | library / downloaded_music / songs | R&B/Soul | ITUNES_STORE_CONTENT | 213577.0 | AUDIO | 4877.0 | 3z4pGutFz1mxz4yYz9qazYSewotZt | 12964 | False | NaN | 1033.0 | NaN | NaN | NaN | NaN | NaN | NaN | ORIGINATING_DEVICE | 29994 | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19800 |
4 | 11569060994 | True | Nucleya, DIVINE | Music/3.1 iOS/11.2 model/iPhone9,3 hwp/t8010 b… | 42.106.57.192 | Paintra (From “Mukkabaaz”) | Eros International USA Inc | Song | f625ff5caca143772ec5bb7962ef7f5f9267f36a | 119958.0 | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | 2017-12-20T12:42:47.599Z | NOT_SPECIFIED | 2017-12-20T12:42:47.771Z | 2017-12-20T12:41:25.722Z | PLAY_END | library / album_detail | Bollywood | ITUNES_STORE_CONTENT | 232222.0 | AUDIO | 4877.0 | 3z4pGutFz1mxz4yYz9qazYSewotZt | 172 | False | NaN | 81877.0 | NaN | NaN | NaN | NaN | NaN | NaN | ORIGINATING_DEVICE | 38081 | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19800 |
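The checklist at the start of this section also calls for looking at the dataframe's shape, its missing values and basic column statistics, which are not shown above. A minimal sketch of those checks, run on a tiny made-up frame that borrows a few of the real column names (the `inspect_frame` helper is hypothetical, not part of the original code):

```python
import numpy as np
import pandas as pd

def inspect_frame(df: pd.DataFrame) -> pd.Series:
    """Print shape and basic statistics; return missing-value counts per column (largest first)."""
    print("Shape (rows, columns):", df.shape)
    print(df.describe(include="all").T)  # count/unique/top/mean etc. for every column
    missing = df.isnull().sum()
    return missing[missing > 0].sort_values(ascending=False)

# Toy stand-in for music_df (made-up values, illustrative only)
toy = pd.DataFrame({
    "Artist Name": ["Bazzi", "Raftaar", None],
    "Play Duration Milliseconds": [3312.0, 131214.0, np.nan],
    "Original Title": [np.nan, np.nan, np.nan],
})
missing = inspect_frame(toy)
print(missing)
```

On the real data, `inspect_frame(music_df)` would list the all-NULL columns with a count equal to the number of rows.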
There are 21,147 rows (play events) with 45 columns in total. To get insights from the data, our first task is to remove the columns that aren’t needed, starting with the several columns in which every value is NULL.
nans = [col for col in music_df.columns if music_df[col].isnull().all()]
print(nans)
['Original Title', 'Provided Audio Bit Depth', 'Provided Audio Channel', 'Provided Audio Sample Rate', 'Provided Bit Rate', 'Provided Codec', 'Provided Playback Format', 'Targeted Audio Bit Depth', 'Targeted Audio Channel', 'Targeted Audio Sample Rate', 'Targeted Bit Rate', 'Targeted Codec', 'Targeted Playback Format', 'User’s Audio Quality', 'User’s Playback Format']
# drop the above columns from the dataframe
music_df.drop(nans, axis=1, inplace=True)
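pandas also has a built-in for this step: `DataFrame.dropna(axis=1, how='all')` drops every column whose values are all missing. A small sketch on a toy frame (made-up values) comparing it with the list-comprehension approach above:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Artist Name": ["Bazzi", "The Weeknd"],
    "Provided Codec": [np.nan, np.nan],  # entirely empty, like the columns listed above
    "Genre": [np.nan, "Pop"],            # only partly empty, so it must be kept
})

# Approach used in the article: collect the all-null columns, then drop them
nans = [col for col in toy.columns if toy[col].isnull().all()]
via_list = toy.drop(columns=nans)

# Built-in equivalent: drop columns in which every value is NaN
via_dropna = toy.dropna(axis=1, how="all")

print(nans)
print(list(via_dropna.columns))
```

Both give the same frame; the list version has the advantage of printing which columns were removed.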
A few other columns, such as “Apple Id Number” and “Build Version”, aren’t useful for this analysis, so we’ll remove those as well.
to_delete = ['Apple Id Number', 'Build Version', 'Client IP Address', 'Device Identifier',
             'Metrics Bucket Id', 'Metrics Client Id', 'UTC Offset In Seconds', 'Store Country Name']
music_df.drop(to_delete, axis=1, inplace=True)
That takes us from the original 45 columns down to 22. The last cleaning step is to convert the timestamp columns, which pandas loads as strings (object dtype), into proper datetime values.
# Timestamps look like 2019-06-28T16:46:33.382Z (ISO 8601, UTC, with milliseconds),
# so let pandas parse them instead of passing a format that omits the ".382Z" part
for col in ['Event End Timestamp', 'Event Received Timestamp', 'Event Start Timestamp']:
    music_df[col] = pd.to_datetime(music_df[col], utc=True)
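The raw timestamps look like `2019-06-28T16:46:30.070Z`: ISO 8601 with milliseconds and a trailing `Z` for UTC. A strict format such as `'%Y-%m-%dT%H:%M:%S'` does not describe the `.070Z` part, and recent pandas versions reject strings with such leftover characters, so letting pandas infer the ISO format (with `utc=True`) is the safer choice. A quick check on two `Event Start Timestamp` values copied from the table above:

```python
import pandas as pd

# Two Event Start Timestamp values from the head() output
raw = pd.Series(["2019-06-28T16:46:30.070Z", "2020-06-18T15:26:50.761Z"])

# Let pandas parse the ISO 8601 strings; utc=True keeps the 'Z' (UTC) information
parsed = pd.to_datetime(raw, utc=True)

print(parsed.dt.hour.tolist())
print(parsed.dt.year.tolist())
```

The parsed values are in UTC. Because the sample rows show an India store with a UTC offset of 19800 seconds (+5:30), converting with `.dt.tz_convert('Asia/Kolkata')` before extracting hours would give local listening times; the analysis below works on the UTC values.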
Questions and Answers
1. Who Are the Top 10 Favorite Artists?
fig = px.bar(top_10_artist, title="Top 10 favourite artists",
             labels={"index": "Artists", 'value': "No. of times song played"},
             color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()
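The chart code above expects a Series named `top_10_artist`, which is not constructed in the post. A plausible construction, assuming "favorite" simply means the most rows (play events) per artist, follows the same `value_counts()[:10]` pattern as the label helper function later in this post; `top_20_songs`, `top_10_labels` and `top_genre` would be built the same way from `Content Name`, `Content Provider` and `Genre`. The `top_n` helper and the toy play log are hypothetical:

```python
import pandas as pd

def top_n(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Count rows (play events) per value of `column` and keep the n most frequent."""
    return df[column].value_counts().head(n)

# Toy play log (made-up), one row per play event
toy = pd.DataFrame({
    "Artist Name": ["Bazzi", "The Weeknd", "Bazzi", "Raftaar", "Bazzi", "The Weeknd"],
})
top_artists = top_n(toy, "Artist Name", n=2)
print(top_artists)

# For the real data, under the same assumption:
# top_10_artist = top_n(music_df, "Artist Name", 10)
```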

2. What Are the Top 20 Most-Played Songs? (Favorite Songs)
fig = px.bar(top_20_songs, title="Top 20 favourite songs",
             labels={"index": "Songs", 'value': "No. of times song played"},
             color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_xaxes(tickangle=22)
fig.show()

3. Which Are the Top 10 Favorite Content Providers (Labels)?
fig = px.bar(top_10_labels, title="Top 10 favourite labels",
             labels={"index": "Music Labels", 'value': "No. of times song label played"},
             color_discrete_sequence=px.colors.qualitative.Pastel)
fig.update_xaxes(tickangle=25)
fig.show()

To see the top tracks from a specific label (content provider), we’ll write a small helper function.
def top_10_song_of_label(label):
    """
    Show the 10 most-played songs from a particular label.
    """
    # keep only this label's plays; value_counts() sorts by count, descending
    label_df = music_df[music_df['Content Provider'] == label]
    top_10_song = label_df['Content Name'].value_counts()[:10]
    print(top_10_song)
    fig = px.bar(top_10_song,
                 labels={"index": "Song Names", "value": "No. of times song played", "variable": "Song name"},
                 title=f"Top songs from {label}")
    fig.show()
Here is how it works, for example, for the top songs from The Warner Music Group:
top_10_song_of_label('The Warner Music Group')
Hola (feat. Maluma)                                     82
I Don't Care                                            69
Thinking Out Loud                                       63
Attention                                               62
Perfect                                                 60
1, 2, 3 (feat. Jason Derulo & De La Ghetto)             59
Dirty Sexy Money (feat. Charli XCX & French Montana)    52
Hymn for the Weekend                                    51
Crown                                                   50
10,000 Hours                                            48
Name: Content Name, dtype: int64

Top Songs from T-Series
top_10_song_of_label('Super Cassettes Industries Pvt Limited a.k.a. T-Series')
Ishq Tera              66
Chota Sa Fasana        60
Maahi Ve               59
High Rated Gabru       50
Tu Chale               45
Tera Yaar Hoon Main    45
Befikra                41
Zindagi Do Pal Ki      40
Duniyaa                40
Chalte Chalte          40
Name: Content Name, dtype: int64

4. What Are the Top 10 Songs by Play Time?
fig = px.bar(top_longest_played[:10],
             labels={"Content Name": "Song Names", "value": "Play Time (in mins)", "variable": "Duration"},
             color_discrete_sequence=colors.G10_r)
fig.show()
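`top_longest_played` is likewise not defined in the post. Since the axis label reads "Play Time (in mins)", one reasonable reading is total listening time per song: sum `Play Duration Milliseconds` per `Content Name` and convert to minutes. A sketch under that assumption (the `longest_played` helper and toy values are hypothetical):

```python
import pandas as pd

MS_PER_MINUTE = 60_000

def longest_played(df: pd.DataFrame) -> pd.Series:
    """Total listening time per song, in minutes, longest first."""
    minutes = df.groupby("Content Name")["Play Duration Milliseconds"].sum() / MS_PER_MINUTE
    return minutes.sort_values(ascending=False)

# Toy play log (made-up durations)
toy = pd.DataFrame({
    "Content Name": ["Paradise", "Aage Chal", "Paradise", "Kangna"],
    "Play Duration Milliseconds": [120_000.0, 131_214.0, 60_000.0, 0.0],
})
result = longest_played(toy)
print(result)
```

Summing play duration rather than counting rows rewards songs you listened to all the way through, which is why this ranking can differ from the play-count ranking in question 2.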

5. What Is the Most Common Reason a Song Stops Playing?
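No code is shown for this question. The `End Reason Type` column (with values such as `MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM` and `SCRUB_BEGIN` in the sample rows) answers it directly. A sketch that reports each reason's share of play events, using a hypothetical helper and made-up rows:

```python
import pandas as pd

def end_reason_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of play events that ended for each End Reason Type, largest first."""
    return (df["End Reason Type"].value_counts(normalize=True) * 100).round(1)

toy = pd.DataFrame({"End Reason Type": [
    "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
    "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
    "SCRUB_BEGIN",
    "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
]})
shares = end_reason_share(toy)
print(shares)
```

The same breakdown can be charted with the `px.pie(music_df, names=...)` pattern used for questions 7 and 8, passing `names='End Reason Type'`.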

6. What Is Your Favorite Genre?
fig = px.bar(top_genre, color_discrete_sequence=colors.T10_r)
fig.show()

7. Which Media Type Do You Prefer Most on Apple Music?
fig = px.pie(music_df, names='Media Type', color_discrete_sequence=colors.Dark2,
             title="Preferred media type (e.g. audio/video)")
fig.show()

8. Do You Prefer Listening to Music Online or Offline?
fig = px.pie(music_df, names="Offline", title="Do you prefer listening to music Offline?")
fig.show()

9. At What Time of Day Do You Listen to Music Most?
fig = px.bar(hours, title="Most active hours (24hr)",
             labels={"value": "count", "Event Start Timestamp": "Timings (hours)"},
             color_discrete_sequence=colors.Prism)
fig.update_xaxes(dtick=1)
fig.show()
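`hours` (and likewise `months` and `years` for the next two questions) is not built in the post. Once `Event Start Timestamp` holds datetimes, a natural construction is to count play events per hour through the `.dt` accessor and sort by hour; `.dt.month` and `.dt.year` work the same way. A sketch with a hypothetical `events_per` helper and made-up timestamps:

```python
import pandas as pd

def events_per(timestamps: pd.Series, unit: str) -> pd.Series:
    """Count events per 'hour', 'month' or 'year' of a datetime Series, ordered by that unit."""
    return getattr(timestamps.dt, unit).value_counts().sort_index()

starts = pd.to_datetime(pd.Series([
    "2019-06-28T16:46:30.070Z", "2019-08-27T13:22:00.883Z",
    "2018-04-03T18:15:00.362Z", "2019-06-28T16:50:00.000Z",
]), utc=True)

hours = events_per(starts, "hour")
years = events_per(starts, "year")
print(hours)
print(years)

# For the real data, under the same assumption:
# hours = events_per(music_df['Event Start Timestamp'], 'hour')
```

Sorting by the index rather than by count keeps the x-axis in clock order, which suits `update_xaxes(dtick=1)`.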

10. In Which Month Do You Listen to Songs Most?
m = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
fig = px.bar(months, title="Most active Months", text=m,
             labels={"value": "count", "Event Start Timestamp": "Months"},
             color_discrete_sequence=colors.Light24)
fig.update_xaxes(dtick=1)
fig.show()
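The `text=m` labels only line up with the bars if `months` has exactly twelve entries ordered January to December. Assuming `months` holds play counts keyed by month number, reindexing over 1 to 12 (filling months with no plays with 0) guarantees that. A sketch with made-up timestamps:

```python
import pandas as pd

starts = pd.to_datetime(pd.Series([
    "2019-06-28T16:46:30.070Z", "2019-08-27T13:22:00.883Z", "2020-06-18T15:26:50.761Z",
]), utc=True)

# Count plays per month number, then force all 12 months (Jan..Dec) to be present
months = starts.dt.month.value_counts().reindex(range(1, 13), fill_value=0)

m = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
print(dict(zip(m, months.tolist())))
```

Without the reindex, a history that skips a month would shift every later label onto the wrong bar.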

11. In Which Year Did You Listen to Songs Most on Apple Music?
fig = px.bar(years, title="Most active year",
             labels={"value": "count", "Event Start Timestamp": "Year"},
             color_discrete_sequence=colors.Prism_r)
fig.update_xaxes(dtick=1)
fig.show()

12. Total Time Spent Listening to Music
total_mins = total_time/60000
print("Total minutes spent: {:.2f} mins".format(total_mins))
total_hours = total_mins/60
print("Total hours spent: {:.2f} hours".format(total_hours))
Total minutes spent: 24568.91 mins
Total hours spent: 409.48 hours
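`total_time` is not defined in the post. Since it is divided by 60,000 (milliseconds per minute), it is presumably the sum of `Play Duration Milliseconds` over all play events. A sketch of that assumption, applied to the five play durations visible in the head() output (the `listening_time` helper is hypothetical):

```python
import pandas as pd

MS_PER_MINUTE = 60_000

def listening_time(df: pd.DataFrame):
    """Return (minutes, hours) of total play time; NaN durations are skipped by sum()."""
    total_ms = df["Play Duration Milliseconds"].sum()
    minutes = total_ms / MS_PER_MINUTE
    return minutes, minutes / 60

sample = pd.DataFrame({"Play Duration Milliseconds": [3312.0, 131214.0, 0.0, 1033.0, 81877.0]})
mins, hrs = listening_time(sample)
print(f"{mins:.2f} mins, {hrs:.2f} hours")
```

Note that `Play Duration Milliseconds` measures what was actually heard, whereas `Media Duration In Milliseconds` is the full track length; using the latter would overstate listening time for skipped songs.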
The maximum time you could have spent listening, from the first recorded event to the last, is:
total_possible_hours = total_possible_time * 24
print("Total possible hours from start to end: {} hours".format(total_possible_hours))
Total possible hours from start to end: 31632 hours
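`total_possible_time` is multiplied by 24 here and later divides the song count to give a daily average, so it is evidently a number of days (31,632 hours is 1,318 days). How it is derived is not shown; one plausible assumption is the span in whole days between the earliest and latest `Event Start Timestamp`. A sketch with a hypothetical helper and three timestamps taken from the sample rows:

```python
import pandas as pd

def days_covered(timestamps: pd.Series) -> int:
    """Whole days between the earliest and latest timestamp (assumed definition)."""
    return (timestamps.max() - timestamps.min()).days

starts = pd.to_datetime(pd.Series([
    "2017-12-20T12:41:25.722Z", "2019-06-28T16:46:30.070Z", "2020-06-18T15:26:50.761Z",
]), utc=True)

total_possible_time = days_covered(starts)
total_possible_hours = total_possible_time * 24
print(total_possible_time, total_possible_hours)
```

`.days` truncates the partial final day; adding 1 would count both the first and last calendar days instead.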
The real question is what share of that available time was actually spent listening to music.
# slices must add up to the whole: hours listened + hours not listened = possible hours
remaining_hours = total_possible_hours - total_hours
hours_spent_list = np.array([total_hours, remaining_hours])
hours_spent_list_labels = ["Hours spent listening", "Remaining hours"]
fig, ax = plt.subplots(figsize=(12, 6))
ax.pie(hours_spent_list, labels=hours_spent_list_labels, autopct='%1.1f%%',
       explode=[0.2, 0.2], startangle=180, shadow=True)
plt.title("Hours Spent Percentage")
plt.show()
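The share of available time spent listening can also be computed directly from the two figures printed above, as hours spent divided by hours available:

```python
total_hours = 409.48          # "Total hours spent" printed above
total_possible_hours = 31632  # "Total possible hours from start to end" printed above

share = total_hours / total_possible_hours * 100
print(f"Share of available time spent listening: {share:.2f}%")
```

This prints 1.29%, i.e. roughly one hour in every 77 of the covered period.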

13. Daily Average of Songs Played
total_songs = music_df.shape[0]
print("Daily average of songs played: {:.2f} songs".format(total_songs/total_possible_time))
Daily average of songs played: 16.04 songs
Connect with us at X-Byte Enterprise Crawling for further queries or to request a quote.