Using Spotify data to find the happiest emo song

Published in Analytics Vidhya, Jan 17, 2021

I recently discovered Charlie Thompson’s spotifyr package, which provides an easy way to analyse Spotify data using the statistical programming language R. Many people have done interesting things with this package, such as Caitlin Hudon’s search for the most depressing Christmas song and Charlie Thompson’s own search for the most depressing Radiohead song. Inspired by these analyses, and as a self-confessed emo kid in a previous life, I decided to use it to try to find the happiest emo song.

If you’d like to try using spotifyr for yourself, you can install it using the instructions on Charlie Thompson’s site. (At the time of writing, spotifyr isn’t hosted on CRAN, so I installed it from GitHub.)

All of my code for this project can be found here.

Please note that some song titles contain potentially offensive language!

Which songs?

The first challenge of the project was to work out which songs to consider. To avoid any debates over what should or should not be labelled as emo, I let Spotify decide and only included songs from their 80-song “Emo Forever” playlist. While this is hardly a comprehensive list, it gave me a fairly representative sample of relatively well-known emo pop songs from the genre’s heyday in the 00s and early 10s.

Songs in a playlist can be fetched with spotifyr by copying the playlist’s Spotify URI (found in the “Share” menu on the Spotify app) and using the get_playlist_tracks function.
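
As a rough sketch of this step: the snippet below assumes that the URI has the form "spotify:playlist:<id>" and that get_playlist_tracks expects the bare playlist ID. The helper playlist_id_from_uri and the placeholder URI are hypothetical and only for illustration. The API call is skipped unless spotifyr is installed and the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables are set.

```r
# Hypothetical helper: strip the "spotify:playlist:" prefix from a Spotify URI.
playlist_id_from_uri <- function(uri) {
  sub("^spotify:playlist:", "", uri)
}

# Placeholder URI for illustration; replace it with the one copied from the app.
uri <- "spotify:playlist:PLAYLIST_ID_GOES_HERE"
playlist_id <- playlist_id_from_uri(uri)

# Only call the API if spotifyr is installed and credentials are configured.
if (requireNamespace("spotifyr", quietly = TRUE) &&
    nzchar(Sys.getenv("SPOTIFY_CLIENT_ID")) &&
    nzchar(Sys.getenv("SPOTIFY_CLIENT_SECRET"))) {
  tracks <- spotifyr::get_playlist_tracks(playlist_id)
  str(tracks, max.level = 1)
}
```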

Musical polarity

In the background, every track on Spotify has various attributes associated with it. As well as familiar musical features such as key and tempo, these include more subjective measures of a track’s mood. These algorithmically generated measures are purely based on a track’s audio features rather than its lyrics (although Spotify are secretive about exactly what properties of a track contribute towards them).

The two most relevant attributes for this analysis are valence and energy. According to the official documentation, valence measures “the musical positiveness conveyed by a track”, whereas energy “represents a perceptual measure of intensity and activity”. Both are measured on a scale from 0 to 1.

This plot illustrates the overall mood of songs with different valence and energy values, along with some well-known examples.

Low valence and energy: sad. Low valence, high energy: angry. High valence, low energy: calm. High valence and energy: happy.

(Credit to Miriam Quick in this article for these examples and descriptions of the four quadrants.)

The happiest songs are found in the upper right quadrant of this grid, so to quantify the “musical polarity” of each track in the playlist I calculated the distance of its (valence, energy) coordinate from (0, 0):

Musical polarity = √(valence² + energy²)
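
In R, this distance calculation might look something like the sketch below. The track names and the valence and energy values are made up for illustration and are not real Spotify measurements; the column names are assumptions too.

```r
# Illustrative values only; these are not real Spotify measurements.
tracks <- data.frame(
  track   = c("song_a", "song_b", "song_c"),
  valence = c(0.9, 0.3, 0.1),
  energy  = c(0.8, 0.9, 0.2)
)

# Distance of each (valence, energy) point from the origin (0, 0).
tracks$musical_polarity <- sqrt(tracks$valence^2 + tracks$energy^2)

# Sort from most to least musically positive.
tracks <- tracks[order(tracks$musical_polarity, decreasing = TRUE), ]
print(tracks)
```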

Then I arranged the playlist according to this measure. These were the top 5 songs:

Top 3: 1. The Middle by Jimmy Eat World. 2. Nikki by Forever the Sickest Kids. 3. Warm Me Up by The Audition.

And the bottom 5:

Bottom 3: 3. Hush by Automatic Loveletter. 2. Emily by From First to Last. 1. Call Off the Bells by The Early November.

The Middle is the happiest song I know that’s still generally considered ‘emo’, and before starting this project I’d predicted that it would score highly, so I was pleased to see it take the top spot here. Meanwhile Call Off the Bells stands out as having a much lower polarity score than the other songs in the playlist. What if we considered lyrics as well?

Lyrical polarity

spotifyr pairs well with the genius package, which provides access to song lyrics from genius.com via R. Its genius_lyrics function takes an artist and a track title and returns the track’s lyrics.

I started by cleaning the data to remove punctuation in some track names and to edit some non-unique artist names (such as Aiden, who happen to share their name with an Italian rapper). The function would sometimes randomly return NA values, so I wrote a quick wrapper function to make it try multiple times before giving up with a warning if it still failed.
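
A retry wrapper along these lines could look like the sketch below. It is not the code from the project: with_retries and flaky_lookup are hypothetical names, and flaky_lookup is only a stand-in for a lyrics lookup that sometimes returns NA.

```r
# Hypothetical wrapper: call `fetch` up to `max_tries` times until it returns
# something other than NA, then warn and return NA if every attempt failed.
with_retries <- function(fetch, ..., max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(...), error = function(e) NA)
    if (!all(is.na(result))) {
      return(result)
    }
  }
  warning("No result after ", max_tries, " attempts")
  NA
}

# Stand-in for a flaky lyrics lookup: returns NA twice, then succeeds.
calls <- 0
flaky_lookup <- function(artist, song) {
  calls <<- calls + 1
  if (calls < 3) NA else paste("lyrics for", song, "by", artist)
}

with_retries(flaky_lookup, "Some Artist", "Some Song")
```

Treating errors the same way as NA results is a design choice here; a stricter version could let errors propagate.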

Once I’d imported the lyrics for the songs in the playlist I used the polarity function from the qdap package to perform an approximate sentiment analysis. This function identifies positive or negative words in a piece of text, then looks at the cluster of words surrounding each one to identify whether it’s being negated (e.g. “not good”) or amplified (e.g. “very good”). Roughly, the higher the density of positive clusters, the higher the text’s overall polarity score (which can be positive or negative).
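
To make the idea of negation and amplification concrete, here is a toy sketch in base R. It is not qdap's implementation: the word lists, the two-word look-back window, the 0.8 amplifier weight and the division by the square root of the word count are all assumptions chosen for illustration.

```r
# Toy word lists for illustration; qdap's real dictionaries are much larger.
positive   <- c("good", "love", "happy")
negative   <- c("bad", "hate", "sad")
negators   <- c("not", "never")
amplifiers <- c("very", "really")

toy_polarity <- function(text, window = 2, amp_weight = 0.8) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  total <- 0
  for (i in seq_along(words)) {
    # +1 for a positive word, -1 for a negative word, 0 otherwise.
    base <- (words[i] %in% positive) - (words[i] %in% negative)
    if (base == 0) next
    # Simplification: only look at the few words before the polarised word.
    context <- tail(words[seq_len(i - 1)], window)
    flips <- sum(context %in% negators)
    boost <- 1 + amp_weight * sum(context %in% amplifiers)
    total <- total + base * boost * (-1)^flips
  }
  # Scale by text length so that long texts don't dominate.
  total / sqrt(length(words))
}

toy_polarity("not good")   # negative: the negator flips the sign
toy_polarity("very good")  # positive, and larger than for "good" alone
```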

These turned out to be the 5 most lyrically positive songs:

And the 5 most lyrically negative:

A strong showing here for one of my personal favourites, Alive With the Glory of Love, a song about the determination of a couple during the Holocaust which is uplifting despite its dark subject matter. I was less familiar with Car Underwater, but given that the lyrics of the chorus are

I’m in a car underwater with time to kill, thinking back I forgot to tell you this
I didn’t care that you left and abandoned me, what hurts more is I would still die for you

… then this seems pretty accurate.

Combining everything

Having calculated measures for both musical and lyrical positivity, it was time to try combining them into one general measure of positivity. This is inherently difficult — the two measures of positivity are fairly arbitrary and measured on different scales, and different people are impacted to different extents by music and lyrics, so this was a subjective decision. As a result I tried a few different methods to combine them, aiming to choose a relatively simple one that best matched my own opinion of the songs’ positivity.

The measure I settled on was as follows. First I mapped both measures to be between 0 and 1 by dividing the musical polarity by √2 (the maximum possible value for the musical polarity) and by taking the standard logistic function of the lyrical polarity. Then I found the distance of the transformed (musical polarity, lyrical polarity) coordinate from (0, 0).
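
The combination described above can be sketched in base R as follows. The input values are invented for illustration, and plogis is base R's standard logistic function, 1 / (1 + exp(-x)).

```r
# Illustrative inputs only; not values from the playlist.
musical_polarity <- c(1.20, 0.25, 0.95)   # ranges from 0 to sqrt(2)
lyrical_polarity <- c(0.40, -0.60, 0.00)  # can be negative or positive

# Map both measures onto the interval from 0 to 1.
musical_scaled <- musical_polarity / sqrt(2)
lyrical_scaled <- plogis(lyrical_polarity)

# Distance of each transformed point from the origin (0, 0).
overall_polarity <- sqrt(musical_scaled^2 + lyrical_scaled^2)
print(overall_polarity)
```

Note that a lyrical polarity of 0 maps to 0.5 under the logistic function, so lyrically neutral songs sit in the middle of the transformed scale.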

This was the top 5:

Top 3: 1. Alive With the Glory of Love by Say Anything. 2. Warm Me Up by The Audition. 3. Misery Business by Paramore.

And the bottom 5:

Bottom 3: 3. Radio by Alkaline Trio. 2. Car Underwater by Armor for Sleep. 1. Call Off the Bells by The Early November.

Alive With the Glory of Love takes the overall top spot with its combination of high musical and lyrical polarity, despite the other entries in the top 5 having higher musical polarity scores. Meanwhile, Call Off the Bells ends up with the lowest overall polarity score due to its extremely low musical polarity, despite having a moderately high lyrical polarity.

Augmenting the dictionary

qdap’s polarity score depends on the “dictionary” of positive and negative words used. The default dictionary is a fairly generic set of words and it isn’t necessarily suitable for all contexts. For instance, if you wanted to analyse text from tweets, you’d want to augment the default dictionary so that it included internet slang. Are there words that appear frequently in the lyrics that aren’t included in qdap’s default dictionary?

To answer this I counted how many times each word appeared altogether in the songs’ lyrics, then used dplyr’s anti_join function to identify which ones were missing from qdap’s default dictionary and ordered the result by frequency. There were a few interesting omissions:

There were 167 occurrences of the word “down”, which appeared in various negative phrases such as “looked down on”, “face down on the floor” and “let you down”
There were 59 occurrences of “alright” — and since feeling alright is reasonably positive given the genre, this could be considered a positive word in this context
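
The word-counting step can be sketched as below. To keep the example self-contained it uses base R rather than dplyr: the %in% filter keeps the same rows that an anti-join on the word column would. The lyrics and the tiny dictionary are stand-ins for illustration, not the playlist's lyrics or qdap's real word list.

```r
# Illustrative lyrics and a stand-in dictionary; not qdap's real word list.
lyrics <- c(
  "You let me down",
  "I'm face down on the floor",
  "Everything is alright",
  "I hate this town"
)
dictionary <- data.frame(word = c("love", "hate", "good", "bad"))

# Split the lyrics into lower-case words.
words <- unlist(strsplit(tolower(lyrics), "[^a-z']+"))
words <- words[nzchar(words)]

# Count how often each word appears.
counts <- as.data.frame(table(word = words), stringsAsFactors = FALSE)
names(counts) <- c("word", "n")

# Keep words absent from the dictionary (what an anti-join on `word` returns),
# then order them by frequency.
missing <- counts[!counts$word %in% dictionary$word, ]
missing <- missing[order(missing$n, decreasing = TRUE), ]
head(missing)
```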

After augmenting the dictionary so that “down” was considered to be a negative word and “alright” was considered positive, the top 5 songs by overall polarity were as follows:

Top 2: 1. Everything is Alright by Motion City Soundtrack. 2. Alive With the Glory of Love by Say Anything.

Everything is Alright takes the top spot, which isn’t hugely surprising! This demonstrates a limitation of the polarity calculation — the song is about struggling with anxiety and OCD, when everything is certainly not alright despite what you tell people, showing qdap’s inability to detect sarcasm. (This also makes me think that Motion City Soundtrack and My Chemical Romance have very different approaches to talking about their problems.)

Meanwhile the bottom 5 remained mostly unchanged:

Overall I think the calculation based on the default dictionary more accurately reflects the songs’ actual positivity. Perhaps some tweaking of parameters is needed so that these words don’t have quite as much impact.

Future ideas

This is currently the extent of my analysis, but there are a few avenues I’d like to explore in the future:

More songs — I only considered a small selection of fairly mainstream emo pop songs here, but emo and its related subgenres are still thriving (I’m personally a big fan of Pinegrove, The Hotelier and Foxing, to name but a few). It would be fun to consider a wider definition of emo and less well-known songs in a future iteration of this project.
A more accurate positivity metric — as mentioned above, the combined musical and lyrical positivity metric I eventually used was fairly arbitrary. It would be great to do some further reading into how people are influenced by musical and lyrical factors to come up with a more reliable metric.

What do you think is the happiest emo song? Let me know in the responses. I’d also love to hear your thoughts and feedback on this article, and if you liked it, please share it!