Creating a Song Recommendation System with Streamlit and Heroku

Chapter 1: Introduction

Music enthusiasts like us often yearn for new tracks that align with our individual tastes. Whether it’s the dynamic beats of hip-hop, the lively rhythms of K-pop, or the soothing sounds of progressive jazz, a song recommendation system can enhance our listening experience by suggesting songs we might enjoy.

With the wealth of music data available and advancements in machine learning, we can create a straightforward song recommendation system that matches our musical preferences. These preferences include music genre, release year range, and audio characteristics such as energy, instrumentalness, and acousticness. In this guide, we will build this recommendation engine with Streamlit and the k-Nearest Neighbors (k-NN) model from Scikit-learn, then deploy the application on Heroku.

To see the application we will develop, check out the final version here: song-recommendation-streamlit.herokuapp.com.

Chapter 2: Dataset Acquisition

Before we dive into the application development, we need to gather a music dataset. We will utilize the Spotify and Genius Track Dataset sourced from Kaggle. This dataset encompasses detailed information on thousands of albums, artists, and songs obtained from Spotify's API. Moreover, it includes various audio features and song lyrics.

Dataset Overview

The dataset is divided into three main CSV files:

spotify_artists.csv: Contains genre information for each artist.
spotify_albums.csv: Contains the release dates for the albums.
spotify_tracks.csv: Includes audio features for each song.

Data Preparation

Our data preprocessing goal is to merge these datasets, ensuring each song has its corresponding genre, release year, and audio features. This consolidated dataset will serve as input for our recommendation system.

To start, we will load the three datasets using Pandas.

# Load the datasets
import pandas as pd

artist_df = pd.read_csv("spotify_artists.csv")
album_df = pd.read_csv("spotify_albums.csv")
track_df = pd.read_csv("spotify_tracks.csv")

Next, we will join the album release dates and artist genre information with the track data.

def join_genre_and_date(artist_df, album_df, track_df):
    # Data joining and processing code...
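The joining step can be sketched roughly as follows. This is a minimal illustration on toy data, not the original function body: the column names (artist_id, album_id, track_id, release_date) and the function name join_genre_and_date_sketch are assumptions, so adjust them to the actual columns in the Kaggle files.

```python
import pandas as pd

# Toy stand-ins for the three CSV files; all column names here are assumptions.
artist_df = pd.DataFrame({
    "artist_id": ["a1", "a2"],
    "genres": [["pop", "dance pop"], ["jazz"]],
})
album_df = pd.DataFrame({
    "album_id": ["b1", "b2"],
    "release_date": ["1995-06-01", "2010-11-20"],
})
track_df = pd.DataFrame({
    "track_id": ["t1", "t2"],
    "artist_id": ["a1", "a2"],
    "album_id": ["b1", "b2"],
    "energy": [0.8, 0.3],
})


def join_genre_and_date_sketch(artist_df, album_df, track_df):
    """Attach artist genres and album release year to each track."""
    joined = track_df.merge(album_df, on="album_id", how="inner")
    joined = joined.merge(artist_df, on="artist_id", how="inner")
    # Derive the release year from the album release date.
    joined["release_year"] = pd.to_datetime(joined["release_date"]).dt.year
    return joined.drop(columns=["release_date"])


track = join_genre_and_date_sketch(artist_df, album_df, track_df)
print(track[["track_id", "genres", "release_year", "energy"]])
```

An inner join drops tracks with no matching album or artist; a left join would keep them with missing values instead.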

We will also filter the data to include only songs released in 1990 or later. This keeps the dataset small and shortens load times while the application is running.

# Keep only songs released in 1990 or later
# (`track` is assumed to be the joined DataFrame from the previous step)
track = track[track.release_year >= 1990]

Filtering the Dataset

To further refine our dataset, we can filter by specific genres.

def get_filtered_track_df(df, genres_to_include):
    # Filtering code...
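One possible shape for the genre filter is shown below. It is a sketch under the assumption that the genres column holds a Python list per track; the function name get_filtered_track_df_sketch and the toy rows are hypothetical.

```python
import pandas as pd


def get_filtered_track_df_sketch(df, genres_to_include):
    """Keep tracks that have at least one wanted genre (assumes list-valued 'genres')."""
    wanted = set(genres_to_include)
    out = df.copy()
    # Narrow each track's genre list to the wanted genres only.
    out["genres"] = out["genres"].apply(lambda gs: [g for g in gs if g in wanted])
    # Drop tracks left with no genre after narrowing.
    return out[out["genres"].map(len) > 0].reset_index(drop=True)


df = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "genres": [["pop", "dance pop"], ["jazz"], ["metal"]],
})
filtered_track_df = get_filtered_track_df_sketch(df, ["pop", "jazz"])
print(filtered_track_df)
```

Narrowing the lists before dropping empty rows means a later explode on genres yields only the genres the application offers.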

Now that we have our filtered dataset, we can save it for later use:

filtered_track_df.to_csv("filtered_track_df.csv", index=False)

Chapter 3: Developing the Main Application

Having preprocessed our dataset, we can now create the main application using Streamlit, a Python framework designed for building web applications for Machine Learning and Data Science.

Installing Required Libraries

Before starting, ensure you have the necessary libraries installed:

pip install streamlit pandas plotly scikit-learn

Next, we’ll set up our main application file named app.py and import the libraries we need:

import streamlit as st
import pandas as pd
from sklearn.neighbors import NearestNeighbors
import plotly.express as px

Loading Data

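We will implement a function to load our preprocessed data, cached so that the CSV is read once rather than on every script rerun. The snippet assumes the CSV saved earlier has been moved into a data/ folder. (On Streamlit 1.18 and later, st.cache_data replaces the deprecated st.cache decorator.)

import ast

@st.cache(allow_output_mutation=True)
def load_data():
    df = pd.read_csv("data/filtered_track_df.csv")
    # read_csv returns list-valued cells as strings; parse them before exploding
    # (assumes genres were saved as Python-style list strings)
    df["genres"] = df["genres"].apply(ast.literal_eval)
    return df.explode("genres")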
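One detail worth checking when loading is that a CSV round trip turns list-valued cells into strings, and explode leaves plain strings unchanged. The sketch below illustrates this on toy data and shows one way to recover the lists with ast.literal_eval, assuming the column was written as Python-style list strings.

```python
import ast
import io

import pandas as pd

original = pd.DataFrame({
    "name": ["Song A", "Song B"],
    "genres": [["pop", "rock"], ["jazz"]],
})

# Simulate saving to CSV and loading it back.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
raw = pd.read_csv(buffer)

# The genres cells are now strings, so explode does not split them.
unparsed_rows = len(raw.explode("genres"))

# Parse the strings back into lists, then explode to one row per genre.
raw["genres"] = raw["genres"].apply(ast.literal_eval)
exploded = raw.explode("genres")
print(exploded)
```

After exploding, each row carries a single genre, which makes an equality filter on the genre column straightforward.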

Building the k-NN Model

With the data loaded, we can develop our machine learning model using k-NN to recommend songs based on user input.

def n_neighbors_uri_audio(genre, start_year, end_year, test_feat):
    # k-NN model code...
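The general idea can be sketched with Scikit-learn's NearestNeighbors: filter the candidates by genre and year range, fit the model on their audio features, and query it with the user's chosen feature values. The function name n_neighbors_sketch, the explicit df and k parameters, the uri column, and the three-feature list are assumptions for illustration, not the original implementation.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

AUDIO_FEATS = ["energy", "instrumentalness", "acousticness"]  # assumed feature columns

tracks = pd.DataFrame({
    "uri": ["u1", "u2", "u3", "u4"],
    "genres": ["pop", "pop", "pop", "jazz"],
    "release_year": [1995, 2005, 2015, 2005],
    "energy": [0.9, 0.5, 0.1, 0.5],
    "instrumentalness": [0.0, 0.1, 0.8, 0.1],
    "acousticness": [0.1, 0.4, 0.9, 0.4],
})


def n_neighbors_sketch(df, genre, start_year, end_year, test_feat, k=2):
    """Return the k candidate tracks whose audio features are closest to test_feat."""
    candidates = df[
        (df["genres"] == genre)
        & (df["release_year"].between(start_year, end_year))
    ]
    if candidates.empty:
        return candidates
    model = NearestNeighbors(n_neighbors=min(k, len(candidates)))
    model.fit(candidates[AUDIO_FEATS].to_numpy())
    # kneighbors returns (distances, indices); indices are positions in candidates.
    _, idx = model.kneighbors([test_feat])
    return candidates.iloc[idx[0]]


recs = n_neighbors_sketch(tracks, "pop", 1990, 2020, [0.85, 0.0, 0.15])
print(recs[["uri"] + AUDIO_FEATS])
```

Results come back ordered from nearest to farthest, so the first row is the closest match to the requested feature values.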

Application Layout

Now we can design the user interface. We’ll create a dashboard that allows users to select their genre and various audio features.

st.title("Song Recommendation Engine")

Song Recommendation

To generate song recommendations, we will implement a "Recommend More Songs" feature using Streamlit's session state functionality, which allows us to maintain state across interactions.

if 'previous_inputs' not in st.session_state:
    st.session_state['previous_inputs'] = [genre, start_year, end_year] + test_feat
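The paging logic behind such a feature might look like the sketch below. It is written against a plain dict so it can run outside Streamlit; in the app, st.session_state would be passed in its place. The names next_window, start_track_i, and page_size are hypothetical.

```python
def next_window(state, current_inputs, more_clicked, page_size=6):
    """Return the (start, end) slice of ranked songs to display.

    `state` is any dict-like object, such as st.session_state.
    """
    if "previous_inputs" not in state or state["previous_inputs"] != current_inputs:
        # First run or changed inputs: start again from the top of the ranking.
        state["previous_inputs"] = current_inputs
        state["start_track_i"] = 0
    elif more_clicked:
        # Same inputs and the button was pressed: move to the next page.
        state["start_track_i"] += page_size
    start = state["start_track_i"]
    return start, start + page_size


state = {}  # stands in for st.session_state
inputs = ["pop", 1990, 2020, 0.8, 0.1, 0.2]
first = next_window(state, inputs, more_clicked=False)
second = next_window(state, inputs, more_clicked=True)
changed = next_window(state, ["jazz", 1990, 2020, 0.8, 0.1, 0.2], more_clicked=True)
print(first, second, changed)
```

Comparing the stored inputs with the current ones is what lets the app tell a fresh query apart from a request for more results on the same query.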

Displaying Recommendations

Finally, we can display the recommended songs along with an option to view more. This will enhance user interaction and allow exploration of various tracks.

if st.button("Recommend More Songs"):
    # Code to display more songs...

Chapter 4: Deployment on Heroku

Before wrapping up, let’s explore how to deploy our application on Heroku.

Preparing for Deployment

We need to create several essential files for deployment:

requirements.txt: Lists all dependencies.
setup.sh: Sets up the app on Heroku.
Procfile: Specifies the entry point for Heroku.
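As a rough guide to what these files can contain, the sketch below writes plausible versions of all three from Python. The file contents are assumptions based on a common Streamlit-on-Heroku setup, not the exact files used for this app; the helper name write_deploy_files is hypothetical, and the entry point is assumed to be app.py.

```python
import tempfile
from pathlib import Path

# Dependencies taken from the pip install step; pin versions as needed.
REQUIREMENTS = "streamlit\npandas\nplotly\nscikit-learn\n"

# Assumed setup script: writes a Streamlit config that binds to Heroku's $PORT.
SETUP_SH = """\
mkdir -p ~/.streamlit/
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
enableCORS = false
EOF
"""

# Assumed entry point: run the setup script, then start the app.
PROCFILE = "web: sh setup.sh && streamlit run app.py\n"


def write_deploy_files(target_dir):
    """Write the three deployment files into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = {"requirements.txt": REQUIREMENTS, "setup.sh": SETUP_SH, "Procfile": PROCFILE}
    for name, content in files.items():
        (target / name).write_text(content)
    return sorted(files)


# Demo: write into a scratch directory and read the Procfile back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_deploy_files(tmp)
    procfile_text = (Path(tmp) / "Procfile").read_text()
print(written)
```

To use it for a real deployment, call write_deploy_files on the project root and review each file before committing.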

Deployment Steps

After creating a GitHub repository and pushing your code, you can deploy with the following commands:

heroku login
heroku create [app_name]
git push heroku main

Congratulations! Your app should now be live at [app_name].herokuapp.com.

Chapter 5: Conclusion

In this tutorial, we constructed a song recommendation engine capable of matching user preferences to songs. We utilized the k-NN model, built a web app with Streamlit, and deployed it on Heroku.

Feel free to experiment with the recommendation system and share your feedback! For further insights, consider exploring some of my other posts.

References

Streamlit Website
Scikit-learn Documentation
Heroku Documentation
Pandas Documentation
Spotify Developer API
Spotify and Genius Track Dataset, Kaggle
