Creating a Song Recommendation System with Streamlit and Heroku
Chapter 1: Introduction
Music enthusiasts like us often yearn for new tracks that align with our individual tastes. Whether it’s the dynamic beats of hip-hop, the lively rhythms of K-pop, or the soothing sounds of progressive jazz, a song recommendation system can enhance our listening experience by suggesting songs we might enjoy.
With the wealth of music data available and advances in machine learning, we can build a straightforward song recommendation system around our musical preferences: genre, release year range, and audio characteristics such as energy, instrumentalness, and acousticness. In this guide, we will build this recommendation engine using Streamlit and the k-Nearest Neighbors (k-NN) model from Scikit-learn, and then deploy the application on Heroku.
To see the application we will develop, check out the final version here: song-recommendation-streamlit.herokuapp.com.
Chapter 2: Dataset Acquisition
Before we dive into the application development, we need to gather a music dataset. We will utilize the Spotify and Genius Track Dataset sourced from Kaggle. This dataset encompasses detailed information on thousands of albums, artists, and songs obtained from Spotify's API. Moreover, it includes various audio features and song lyrics.
Dataset Overview
The dataset is divided into three main CSV files:
- spotify_artists.csv: Contains genre information for each artist.
- spotify_albums.csv: Contains the release dates for the albums.
- spotify_tracks.csv: Includes audio features for each song.
Data Preparation
Our data preprocessing goal is to merge these datasets, ensuring each song has its corresponding genre, release year, and audio features. This consolidated dataset will serve as input for our recommendation system.
To start, we will load the three datasets using Pandas.
# Load the datasets
import pandas as pd
artist_df = pd.read_csv("spotify_artists.csv")
album_df = pd.read_csv("spotify_albums.csv")
track_df = pd.read_csv("spotify_tracks.csv")
Next, we will join the album release dates and artist genre information with the track data.
def join_genre_and_date(artist_df, album_df, track_df):
    ...  # Data joining and processing code (omitted here)
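As a rough illustration of what such a joining step could look like, the sketch below merges three tiny made-up frames. The column names (id, album_id, artists_id, release_date, genres) and the function name join_genre_and_date_sketch are assumptions for this example, not the actual schema of the Kaggle files, and each track is assumed to have exactly one artist.

```python
import pandas as pd

# Tiny made-up stand-ins for the three CSV files (column names are assumed).
artist_df = pd.DataFrame({
    "id": ["a1", "a2"],
    "genres": [["k-pop"], ["jazz", "bebop"]],
})
album_df = pd.DataFrame({
    "id": ["al1", "al2"],
    "release_date": ["2015-06-01", "1988-03-12"],
})
track_df = pd.DataFrame({
    "name": ["Song A", "Song B"],
    "album_id": ["al1", "al2"],
    "artists_id": ["a1", "a2"],
    "energy": [0.8, 0.3],
})

def join_genre_and_date_sketch(artist_df, album_df, track_df):
    """Attach the album release year and the artist genres to every track."""
    albums = album_df.rename(columns={"id": "album_id"})[["album_id", "release_date"]]
    artists = artist_df.rename(columns={"id": "artists_id"})[["artists_id", "genres"]]
    track = track_df.merge(albums, on="album_id", how="inner")
    track = track.merge(artists, on="artists_id", how="inner")
    # Derive the release year from the album's release date.
    track["release_year"] = pd.to_datetime(track["release_date"]).dt.year
    return track

track = join_genre_and_date_sketch(artist_df, album_df, track_df)
print(track[["name", "release_year", "genres"]])
```

An inner join drops tracks whose album or artist is missing from the other files; a left join would keep them with empty values instead.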
We will also keep only songs released in 1990 or later, which shrinks the dataset and shortens load times when the application runs.
# Keep songs released in 1990 or later
track = track[track.release_year >= 1990]
Filtering the Dataset
To further refine our dataset, we can filter by specific genres.
def get_filtered_track_df(df, genres_to_include):
    ...  # Filtering code (omitted here)
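One possible shape for this genre filter is sketched below. It assumes the genres column holds a Python list per track; the function name get_filtered_track_df_sketch and the demo rows are made up for the example.

```python
import pandas as pd

def get_filtered_track_df_sketch(df, genres_to_include):
    """Keep tracks with at least one wanted genre and drop their other genres."""
    wanted = set(genres_to_include)
    out = df.copy()
    # Reduce each track's genre list to the genres we care about.
    out["genres"] = out["genres"].apply(lambda gs: [g for g in gs if g in wanted])
    # Drop tracks that are left with no genre at all.
    return out[out["genres"].map(len) > 0].reset_index(drop=True)

demo = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "genres": [["k-pop", "dance pop"], ["bebop"], ["hip hop", "rap"]],
})
filtered = get_filtered_track_df_sketch(demo, ["k-pop", "hip hop"])
print(filtered)
```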
Now that we have our filtered dataset, we can save it for later use. The app reads this file from a data/ directory, so move it there before running the app:
filtered_track_df.to_csv("filtered_track_df.csv", index=False)
Chapter 3: Developing the Main Application
Having preprocessed our dataset, we can now create the main application using Streamlit, a Python framework designed for building web applications for Machine Learning and Data Science.
Installing Required Libraries
Before starting, ensure you have the necessary libraries installed:
pip install streamlit pandas plotly scikit-learn
Next, we’ll set up our main application file named app.py and import the libraries we need:
import streamlit as st
import pandas as pd
from sklearn.neighbors import NearestNeighbors
import plotly.express as px
Loading Data
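We will implement a function that loads our preprocessed data and caches the result, so the CSV file is not read again on every rerun of the script:
@st.cache_data  # successor to the deprecated st.cache decorator
def load_data():
    df = pd.read_csv("data/filtered_track_df.csv")
    # explode() only splits list-like values, so genres stored as text in the CSV must be parsed into lists first
    return df.explode("genres")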
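A CSV file stores a list-valued column as plain text, so explode has nothing to split until that text is parsed back into lists. The sketch below shows the round trip on made-up data. It assumes the genres were written as the text form of a Python list (for example "['k-pop', 'dance pop']"); a file that stores genres differently would need different parsing.

```python
import ast
import io
import pandas as pd

# Write a frame with list-valued genres to CSV text and read it back.
original = pd.DataFrame({
    "name": ["Song A", "Song B"],
    "genres": [["k-pop", "dance pop"], ["bebop"]],
})
csv_text = original.to_csv(index=False)
loaded = pd.read_csv(io.StringIO(csv_text))

# After the round trip each genres value is a string, not a list.
genres_are_text = isinstance(loaded.loc[0, "genres"], str)

# Parse the strings back into lists, then give every genre its own row.
loaded["genres"] = loaded["genres"].apply(ast.literal_eval)
exploded = loaded.explode("genres").reset_index(drop=True)
print(exploded)
```

With one genre per row, selecting all tracks of a genre becomes a simple equality comparison on the genres column.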
Building the k-NN Model
With the data loaded, we can develop our machine learning model using k-NN to recommend songs based on user input.
def n_neighbors_uri_audio(genre, start_year, end_year, test_feat):
    ...  # k-NN model code (omitted here)
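To make the idea concrete, here is a minimal sketch of a nearest-neighbor lookup with Scikit-learn's NearestNeighbors. It is not the article's function: the name n_neighbors_sketch, the three-feature list, the explicit df argument, and the demo rows are assumptions, and the data is assumed to have one genre per row.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

AUDIO_FEATS = ["energy", "instrumentalness", "acousticness"]  # assumed subset

def n_neighbors_sketch(df, genre, start_year, end_year, test_feat, n=2):
    """Return the n tracks whose audio features lie closest to test_feat."""
    # Restrict the candidate pool to the chosen genre and year range.
    pool = df[(df["genres"] == genre)
              & (df["release_year"] >= start_year)
              & (df["release_year"] <= end_year)]
    if pool.empty:
        return pool
    n = min(n, len(pool))
    neigh = NearestNeighbors(n_neighbors=n)
    neigh.fit(pool[AUDIO_FEATS].to_numpy())
    # kneighbors returns indices ordered from nearest to farthest.
    idx = neigh.kneighbors([test_feat], return_distance=False)[0]
    return pool.iloc[idx]

demo = pd.DataFrame({
    "name": ["Calm", "Loud", "Middle", "Old"],
    "genres": ["jazz", "jazz", "jazz", "jazz"],
    "release_year": [2001, 2005, 2010, 1985],
    "energy": [0.1, 0.9, 0.5, 0.1],
    "instrumentalness": [0.8, 0.1, 0.5, 0.8],
    "acousticness": [0.9, 0.1, 0.5, 0.9],
})
result = n_neighbors_sketch(demo, "jazz", 1990, 2020, [0.2, 0.8, 0.8], n=2)
print(result["name"].tolist())
```

The default distance is Euclidean, so features on very different scales would need rescaling first; the audio features named in this guide are assumed here to lie between 0 and 1.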
Application Layout
Now we can design the user interface. We’ll create a dashboard that allows users to select their genre and various audio features.
st.title("Song Recommendation Engine")
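The input widgets could be laid out roughly as follows. This is a sketch under assumptions: the genre list, the year bounds, the labels, and the helper name render_controls are all made up for illustration. The helper takes the streamlit module as an argument so that it can be exercised without a running app.

```python
GENRES = ["Hip Hop", "K-pop", "Jazz"]  # assumed genre list for illustration
AUDIO_FEATS = ["energy", "instrumentalness", "acousticness"]

def render_controls(st):
    """Draw the input widgets and return (genre, start_year, end_year, test_feat).

    `st` is the imported streamlit module, passed in to keep the function testable.
    """
    genre = st.selectbox("Choose your genre:", GENRES)
    # A tuple as the default value turns the slider into a range slider.
    start_year, end_year = st.slider("Release year range:", 1990, 2019, (2015, 2019))
    # One slider between 0 and 1 per audio feature.
    test_feat = [st.slider(feat.capitalize(), 0.0, 1.0, 0.5) for feat in AUDIO_FEATS]
    return genre, start_year, end_year, test_feat

# Assumed usage inside app.py:
#   import streamlit as st
#   genre, start_year, end_year, test_feat = render_controls(st)
```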
Song Recommendation
To generate song recommendations, we will implement a "Recommend More Songs" feature using Streamlit's session state functionality, which allows us to maintain state across interactions.
if 'previous_inputs' not in st.session_state:
    st.session_state['previous_inputs'] = [genre, start_year, end_year] + test_feat
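The logic behind this pattern can be sketched without Streamlit: remember the last inputs and restart the list of recommendations whenever they change. In the sketch below a plain dict stands in for st.session_state, and the helper name sync_inputs and the start_track_i key are hypothetical.

```python
def sync_inputs(state, current_inputs):
    """Reset the paging position when the inputs change; return True if they changed."""
    if "previous_inputs" not in state or state["previous_inputs"] != current_inputs:
        state["previous_inputs"] = current_inputs
        state["start_track_i"] = 0  # hypothetical key: index of the first track shown
        return True
    return False

state = {}  # plain dict standing in for st.session_state
first = sync_inputs(state, ["Jazz", 1990, 2019, 0.5])     # first run stores the inputs
same = sync_inputs(state, ["Jazz", 1990, 2019, 0.5])      # unchanged inputs, nothing resets
state["start_track_i"] = 6                                # the user has paged forward
changed = sync_inputs(state, ["K-pop", 1990, 2019, 0.5])  # new genre restarts the paging
print(first, same, changed, state["start_track_i"])
```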
Displaying Recommendations
Finally, we can display the recommended songs along with an option to view more. This will enhance user interaction and allow exploration of various tracks.
if st.button("Recommend More Songs"):
    ...  # Code to display more songs (omitted here)
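Showing more songs amounts to moving a window over the ranked list of recommendations. A minimal sketch of that paging step follows; the helper name next_page, the page size, and the track identifiers are made up for the example.

```python
def next_page(items, start, page_size=6):
    """Return the next slice of items and the updated start index."""
    page = items[start:start + page_size]
    return page, start + len(page)

uris = [f"track-{i}" for i in range(8)]  # made-up identifiers
page1, pos = next_page(uris, 0, page_size=3)
page2, pos = next_page(uris, pos, page_size=3)
page3, pos = next_page(uris, pos, page_size=3)  # the last page may be shorter
print(page1, page2, page3, pos)
```

In the app, the start index would live in the session state so that it survives the rerun triggered by each button click.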
Chapter 4: Deployment on Heroku
Before wrapping up, let’s explore how to deploy our application on Heroku.
Preparing for Deployment
We need to create several essential files for deployment:
- requirements.txt: Lists all dependencies.
- setup.sh: Sets up the app on Heroku.
- Procfile: Specifies the entry point for Heroku.
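For orientation, the sketch below writes one plausible version of each file. The file contents are assumptions based on a common Streamlit-on-Heroku setup, not the exact files behind this app: requirements.txt lists the libraries installed earlier without pinned versions, setup.sh writes a Streamlit config that listens on the port Heroku assigns through $PORT, and the Procfile runs that script before starting app.py. The helper name write_deploy_files is hypothetical.

```python
import tempfile
from pathlib import Path

REQUIREMENTS = "streamlit\npandas\nplotly\nscikit-learn\n"

# Shell script that points Streamlit at the port Heroku provides in $PORT.
SETUP_SH = """mkdir -p ~/.streamlit/
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
EOF
"""

PROCFILE = "web: sh setup.sh && streamlit run app.py\n"

def write_deploy_files(target_dir):
    """Write the three deployment files into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = {"requirements.txt": REQUIREMENTS, "setup.sh": SETUP_SH, "Procfile": PROCFILE}
    for name, content in files.items():
        (target / name).write_text(content)
    return sorted(files)

# Demo: write into a temporary directory; pass your project path instead.
with tempfile.TemporaryDirectory() as tmp:
    written = write_deploy_files(tmp)
    procfile_text = (Path(tmp) / "Procfile").read_text()
print(written)
```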
Deployment Steps
After committing your code to a Git repository (and, optionally, pushing it to GitHub), run the following commands from the project directory:
heroku login
heroku create [app_name]
git push heroku main
Congratulations! Your app should now be live at [app_name].herokuapp.com.
Chapter 5: Conclusion
In this tutorial, we constructed a song recommendation engine capable of matching user preferences to songs. We utilized the k-NN model, built a web app with Streamlit, and deployed it on Heroku.
Feel free to experiment with the recommendation system and share your feedback! For further insights, consider exploring some of my other posts.
References
- Streamlit Website
- Scikit-learn Documentation
- Heroku Documentation
- Pandas Documentation
- Spotify Developer API
- Spotify and Genius Track Dataset, Kaggle