Machine Learning in Music¶

Leo Fafoutis¶

CMSC422 Final Project¶

Intro and Motivation¶

When people think of how machine learning can be used in the field of music, music generation usually comes to mind first. While this seems to be the most obvious use of AI, it is not the only one: besides creating music from scratch, we can build tools that aid us in the creation of our own music. By classifying music based on its audio features, we can learn which features characterize a genre. This can serve as a guide or template for musicians writing music, or aid in music education. The primary use of such a tool, however, would be classifying music in large music databases. For example, if someone uploaded their music to Spotify without a genre, we could use this model to classify the song and then recommend it to other listeners.

Taking advantage of Support Vector Machines (SVMs), Decision Trees, and K-Nearest Neighbors (KNN), we can build machine learning models that help us classify music.

Music Dataset¶

The dataset we are using is from Kaggle. It is a Music Features dataset that describes each song with audio features such as tempo and rolloff. It contains 1000 entries with 28 features each, labeled with one of ten genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock.

We can begin by importing the pandas library to work with our CSV file. We store the CSV in a dataframe and remove the 'filename' column with pandas' .drop() function, since it will not be used as a feature.

In [1]:
import pandas as pd

# Using pandas read_csv, place the data into a pandas dataframe
df = pd.read_csv("data/data.csv")

# Remove filename from the dataframe, we have genre as a column already
df = df.drop(['filename'], axis=1)

df.head()
Out[1]:
tempo beats chroma_stft rmse spectral_centroid spectral_bandwidth rolloff zero_crossing_rate mfcc1 mfcc2 ... mfcc12 mfcc13 mfcc14 mfcc15 mfcc16 mfcc17 mfcc18 mfcc19 mfcc20 label
0 103.359375 50 0.380260 0.248262 2116.942959 1956.611056 4196.107960 0.127272 -26.929785 107.334008 ... 14.336612 -13.821769 7.562789 -6.181372 0.330165 -6.829571 0.965922 -7.570825 2.918987 blues
1 95.703125 44 0.306451 0.113475 1156.070496 1497.668176 2170.053545 0.058613 -233.860772 136.170239 ... -2.250578 3.959198 5.322555 0.812028 -1.107202 -4.556555 -2.436490 3.316913 -0.608485 blues
2 151.999081 75 0.253487 0.151571 1331.073970 1973.643437 2900.174130 0.042967 -221.802549 110.843070 ... -13.037723 -12.652228 -1.821905 -7.260097 -6.660252 -14.682694 -11.719264 -11.025216 -13.387260 blues
3 184.570312 91 0.269320 0.119072 1361.045467 1567.804596 2739.625101 0.069124 -207.208080 132.799175 ... -0.613248 0.384877 2.605128 -5.188924 -9.527455 -9.244394 -2.848274 -1.418707 -5.932607 blues
4 161.499023 74 0.391059 0.137728 1811.076084 2052.332563 3927.809582 0.075480 -145.434568 102.829023 ... 7.457218 -10.470444 -2.360483 -6.783623 2.671134 -4.760879 -0.949005 0.024832 -2.005315 blues

5 rows × 29 columns
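Before moving on, we can quickly confirm the size and class balance described above (a quick sanity check, using the df from the cell above; the counts per genre are what we expect from the dataset description):

In [ ]:
# Sanity check: 1000 rows, 28 features + 1 label column,
# and how many examples each genre has
print(df.shape)
print(df['label'].value_counts())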

Visualizing the Data Using PCA¶

Before we get into any classification algorithms, we should first look at our data. We can use a dimensionality reduction method called Principal Component Analysis (PCA) to visualize our high-dimensional dataset in 2 dimensions. Using matplotlib to graph our results, we standardize our features and use sklearn's PCA to reduce the dimensionality of our data.

In [2]:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Begin by separating the features X and class y
X = df.iloc[:, :-1]  # All features except the last column we store in X
y = df.iloc[:, -1]   # Last column will be the class

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA for reduction into 2 dim
pca = PCA(n_components=2) 
X_pca = pca.fit_transform(X_scaled)

# Make new df_pca for our component and target variables
df_pca = pd.DataFrame({'PC1': X_pca[:, 0], 'PC2': X_pca[:, 1], 'Genre': y})

# Plot using matplotlib and color code the classes
plt.figure(figsize=(10, 8))
genres = df_pca['Genre'].unique()
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k', 'lime', 'orange', 'purple']
for genre, color in zip(genres, colors):
    genre_data = df_pca[df_pca['Genre'] == genre]
    plt.scatter(genre_data['PC1'], genre_data['PC2'], color=color, label=genre)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Music Genres')
plt.legend()
plt.show()

From this graph, we can get a better idea of how our data is distributed and which models to use. After applying PCA, we can see that jazz and blues sit close to each other and to classical music, while metal and pop occupy clearly separate regions. This is worth noting because it helps explain why our models may misclassify certain genres later. For our case of music classification, a few misclassifications are acceptable, since a piece of music can belong to multiple genres.
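Keep in mind that two components can only capture part of the variance in a 28-feature dataset. We can check how much with the fitted pca object from the cell above:

In [ ]:
# Fraction of the total variance captured by each of the two components,
# and their sum (how faithful the 2D picture is overall)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())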

SVM¶

The first machine learning model we will use is the Support Vector Machine (SVM) from scikit-learn. This model works well on high-dimensional datasets and supports multiclass classification through one-versus-all/one-versus-one schemes. To learn more about the advantages and disadvantages, click here. Our SVM will attempt to draw a hyperplane with a maximum margin, positioned by its support vectors, that best separates the data. When we introduce a new datapoint, we can project it and determine which side of the hyperplane it lies on.
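As a quick illustration of the one-versus-one scheme (a sketch using the df from the first cell, separate from the pipeline below): with 10 classes, scikit-learn's SVC trains one binary classifier for every pair of classes, i.e. 10 × 9 / 2 = 45 hyperplanes.

In [ ]:
# Sketch: fit an SVC just to inspect the fitted model
from sklearn.svm import SVC

demo_svm = SVC(kernel='linear').fit(df.iloc[:, :-1], df.iloc[:, -1])
print(demo_svm.classes_)    # the 10 genre labels
print(demo_svm.n_support_)  # number of support vectors kept for each class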

We start by importing the necessary packages from sklearn. We import train_test_split to split our dataset into training and testing sets, and classification_report to view our model's scores.

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Begin by separating the features X and class y
X = df.iloc[:, :-1]  # All features except the last column we store in X
y = df.iloc[:, -1]   # Last column will be the class

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Create the SVM from scikit
svm = SVC()

# Train the svm model
svm.fit(X_train, y_train)

# Predict the labels for the test set
y_pred = svm.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
              precision    recall  f1-score   support

       blues       0.19      0.15      0.17        20
   classical       0.40      0.95      0.57        20
     country       0.00      0.00      0.00        20
       disco       0.18      0.10      0.13        20
      hiphop       0.25      0.05      0.08        20
        jazz       0.00      0.00      0.00        20
       metal       0.25      0.85      0.39        20
         pop       0.54      0.75      0.63        20
      reggae       0.00      0.00      0.00        20
        rock       0.04      0.05      0.04        20

    accuracy                           0.29       200
   macro avg       0.18      0.29      0.20       200
weighted avg       0.18      0.29      0.20       200

We can then view our results in a graph using numpy and matplotlib.

In [4]:
import numpy as np

# Get the class report (zero_division=0 to match the printed report above)
report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)

# Get the precision, recall, and f1-score from the report
classes = list(report.keys())[:-3]
metrics = ['precision', 'recall', 'f1-score']
scores = np.zeros((len(classes), len(metrics)))

# Go through each class and add the metrics to the scores
for i, clas in enumerate(classes):
    for j, metric in enumerate(metrics):
        scores[i, j] = report[clas][metric]

# Plot the bar graph
fig, ax = plt.subplots(figsize=(16, 9))
x = np.arange(len(classes))
width = 0.2

for j, metric in enumerate(metrics):
    ax.bar(x + j * width, scores[:, j], width, label=metric)

ax.set_ylabel('Score')
ax.set_title('SVM Results')
ax.set_xticks(x)
ax.set_xticklabels(classes)
ax.legend()

plt.show()

Now that we have made our first SVM model, let's take a look at the results. First, let's understand what each of these outputs means. Precision measures the fraction of predicted positives that are actually positive: true positives over true plus false positives. Recall measures the fraction of actual positives the model finds: true positives over true positives plus false negatives. These two metrics are combined to form the f1-score. Scikit-learn provides more detail here. By looking at the f1-score, we can see how our model performs.
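Concretely, for a single class: precision = TP / (TP + FP), recall = TP / (TP + FN), and the f1-score is their harmonic mean. A toy check with made-up counts (not from our model):

In [ ]:
# Hypothetical counts: 15 true positives, 5 false positives, 10 false negatives
tp, fp, fn = 15, 5, 10

precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.60
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, about 0.67

print(precision, recall, f1)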

Although acceptable f1-score, recall, and precision values depend on the type of model you are making, a higher f1-score generally correlates with a better model. In our case, the highest scores are classical and pop, sitting around 0.6. However, this model is performing very poorly on most other categories. Let's see if there is a way we can improve our SVM to better classify our data.

SVM Improvements¶

There are a few improvements we can make to our model. First, we will use sklearn's StandardScaler. SVM works by maximizing the distance between the support vectors and the separating hyperplane; if one feature is on a much larger scale than the others, it will dominate the placement of the plane. This preprocessing tool standardizes our dataset and helps the SVM place a more accurate hyperplane by weighting all features equally. We can then use the Pipeline tool from sklearn to streamline the process and keep our results clean and readable.
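To see why this matters for our data, compare the raw feature scales (using the X we separated earlier): the features differ by several orders of magnitude.

In [ ]:
# Standard deviation of each raw feature; without scaling, the largest
# features would dominate the SVM's distance computations
print(X.std().sort_values(ascending=False).head())
print(X.std().sort_values(ascending=True).head())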

The second way we can improve our SVM is through GridSearchCV. Most machine learning models have configuration options called hyperparameters that must be set before training. For SVM, the main hyperparameters are the kernel, C, and gamma; below we search over C and the kernel to increase our model's output. By taking advantage of GridSearchCV, we can find the best-performing combination of parameters for our SVM model.

In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Begin by separating the features X and class y
X = df.iloc[:, :-1]  # All features except the last column we store in X
y = df.iloc[:, -1]   # Last column will be the class

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Create a pipeline using the StandardScaler
pipeline = Pipeline([('scaler', StandardScaler()), ('svm', SVC())])

# Use the param_grid to tune the hyperparameter
param_grid = {
    'svm__C': [0.1, 1, 10],  # C parameter
    'svm__kernel': ['linear', 'rbf'],  # Kernel
}

# Use grid search with cross-validation
grid_search = GridSearchCV(pipeline, param_grid, scoring='f1_weighted', cv=5)
grid_search.fit(X_train, y_train)

# Get the best model from the grid_search
svm_model = grid_search.best_estimator_

# Predict
y_pred = svm_model.predict(X_test)

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       blues       0.87      0.65      0.74        20
   classical       0.89      0.85      0.87        20
     country       0.65      0.75      0.70        20
       disco       0.46      0.60      0.52        20
      hiphop       0.70      0.70      0.70        20
        jazz       0.64      0.70      0.67        20
       metal       0.86      0.95      0.90        20
         pop       0.88      0.70      0.78        20
      reggae       0.50      0.45      0.47        20
        rock       0.42      0.40      0.41        20

    accuracy                           0.68       200
   macro avg       0.69      0.68      0.68       200
weighted avg       0.69      0.68      0.68       200

In [6]:
# Get the class report (zero_division=0 for consistency with the first model)
report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)

# Get the precision, recall, and f1-score from the report
classes = list(report.keys())[:-3]
metrics = ['precision', 'recall', 'f1-score']
scores = np.zeros((len(classes), len(metrics)))

# Go through each class and add the metrics to the scores
for i, clas in enumerate(classes):
    for j, metric in enumerate(metrics):
        scores[i, j] = report[clas][metric]

# Plot the bar graph
fig, ax = plt.subplots(figsize=(16, 9))
x = np.arange(len(classes))
width = 0.2

for j, metric in enumerate(metrics):
    ax.bar(x + j * width, scores[:, j], width, label=metric)

ax.set_ylabel('Score')
ax.set_title('Tuned SVM Results')
ax.set_xticks(x)
ax.set_xticklabels(classes)
ax.legend()

plt.show()

From our improvements, we can see an overall increase across all metrics. Our scores have nearly doubled, with the lowest being rock at around 0.4. Our best classifications are metal and classical, with f1-scores around 0.9!
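We can also check which hyperparameter combination the grid search settled on (using the fitted grid_search from the cell above):

In [ ]:
# The winning hyperparameters and their cross-validated f1_weighted score
print(grid_search.best_params_)
print(grid_search.best_score_)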

Decision Trees¶

The next classification model we can use is a Decision Tree. Decision trees are machine learning models that form a tree data structure whose nodes split on feature values. To make a prediction for a new datapoint, we walk down the tree until we reach a leaf node, which reveals the predicted class. It is a fairly intuitive model and a good way to visualize how a machine learning algorithm can work.

Let's start by making a basic tree using sklearn's DecisionTreeClassifier. This time, let's save our results to a unique variable to compare with later.

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Begin by separating the features X and class y
X = df.iloc[:, :-1]  # All features except the last column we store in X
y = df.iloc[:, -1]   # Last column will be the class

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Create a decision tree classifier
decision_tree = DecisionTreeClassifier()

# Fit the classifier to the training set
decision_tree.fit(X_train, y_train)

# Predict
y_pred = decision_tree.predict(X_test)

# Evaluate the classifier
report_basic = classification_report(y_test, y_pred, output_dict=True)
print(report_basic)
{'blues': {'precision': 0.5238095238095238, 'recall': 0.55, 'f1-score': 0.5365853658536585, 'support': 20}, 'classical': {'precision': 0.9285714285714286, 'recall': 0.65, 'f1-score': 0.7647058823529412, 'support': 20}, 'country': {'precision': 0.36363636363636365, 'recall': 0.4, 'f1-score': 0.380952380952381, 'support': 20}, 'disco': {'precision': 0.42857142857142855, 'recall': 0.6, 'f1-score': 0.5, 'support': 20}, 'hiphop': {'precision': 0.4782608695652174, 'recall': 0.55, 'f1-score': 0.5116279069767442, 'support': 20}, 'jazz': {'precision': 0.5333333333333333, 'recall': 0.4, 'f1-score': 0.4571428571428572, 'support': 20}, 'metal': {'precision': 0.7272727272727273, 'recall': 0.8, 'f1-score': 0.761904761904762, 'support': 20}, 'pop': {'precision': 0.5333333333333333, 'recall': 0.4, 'f1-score': 0.4571428571428572, 'support': 20}, 'reggae': {'precision': 0.3888888888888889, 'recall': 0.35, 'f1-score': 0.36842105263157887, 'support': 20}, 'rock': {'precision': 0.36363636363636365, 'recall': 0.4, 'f1-score': 0.380952380952381, 'support': 20}, 'accuracy': 0.51, 'macro avg': {'precision': 0.5269314260618609, 'recall': 0.51, 'f1-score': 0.5119435445910161, 'support': 200}, 'weighted avg': {'precision': 0.5269314260618608, 'recall': 0.51, 'f1-score': 0.5119435445910161, 'support': 200}}

Now that we have trained our decision tree, we can use tree.plot_tree from sklearn to view it.

In [8]:
from sklearn import tree
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(24, 16))
tree.plot_tree(decision_tree, feature_names=X.columns, class_names=y.unique(), filled=True, ax=ax)
plt.show()

Although the tree itself is hard to read, we get a general idea of how the algorithm works and a glimpse into the complexity inside some of these models.
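For a more readable view, we can dump just the top of the fitted tree as plain text using sklearn's export_text:

In [ ]:
from sklearn.tree import export_text

# Print only the first two levels of splits as text
print(export_text(decision_tree, feature_names=list(X.columns), max_depth=2))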

Decision Tree Improvements¶

Looking at our output above, we see our decision tree did alright at predicting the class of the music it was given. Decision trees are a great way to classify data, but they do have disadvantages. For example, they are unstable: a slight change in the data can greatly change the output of the model. Decision trees can also easily overfit by growing an extremely complex tree. To combat this and improve our tree, let's change its structure. Using GridSearchCV again, we can tune the tree's structure and feature selection to see what works best.

In [9]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for hyperparameter tuning
param_grid = {
    'max_depth': [None, 5, 10, 15],  # Max depth of tree
    'min_samples_split': [2, 5, 10],  # Min number of samples to split
    'min_samples_leaf': [1, 2, 4],  # Min number of samples at a leaf node
    'max_features': ['sqrt', 'log2']  # Max features per split ('auto', an alias for 'sqrt' here, was removed in newer scikit-learn versions)
}

# Create decision tree classifier
clf = DecisionTreeClassifier()

# Perform grid search to find the best hyperparameters.
# Note: fitting on the full X and y means the search sees the test rows
# during tuning, so the scores reported below are optimistic.
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X, y)

# Get the best gridsearch params and model
# (best_estimator_ is already refit on X and y by GridSearchCV)
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Predict with the new model
y_pred = best_model.predict(X_test)

# Save results to eval the classifier later
report_tuned = classification_report(y_test, y_pred, output_dict=True)
print(report_tuned)
{'blues': {'precision': 0.6785714285714286, 'recall': 0.95, 'f1-score': 0.7916666666666667, 'support': 20}, 'classical': {'precision': 0.9473684210526315, 'recall': 0.9, 'f1-score': 0.9230769230769231, 'support': 20}, 'country': {'precision': 0.7619047619047619, 'recall': 0.8, 'f1-score': 0.7804878048780488, 'support': 20}, 'disco': {'precision': 0.7894736842105263, 'recall': 0.75, 'f1-score': 0.7692307692307692, 'support': 20}, 'hiphop': {'precision': 0.8181818181818182, 'recall': 0.9, 'f1-score': 0.8571428571428572, 'support': 20}, 'jazz': {'precision': 1.0, 'recall': 0.8, 'f1-score': 0.888888888888889, 'support': 20}, 'metal': {'precision': 1.0, 'recall': 0.85, 'f1-score': 0.9189189189189189, 'support': 20}, 'pop': {'precision': 0.7391304347826086, 'recall': 0.85, 'f1-score': 0.7906976744186046, 'support': 20}, 'reggae': {'precision': 0.8888888888888888, 'recall': 0.8, 'f1-score': 0.8421052631578948, 'support': 20}, 'rock': {'precision': 0.9411764705882353, 'recall': 0.8, 'f1-score': 0.8648648648648648, 'support': 20}, 'accuracy': 0.84, 'macro avg': {'precision': 0.8564695908180899, 'recall': 0.8400000000000001, 'f1-score': 0.8427080631244437, 'support': 200}, 'weighted avg': {'precision': 0.8564695908180897, 'recall': 0.84, 'f1-score': 0.8427080631244438, 'support': 200}}

Now that we have created our new decision tree, we can compare the results and see whether there were significant improvements between the basic and tuned models. To do this, we can create a bar plot with matplotlib and compare the macro-averaged metrics.

In [10]:
# Define the metrics for report
metric_names = ['precision', 'recall', 'f1-score']

# Extract the values for the two trees
values_basic = [report_basic['macro avg'][metric] for metric in metric_names]
values_tuned = [report_tuned['macro avg'][metric] for metric in metric_names]
positions = range(len(metric_names))

# Create the bar plot
plt.figure(figsize=(10, 8))
plt.bar(positions, values_basic, width=0.4, label='Basic Model')
plt.bar([p + 0.4 for p in positions], values_tuned, width=0.4, label='Tuned Model')

plt.xticks([p + 0.2 for p in positions], metric_names)
plt.title('Basic Model vs. Tuned Model')
plt.xlabel('Metrics')
plt.ylabel('Values')
plt.legend()

plt.show()

We can see a large improvement between our two models, with the basic model averaging around 0.5 and the tuned model around 0.84 across all metrics. Keep in mind the tuned scores are optimistic, since the grid search above saw the test rows during tuning. Although we could keep tuning our decision tree, another machine learning model that might be useful is the Random Forest. Although this project will not cover Random Forests in depth, you can learn more and try to make your own model by clicking here.
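As a starting point for that exercise, here is a minimal sketch (reusing the train/test split from the decision-tree section; the hyperparameters are just defaults, not tuned):

In [ ]:
from sklearn.ensemble import RandomForestClassifier

# Minimal sketch: an ensemble of 100 trees, each trained on a bootstrap
# sample of the data with a random subset of features per split
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(classification_report(y_test, forest.predict(X_test), zero_division=0))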

KNN¶

Now we can look at another learning algorithm called K-Nearest Neighbors (KNN). KNN classifies data by storing the training examples; when a new datapoint is introduced, it looks at that point's k nearest neighbors to determine the class. For example, if k = 5, we would look at the 5 closest datapoints to our new datapoint and take a majority vote on the class.

We can build a KNN classifier fairly easily using sklearn's KNeighborsClassifier.

In [11]:
from sklearn.neighbors import KNeighborsClassifier

# Split the dataset
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Make training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the KNN classifier
k = 5  # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Train the KNN
knn.fit(X_train, y_train)

# Predict
y_pred = knn.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)
              precision    recall  f1-score   support

       blues       0.12      0.20      0.15        20
   classical       0.59      0.77      0.67        13
     country       0.28      0.26      0.27        27
       disco       0.28      0.33      0.30        21
      hiphop       0.08      0.07      0.07        15
        jazz       0.39      0.32      0.35        22
       metal       0.58      0.56      0.57        25
         pop       0.36      0.62      0.46        13
      reggae       0.44      0.35      0.39        23
        rock       0.00      0.00      0.00        21

    accuracy                           0.33       200
   macro avg       0.31      0.35      0.32       200
weighted avg       0.31      0.33      0.32       200

Our KNN classifier works, but the results are fairly poor: the average f1-score is around 0.32.

Using what you learned from our previous two examples, try to make your own improvements to the KNN classifier and compare the results with the one above. Start by changing the number of neighbors (k) and go from there!
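If you want a starting point, here is one possible sketch that standardizes the features and searches over k, the same ideas we used for the SVM (the candidate k values are arbitrary):

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# KNN is distance-based, so raw feature scales matter; standardize first,
# then search over the number of neighbors
knn_pipeline = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier())])
knn_grid = GridSearchCV(knn_pipeline, {'knn__n_neighbors': [1, 3, 5, 7, 9, 11]}, cv=5)
knn_grid.fit(X_train, y_train)

print(knn_grid.best_params_)
print(classification_report(y_test, knn_grid.predict(X_test), zero_division=0))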

Conclusion and Future Work¶

We looked at three different machine learning models to help us classify music features into 10 different genres. First, we used a dimensionality reduction tool (PCA) to visualize our data in a 2-dimensional space. Then, we saw how the Support Vector Machine (SVM) handles multi-class classification via one-versus-one/one-versus-all methods, created our model, and improved it with standardization and hyperparameter tuning. In the end, our SVM managed to achieve an average f1-score of around 0.68. Next, we looked at how decision trees are structured and created two separate models; by tuning the tree's size and structure, we saw its macro f1-score rise from roughly 0.51 to 0.84! Finally, I asked the reader to improve our KNN classifier based on what we learned.

In the future, we could take a look at how unsupervised learning models might help classify the data without using our labels. We could also use Neural Networks, with a complex set of hidden layers, to learn the features. In terms of our dataset, we could add features that are not explicitly in the audio file, like key signature or chord progression. We could also include non-Western music from around the world and create different layers of classification based on region as well. For example, we could take traditional Japanese jazz and compare it to jazz in the USA, an example of the differences between musical cultures.

I hope this project helped you better understand some machine learning algorithms; if you want to learn more, click here. Thank you for your time.
