brain of mat kelcey...
measuring baseline random performance for an N way classifier
April 11, 2020 at 12:34 PM | categories: short_tute, three_strikes_rule
this post is part of my three-strikes-rule series; the third time someone asks me about something, i have to write it up
>>> import numpy as np
>>> from sklearn.metrics import *
consider a 5 way classifier with varying levels of support per class; specifically 100 examples each of class0 and class1, and 20 examples each of class2, class3 and class4.
>>> training_data_support = [100, 100, 20, 20, 20]
what's the simplest way to measure the baseline performance of a random classifier? we often want to know this value to make sure we don't have silly bugs and/or that we're getting some signal from the data beyond random chance.
firstly let's expand the support counts into a dense array of per-example labels (the form the sklearn metrics expect)
>>> y_true = np.concatenate([np.repeat(i, n) for i, n in enumerate(training_data_support)])
>>> y_true
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
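as a quick sanity check (an aside, not part of the original walkthrough) we can confirm the per-class counts in y_true line up with the support we started with; np.bincount counts how many times each label appears.
>>> # counts per class; should match training_data_support exactly
>>> np.bincount(y_true)
array([100, 100,  20,  20,  20])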
we can make random predictions proportional to the class support in the training data
>>> training_data_proportions = training_data_support / np.sum(training_data_support)
>>> y_pred = np.random.choice(range(len(training_data_support)),
...                           p=training_data_proportions,
...                           size=sum(training_data_support))
>>> y_pred
array([1, 0, 0, 1, 1, 1, 0, 3, 1, 1, 3, 1, 1, 1, 2, 0, 1, 4, 1, 1, 1, 1,
0, 2, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 4, 2, 1, 4, 0, 2, 1, 0, 1,
1, 0, 1, 4, 0, 2, 0, 0, 1, 1, 0, 1, 2, 4, 3, 0, 1, 1, 2, 1, 2, 3,
0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 0, 1, 3, 3, 1, 1, 1, 3, 1, 0, 1,
0, 0, 1, 0, 1, 1, 4, 1, 3, 3, 1, 1, 1, 0, 0, 1, 0, 2, 1, 0, 1, 0,
4, 0, 2, 0, 3, 1, 0, 1, 1, 2, 1, 1, 1, 3, 0, 2, 0, 0, 0, 1, 1, 0,
2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 1, 1, 4, 1, 0, 3, 2, 0, 2, 0,
1, 3, 4, 1, 2, 0, 1, 0, 0, 1, 4, 0, 1, 3, 4, 1, 0, 1, 0, 1, 1, 4,
0, 0, 3, 0, 1, 1, 1, 2, 2, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0,
0, 4, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 3, 1, 1, 2, 1, 3,
1, 1, 0, 1, 3, 0, 4, 3, 0, 2, 0, 0, 3, 4, 0, 1, 0, 4, 2, 1, 1, 1,
0, 1, 0, 1, 2, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 1])
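as an aside (not in the original post), sklearn's DummyClassifier with the "stratified" strategy does essentially the same proportional sampling for us; here's a minimal sketch, where dummy_X, baseline and y_pred_dummy are just names i've made up, and X is a throwaway array since the dummy ignores features anyway.
>>> from sklearn.dummy import DummyClassifier
>>> dummy_X = np.zeros((len(y_true), 1))            # features are ignored by the dummy
>>> baseline = DummyClassifier(strategy="stratified")
>>> baseline.fit(dummy_X, y_true)                    # just records the training class distribution
>>> y_pred_dummy = baseline.predict(dummy_X)         # samples labels proportional to that distribution
either way we end up with a vector of random predictions drawn proportional to the training support.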
from this we can calculate standard metrics; if we can't beat these, we've done something realllllly wrong :/
>>> confusion_matrix(y_true, y_pred)
array([[29, 47,  9,  9,  6],
       [40, 36, 11,  6,  7],
       [ 7, 10,  1,  2,  0],
       [ 7,  5,  2,  3,  3],
       [ 7, 10,  2,  1,  0]])
>>> print(classification_report(y_true, y_pred))
              precision    recall  f1-score   support

           0       0.32      0.29      0.31       100
           1       0.33      0.36      0.35       100
           2       0.04      0.05      0.04        20
           3       0.14      0.15      0.15        20
           4       0.00      0.00      0.00        20

    accuracy                           0.27       260
   macro avg       0.17      0.17      0.17       260
weighted avg       0.27      0.27      0.27       260
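one last aside (not in the original post): for this proportional-sampling baseline the expected accuracy has a closed form; the prediction and the true label agree on class i with probability p_i * p_i, so the expected accuracy is just the sum of the squared class proportions. a quick check:
>>> # expected accuracy of proportional random guessing = sum_i p_i^2
>>> np.sum(training_data_proportions ** 2)    # comes out around 0.31 for this support
the 0.27 accuracy above is a single noisy draw from that baseline; with only 260 predictions a bit of variance around ~0.31 is to be expected.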