brain of mat kelcey...

measuring baseline random performance for an N way classifier

April 11, 2020 at 12:34 PM | categories: short_tute, three_strikes_rule

this post is part of my three-strikes-rule series; the third time someone asks me about something, i have to write it up

>>> import numpy as np
>>> from sklearn.metrics import confusion_matrix, classification_report


consider a 5 way classifier with varying levels of support per class; specifically 100 examples each of class 0 and class 1, and 20 examples each of classes 2, 3 and 4.

>>> training_data_support = [100, 100, 20, 20, 20]


what's the simplest way to measure the baseline performance of a random classifier? we often want to know this value to make sure we don't have silly bugs and/or that we're getting some signal from the data beyond random chance.
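one trivially simple baseline worth checking first is always predicting the most frequent class; its accuracy is just that class's share of the data. a quick sketch:

```python
# the simplest possible baseline: always predict the most frequent class.
# its accuracy is that class's fraction of the data.
training_data_support = [100, 100, 20, 20, 20]
majority_accuracy = max(training_data_support) / sum(training_data_support)
print(majority_accuracy)  # ~0.385
```

this baseline maximises accuracy but is useless for the minority classes, so it's worth comparing against the proportional random predictions below as well.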

firstly let's expand things to a dense array of per-example labels (the sklearn metrics below expect labels, not support counts)

>>> y_true = np.concatenate([np.repeat(i, n) for i, n in enumerate(training_data_support)])
>>> y_true

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])



we can do random predictions proportional to the support in the training data

>>> training_data_proportions = training_data_support / np.sum(training_data_support)
>>> y_pred = np.random.choice(range(len(training_data_support)),
...                           p=training_data_proportions,
...                           size=sum(training_data_support))
>>> y_pred

array([1, 0, 0, 1, 1, 1, 0, 3, 1, 1, 3, 1, 1, 1, 2, 0, 1, 4, 1, 1, 1, 1,
       0, 2, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 4, 2, 1, 4, 0, 2, 1, 0, 1,
       1, 0, 1, 4, 0, 2, 0, 0, 1, 1, 0, 1, 2, 4, 3, 0, 1, 1, 2, 1, 2, 3,
       0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 0, 1, 3, 3, 1, 1, 1, 3, 1, 0, 1,
       0, 0, 1, 0, 1, 1, 4, 1, 3, 3, 1, 1, 1, 0, 0, 1, 0, 2, 1, 0, 1, 0,
       4, 0, 2, 0, 3, 1, 0, 1, 1, 2, 1, 1, 1, 3, 0, 2, 0, 0, 0, 1, 1, 0,
       2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 1, 1, 4, 1, 0, 3, 2, 0, 2, 0,
       1, 3, 4, 1, 2, 0, 1, 0, 0, 1, 4, 0, 1, 3, 4, 1, 0, 1, 0, 1, 1, 4,
       0, 0, 3, 0, 1, 1, 1, 2, 2, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0,
       0, 4, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 3, 1, 1, 2, 1, 3,
       1, 1, 0, 1, 3, 0, 4, 3, 0, 2, 0, 0, 3, 4, 0, 1, 0, 4, 2, 1, 1, 1,
       0, 1, 0, 1, 2, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 1])
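as a sanity check on these sampled predictions: if both the true label and the prediction are drawn independently with probabilities p_i, the expected accuracy is sum_i p_i^2. a small sketch of that calculation:

```python
import numpy as np

training_data_support = [100, 100, 20, 20, 20]
p = np.array(training_data_support) / np.sum(training_data_support)

# chance an independent proportional prediction matches a proportional true label
expected_accuracy = np.sum(p ** 2)
print(round(expected_accuracy, 3))  # ~0.314
```

the 0.27 accuracy in the classification report further down is one noisy draw around this expectation.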



from this we can calculate standard metrics; if we can't beat these, we've done something realllllly wrong :/

>>> confusion_matrix(y_true, y_pred)

array([[29, 47,  9,  9,  6],
       [40, 36, 11,  6,  7],
       [ 7, 10,  1,  2,  0],
       [ 7,  5,  2,  3,  3],
       [ 7, 10,  2,  1,  0]])

>>> print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.32      0.29      0.31       100
           1       0.33      0.36      0.35       100
           2       0.04      0.05      0.04        20
           3       0.14      0.15      0.15        20
           4       0.00      0.00      0.00        20

    accuracy                           0.27       260
   macro avg       0.17      0.17      0.17       260
weighted avg       0.27      0.27      0.27       260
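if you'd rather not roll the sampling by hand, sklearn ships this exact baseline as DummyClassifier with strategy='stratified'. a minimal sketch (the dummy ignores the features, so zeros of the right length will do):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report

training_data_support = [100, 100, 20, 20, 20]
y_true = np.concatenate([np.repeat(i, n)
                         for i, n in enumerate(training_data_support)])
X = np.zeros((len(y_true), 1))  # placeholder features; ignored by the dummy

# 'stratified' samples predictions proportional to the class frequencies in y_true
clf = DummyClassifier(strategy='stratified')
clf.fit(X, y_true)
y_pred = clf.predict(X)
print(classification_report(y_true, y_pred))
```

sklearn also provides strategy='most_frequent' and strategy='uniform' if you want the majority-class or uniform-random baselines instead.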