brain of mat kelcey
deriving class_weights from validation data (#three_strikes_rule)
March 03, 2020
this post is part of my three-strikes-rule series; the third time someone asks me about something, i have to write it up
when training a model we often want to weight instances differently to reflect either
1) an imbalance in the amount of data per class or
2) an imbalance in the difficulty across classes
the required weights for case 1) can be derived by checking class ratios in the training data; case 2) though requires checking how a model does against validation data.
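( as an aside, case 1) is straightforward; a minimal sketch, using a made up set of training labels, of the inverse frequency heuristic sklearn calls 'balanced' )

```python
import numpy as np

# hypothetical training labels for a 4-way problem; class 0 dominates
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 3])

# inverse frequency weighting; n_samples / (n_classes * per_class_count)
counts = np.bincount(y_train, minlength=4)
class_weights = len(y_train) / (len(counts) * counts)
print(class_weights)  # rarer classes get larger weights
```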
this post walks through a simple example of 2)
>>> import numpy as np
>>> from sklearn.metrics import log_loss
consider a 4-way classification problem with 10 examples
>>> K = 4
>>> y_true = [3,1,1,0,2,0,1,2,3,3]
>>> N = len(y_true)
pretend the model produces the following predictions. we can see that the model does well for classes 2 & 3, but 0 & 1 are confused a bit.
>>> # y_true
>>> y_pred = np.array([[0.0, 0.1, 0.1, 0.8],  # 3
...                    [0.4, 0.5, 0.0, 0.1],  # 1
...                    [0.5, 0.3, 0.1, 0.1],  # 1
...                    [0.4, 0.3, 0.1, 0.2],  # 0
...                    [0.1, 0.1, 0.8, 0.0],  # 2
...                    [0.7, 0.1, 0.1, 0.1],  # 0
...                    [0.3, 0.6, 0.1, 0.0],  # 1
...                    [0.0, 0.1, 0.9, 0.0],  # 2
...                    [0.1, 0.0, 0.0, 0.9],  # 3
...                    [0.0, 0.1, 0.1, 0.8]]) # 3
we can use the sklearn api to calculate the per class loss. for a softmax based classifier log_loss is what we want, since it's what is being directly optimised. ( though the formulation is simple it's worth using a lib to avoid weird numerical instabilities. note: for a reasonable number of classes you might need to set a higher epsilon in the log_loss call )
as expected the loss for the confused classes 0 & 1 is higher than for classes 2 & 3. ( note: sklearn log_loss doesn't work on a sparse version of y_true so we need to convert to a one_hot representation for the call )
>>> y_true_one_hot = np.zeros((N, K))
>>> y_true_one_hot[np.arange(N), y_true] = 1.0
>>>
>>> losses = []
>>> for clazz in range(K):
...     losses.append(log_loss(y_true=y_true_one_hot[:, clazz],
...                            y_pred=y_pred[:, clazz]))
...
>>> losses
[0.3044334455393211,
0.3291423130879737,
0.09606671609189957,
0.10908727165739374]
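as a sanity check, the same per class losses can be computed by hand as a per column binary cross entropy, with clipping to dodge log(0) (the kind of numerical detail the lib handles for us). the data is repeated so the snippet stands alone.

```python
import numpy as np

# same values as above, repeated to be self-contained
y_true = [3, 1, 1, 0, 2, 0, 1, 2, 3, 3]
y_pred = np.array([[0.0, 0.1, 0.1, 0.8],
                   [0.4, 0.5, 0.0, 0.1],
                   [0.5, 0.3, 0.1, 0.1],
                   [0.4, 0.3, 0.1, 0.2],
                   [0.1, 0.1, 0.8, 0.0],
                   [0.7, 0.1, 0.1, 0.1],
                   [0.3, 0.6, 0.1, 0.0],
                   [0.0, 0.1, 0.9, 0.0],
                   [0.1, 0.0, 0.0, 0.9],
                   [0.0, 0.1, 0.1, 0.8]])

N, K = y_pred.shape
y_true_one_hot = np.zeros((N, K))
y_true_one_hot[np.arange(N), y_true] = 1.0

# clip predictions away from 0 and 1 before taking logs
eps = 1e-15
p = np.clip(y_pred, eps, 1 - eps)
manual_losses = -np.mean(y_true_one_hot * np.log(p) +
                         (1 - y_true_one_hot) * np.log(1 - p), axis=0)
print(manual_losses)  # matches the log_loss values above
```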
next we need to convert these to values suitable for class_weights. one way to do this is with a smoothing (via an exponentiation) followed by a normalisation.
each smoothing value puts more class weight on the confused classes 0 & 1, and the degenerate smoothing=0 case results in a uniform weighting as expected.
>>> def class_weights_for(losses, smoothing):
...     # smoothing -> 0.0 => uniform class weights; acts as a noop, i.e. class weights are all 1.0
...     # smoothing -> 1.0 => uses log loss value for class weights; probably too destructive ...
...     smoothed_losses = np.power(losses, smoothing)  # smooth values
...     normalised = smoothed_losses / np.sum(smoothed_losses)
...     return normalised * len(normalised)  # rescale so the weights average 1.0
...
>>> for smoothing in [0, 0.001, 0.01, 0.1, 1.0]:
...     print("smoothing=%0.3f => class_weights=%s" % (smoothing, class_weights_for(losses, smoothing)))
...
smoothing=0.000 => class_weights=[1. 1. 1. 1.]
smoothing=0.001 => class_weights=[1.0005254 1.00060348 0.99937205 0.99949908]
smoothing=0.010 => class_weights=[1.00525187 1.00603665 0.9937238 0.99498768]
smoothing=0.100 => class_weights=[1.05225581 1.0604995 0.93762546 0.94961924]
smoothing=1.000 => class_weights=[1.45187861 1.56971809 0.45815338 0.52024992]
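how these weights get consumed depends on the framework; for apis that take per-sample weights rather than per-class weights it's just a lookup. a sketch, reusing the smoothing=0.1 weights and the labels from above for illustration:

```python
import numpy as np

# class weights from the smoothing=0.1 case above, and some labels to weight
class_weights = np.array([1.05225581, 1.0604995, 0.93762546, 0.94961924])
y_train = np.array([3, 1, 1, 0, 2, 0, 1, 2, 3, 3])

# per sample weight is just the weight of that sample's class
sample_weights = class_weights[y_train]
print(sample_weights)
```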