brain of mat kelcey


deriving class_weights from validation data (#three_strikes_rule)

March 03, 2020

this post is part of my three-strikes-rule series; the third time someone asks me about something, i have to write it up

when training a model we often want to weight instances differently to reflect either

  1. an imbalance in the amount of data per class or
  2. an imbalance in the difficulty across classes

the required weights for case 1) can be derived by checking class ratios in the training data; case 2) though requires checking how a trained model does against validation data.
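for completeness, a minimal sketch of case 1): inverse frequency weights from training label counts, normalised so the average per example weight is 1.0 ( this is the same formula sklearn uses for its "balanced" class weight mode )

```python
import numpy as np

def inverse_freq_class_weights(y_train, num_classes):
    # count examples per class
    counts = np.bincount(y_train, minlength=num_classes)
    # inverse frequency; rarer classes get larger weights.
    # dividing by (num_classes * counts) keeps the average
    # per example weight at 1.0
    return len(y_train) / (num_classes * counts)

# a 2-way problem where class 0 is 3x rarer than class 1
print(inverse_freq_class_weights(np.array([0, 1, 1, 1]), 2))
```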

this post walks through a simple example of 2)

>>> import numpy as np
>>> from sklearn.metrics import log_loss

consider a 4-way classification problem with 10 examples

>>> K = 4
>>> y_true = [3,1,1,0,2,0,1,2,3,3]
>>> N = len(y_true)

pretend the model produces the following predictions. we can see that the model does well for classes 2 & 3, but 0 & 1 are confused a bit.

>>>                                           # y_true
>>> y_pred = np.array([[0.0, 0.1, 0.1, 0.8],  # 3
>>>                    [0.4, 0.5, 0.0, 0.1],  # 1
>>>                    [0.5, 0.3, 0.1, 0.1],  # 1
>>>                    [0.4, 0.3, 0.1, 0.2],  # 0
>>>                    [0.1, 0.1, 0.8, 0.0],  # 2
>>>                    [0.7, 0.1, 0.1, 0.1],  # 0
>>>                    [0.3, 0.6, 0.1, 0.0],  # 1
>>>                    [0.0, 0.1, 0.9, 0.0],  # 2
>>>                    [0.1, 0.0, 0.0, 0.9],  # 3
>>>                    [0.0, 0.1, 0.1, 0.8]]) # 3

we can use the sklearn api to calculate the per class loss. for a softmax based classifier log_loss is what we want, since it's what is being directly optimised. though the formulation is simple it's worth using a lib to avoid weird numerical instabilities. ( note: for a large number of classes you might need to set a higher eps in the log_loss call )

as expected we can see the loss for confused classes 0 & 1 are higher than classes 2 & 3.

( note: sklearn log_loss doesn't work on a sparse version of y_true so we need to convert to a one_hot representation for the call )

>>> y_true_one_hot = np.zeros((N, K))
>>> y_true_one_hot[np.arange(N), y_true] = 1.0
>>>
>>> losses = []
>>> for clazz in range(K):
>>>     losses.append(log_loss(y_true=y_true_one_hot[:, clazz],
>>>                            y_pred=y_pred[:, clazz]))
>>>
>>> losses

[0.3044334455393211,
 0.3291423130879737,
 0.09606671609189957,
 0.10908727165739374]
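as a sanity check, the per class value is just the mean binary cross entropy down that class's column of y_pred. a minimal sketch, with clipping standing in for sklearn's internal eps handling:

```python
import numpy as np

def per_class_log_loss(y_true_one_hot, y_pred, eps=1e-15):
    # clip predictions away from 0 and 1 to avoid log(0)
    p = np.clip(y_pred, eps, 1 - eps)
    # elementwise binary cross entropy
    bce = -(y_true_one_hot * np.log(p) +
            (1 - y_true_one_hot) * np.log(1 - p))
    # average over examples (axis 0) => one loss per class
    return bce.mean(axis=0)
```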

next we need to convert these to values suitable for class_weights. one way to do this is with a smoothing (via an exponentiation) followed by a normalisation.

as expected, each smoothing value puts more class weights on the confused classes 0 & 1.

the degenerate smoothing=0 case results in a uniform weighting as expected.

>>> def class_weights_for(losses, smoothing):
>>>   # smoothing -> 0.0 => uniform class weights; acts as a noop, i.e. class weights are all 1.0
>>>   # smoothing -> 1.0 => uses log loss value for class weights; probably too destructive ...
>>>   smoothed_losses = np.power(losses, smoothing)           # smooth values
>>>   normalised = smoothed_losses / np.sum(smoothed_losses)
>>>   return normalised * len(normalised)                     # rescale so the mean weight is 1.0
>>>
>>> for smoothing in [0, 0.001, 0.01, 0.1, 1.0]:
>>>   print("smoothing=%0.3f => class_weights=%s" % (smoothing, class_weights_for(losses, smoothing)))

smoothing=0.000 => class_weights=[1. 1. 1. 1.]
smoothing=0.001 => class_weights=[1.0005254  1.00060348 0.99937205 0.99949908]
smoothing=0.010 => class_weights=[1.00525187 1.00603665 0.9937238  0.99498768]
smoothing=0.100 => class_weights=[1.05225581 1.0604995  0.93762546 0.94961924]
smoothing=1.000 => class_weights=[1.45187861 1.56971809 0.45815338 0.52024992]
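these weights can then be fed back into training. some apis take them per class directly ( e.g. keras model.fit accepts a class_weight dict like dict(enumerate(class_weights)) ); for apis that want per example weights, a minimal sketch of the expansion:

```python
import numpy as np

def to_sample_weights(y_train, class_weights):
    # index the per class weight vector by each example's label,
    # giving one weight per training example
    return np.asarray(class_weights)[np.asarray(y_train)]
```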