semi supervised naive bayes
firstly, what is a semi supervised algorithm?
a supervised learning algorithm is one where all of the training examples are labelled. in this case we are trying to predict the label of a new, unlabelled example.
e.g. given this categorised training data
height | colour | class |
10cm | red | widget |
15cm | blue | widget |
10cm | blue | foobar |
what might we predict the class of a new item to be?
height | colour | class |
15cm | green | ??? |
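to make the question concrete, here is a minimal sketch of one way to answer it. it assumes a naive bayes classifier with add-one smoothing, which is my choice for illustration rather than something the table implies. the names train and predict are made up for this sketch, and the extra +1 in the denominator is an assumption that reserves a slot for values never seen in training (such as green).

```python
from collections import Counter, defaultdict

# training rows from the table above: (height, colour) -> class
training = [
    (("10cm", "red"), "widget"),
    (("15cm", "blue"), "widget"),
    (("10cm", "blue"), "foobar"),
]

def train(rows):
    # how many examples of each class we have seen
    class_counts = Counter(label for _, label in rows)
    # value_counts[(label, feature_index)][value] = times seen
    value_counts = defaultdict(Counter)
    # vocab[feature_index] = distinct values seen in training
    vocab = defaultdict(set)
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict(model, features):
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(features):
            # add-one smoothing; the extra +1 in the denominator is an
            # assumption that leaves room for one unseen value
            seen = value_counts[(label, i)][value]
            score *= (seen + 1) / (n + len(vocab[i]) + 1)
        scores[label] = score
    # normalise so the scores sum to one
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

model = train(training)
probs = predict(model, ("15cm", "green"))
print(max(probs, key=probs.get), probs)
```

with only three training rows the prediction leans on the class prior and on the one widget that is 15cm tall, so under these assumptions widget comes out ahead. a different smoothing choice would change the probabilities.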
an unsupervised learning algorithm is one where none of our training examples are labelled. in this case we are just looking to find patterns in the data.
clustering is a classic example of an unsupervised algorithm.
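as a rough illustration of clustering, here is a minimal sketch that splits some unlabelled heights into two groups using a one dimensional version of k-means with k fixed at two. the heights are made up for this example, and two_means is a hypothetical helper, not something from a library.

```python
# hypothetical unlabelled heights in cm (made up for illustration)
heights = [10, 11, 10, 15, 16, 15]

def two_means(values, max_iterations=20):
    # start the two centres at the extremes so the result is deterministic
    centres = [min(values), max(values)]
    assignment = None
    for _ in range(max_iterations):
        # assign each value to its nearest centre
        new_assignment = [
            min((0, 1), key=lambda c: abs(v - centres[c])) for v in values
        ]
        if new_assignment == assignment:
            break  # nothing moved, so we have converged
        assignment = new_assignment
        # move each centre to the mean of its members
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return assignment, centres

clusters, centres = two_means(heights)
print(clusters, centres)
```

no labels are involved anywhere. the algorithm only finds that the heights fall into a short group and a tall group, and it is up to us to decide what, if anything, those groups mean.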
a semi supervised learning algorithm is one where, wait for it, only some of the training examples are labelled and the rest are not.
a good example of this is categorising text documents. we might have a large existing corpus of documents but can only afford to label some of them. it is still worth trying to use the entire corpus, though, since the unlabelled documents might contain some inherent structure that helps with the classification.
let's have a look at a semi supervised version of naive bayes
february two thousand and ten