semi supervised algorithms

firstly, what is a semi supervised algorithm?

supervised learning

a supervised learning algorithm is one where all of the training examples are labelled.
in this case we are trying to predict the label of a new, unlabelled example.

e.g. given categorised training data

height	colour	class
10cm	red	widget
15cm	blue	widget
10cm	blue	foobar

what might we predict the class of a new item to be?

height	colour	class
15cm	green	???
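
one way to answer that question is a tiny naive bayes classifier over the table above. this is only a rough sketch: the `predict` function, treating height as a category rather than a number, and the add-one smoothing (`alpha`) used to cope with the unseen colour green are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# the three labelled training examples from the table above
train = [
    ("10cm", "red", "widget"),
    ("15cm", "blue", "widget"),
    ("10cm", "blue", "foobar"),
]

def predict(train, item, alpha=1.0):
    """score each class for item with naive bayes and add-alpha smoothing."""
    class_counts = Counter(row[-1] for row in train)
    n_features = len(item)
    # distinct values seen in training for each feature column
    vocab = [set(row[i] for row in train) for i in range(n_features)]
    # value_counts[cls][i][value] = times value appears in column i for cls
    value_counts = defaultdict(lambda: [Counter() for _ in range(n_features)])
    for row in train:
        for i in range(n_features):
            value_counts[row[-1]][i][row[i]] += 1
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(train)  # class prior
        for i, value in enumerate(item):
            # smoothed estimate of p(value | class); unseen values get a small non-zero weight
            score *= (value_counts[cls][i][value] + alpha) / (n_cls + alpha * len(vocab[i]))
        scores[cls] = score
    return max(scores, key=scores.get), scores

label, scores = predict(train, ("15cm", "green"))
print(label)  # widget
```

with so little data the answer mostly reflects the fact that widgets are more common and that the only 15cm item seen so far was a widget.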

unsupervised learning

an unsupervised learning algorithm is one where none of our training examples are labelled.
in this case we are just looking to find patterns in the data.

clustering is a classic example of an unsupervised algorithm.
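
to make that concrete, here is a minimal sketch of k-means clustering on some unlabelled heights. the numbers, the `kmeans_1d` name and the starting centres are all made up for illustration; a real run would normally pick the starting centres at random.

```python
def kmeans_1d(points, centres, iterations=10):
    """very small k-means for one dimensional data."""
    for _ in range(iterations):
        # assignment step: put each point in the cluster of its nearest centre
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its old centre)
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters

# hypothetical heights in cm, with no class labels at all
heights = [10, 11, 9, 15, 16, 14]
centres, clusters = kmeans_1d(heights, centres=[9, 16])
print(centres)   # [10.0, 15.0]
print(clusters)  # [[10, 11, 9], [15, 16, 14]]
```

nothing here says what the two groups are called; the algorithm only finds that the heights fall into two clumps.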

semi supervised learning

a semi supervised learning algorithm is one where, wait for it, some of the training examples are labelled.

a good example of this is categorising text documents.
we might have a large existing corpus of documents but can only afford to label some of them.
it is still worth trying to use the entire corpus, though, since the unlabelled documents might contain some inherent structure that helps with the classification.
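
as a rough illustration of how unlabelled data can be put to work, here is a sketch of self-training, one common semi supervised approach (not necessarily the one used for naive bayes). it uses made up one dimensional heights instead of documents to stay short; the `centroids` and `self_train` names and the nearest-centroid classifier are assumptions made for this example.

```python
def centroids(labelled):
    """mean value of each class among the labelled (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labelled, unlabelled):
    """label the unlabelled points one at a time, most confident first."""
    labelled = list(labelled)
    unlabelled = list(unlabelled)
    while unlabelled:
        cents = centroids(labelled)
        # the most confident guess is the point lying closest to any centroid
        best_x, best_y = min(
            ((x, y) for x in unlabelled for y in cents),
            key=lambda pair: abs(pair[0] - cents[pair[1]]),
        )
        # treat the guess as if it were a real label and refit next time round
        labelled.append((best_x, best_y))
        unlabelled.remove(best_x)
    return labelled

# hypothetical data: two labelled heights and four unlabelled ones
labelled = [(10, "foobar"), (15, "widget")]
unlabelled = [11, 12, 14, 16]
result = dict(self_train(labelled, unlabelled))
print(result[12], result[14])  # foobar widget
```

each newly labelled point shifts its class centroid, so the structure of the unlabelled data influences later guesses. the risk is that an early wrong guess gets reinforced in the same way.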

let's have a look at a semi supervised version of naive bayes

february two thousand and ten