
semi supervised algorithms

firstly what is a semi supervised algorithm?

supervised learning

a supervised learning algorithm is one where all of the training examples are labelled.
in this case we are trying to predict the label (here, the class) of a new example.

eg. given categorised training data
height  colour  class
10cm    red     widget
15cm    blue    widget
10cm    blue    foobar

what might we predict the class of a new item to be?
height  colour  class
15cm    green   ???
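
as a rough sketch of how one supervised method could answer this, here is a tiny naive bayes classifier over the three training rows. it is only an illustration: the function name `predict`, the dict layout and the add-one smoothing (with one extra slot reserved for unseen values such as green) are assumptions made for this example, and other classifiers could reasonably give a different answer on so little data.

```python
from collections import Counter, defaultdict

# the three labelled rows from the table above
training = [
    ({"height": "10cm", "colour": "red"}, "widget"),
    ({"height": "15cm", "colour": "blue"}, "widget"),
    ({"height": "10cm", "colour": "blue"}, "foobar"),
]

def predict(training, item):
    """return (best_label, scores) for item using a small naive bayes model."""
    class_counts = Counter(label for _, label in training)
    # value_counts[(label, feature)][value] = times that value was seen with that label
    value_counts = defaultdict(Counter)
    # distinct[feature] = set of values seen for that feature in training
    distinct = defaultdict(set)
    for features, label in training:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            distinct[name].add(value)

    scores = {}
    for label, n in class_counts.items():
        score = n / len(training)  # class prior
        for name, value in item.items():
            # add-one smoothing (an assumption of this sketch); the extra +1 in
            # the denominator reserves a slot for values never seen in training
            seen = value_counts[(label, name)][value]
            score *= (seen + 1) / (n + len(distinct[name]) + 1)
        scores[label] = score
    return max(scores, key=scores.get), scores

label, scores = predict(training, {"height": "15cm", "colour": "green"})
print(label, scores)
```

with these particular smoothing choices the new item scores higher as a widget than as a foobar, mostly because widgets are more common in the training data and one of them is 15cm.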

unsupervised learning

an unsupervised learning algorithm is one where none of our training examples are labelled.
in this case we are just looking to find patterns in the data.

clustering is a classic example of an unsupervised algorithm.
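
to make clustering a bit more concrete, here is a minimal sketch of k-means on one dimensional data. the function name `kmeans_1d`, the made up heights and the way the starting centroids are spread between the minimum and maximum are all assumptions for illustration; real implementations usually pick starting points more carefully.

```python
def kmeans_1d(points, k=2, iterations=10):
    """group numbers into k clusters (k >= 2); returns (centroids, clusters)."""
    # spread the starting centroids evenly between min and max (an arbitrary choice)
    lo, hi = min(points), max(points)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# hypothetical unlabelled heights in cm
heights = [10, 11, 12, 30, 31, 32]
centroids, clusters = kmeans_1d(heights, k=2)
print(centroids, clusters)
```

no labels are involved at any point; the two groups fall out of the data purely because the heights sit in two clumps.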

semi supervised learning

a semi supervised learning algorithm is one where, wait for it, some of the training examples are labelled.

a good example of this is categorising text documents.
we might have a large existing corpus of documents but can only afford to label some of the data.
it is still worth trying to use the entire corpus though, since the unlabelled documents might contain some inherent structure that helps the classifier.
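
one simple way to use the unlabelled documents is self-training: train on the labelled documents, guess labels for the unlabelled ones, then retrain on everything. the sketch below does this with a small word count naive bayes model. it is an assumption made for illustration and not necessarily the approach taken in the follow-up; the documents, the function names `train`, `predict` and `self_train`, and the smoothing are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs is a list of (words, label); returns (class_counts, word_counts, vocab)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words):
    """return the label with the highest log score; ties go to the alphabetically first label."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # add-one smoothing; the extra +1 leaves room for words never seen in training
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def self_train(labelled, unlabelled, rounds=3):
    """train on labelled docs, then repeatedly label the unlabelled docs and retrain."""
    model = train(labelled)
    for _ in range(rounds):
        guessed = [(words, predict(model, words)) for words in unlabelled]
        model = train(labelled + guessed)
    return model

# a tiny hypothetical corpus: two labelled documents, four unlabelled ones
labelled = [(["football", "goal"], "sport"), (["election", "vote"], "politics")]
unlabelled = [["goal", "striker"], ["vote", "senate"],
              ["football", "striker"], ["election", "senate"]]

model = self_train(labelled, unlabelled)
print(predict(model, ["striker"]), predict(model, ["senate"]))
```

the words striker and senate never appear in a labelled document, so a model trained on the labelled data alone has nothing to go on. they do appear next to known words in the unlabelled documents, and that is the inherent structure self-training picks up.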

let's have a look at a semi supervised version of naive bayes

february two thousand and ten