# simple supervised learning

## part 4: should i read it? multinomial bayes

### what is it?

multi nominal bayes is a variation of naive bayes that considers not the frequency of articles of a class but the frequencies of the words in a class

### example revisited

let's revisit our test data from the last experiment, with some slight variatons
 text feed should read it? linux on the linux the register yes (rule 1) cat on ferrari autoblog no on the hollywood perezhilton no on lamborghini on cat autoblog yes (rule 2) hollywood cat perezhilton no the lamborghini perezhilton yes (rule 2) cat on linux cat the register yes (rule 1)

considering just a few words this breaks down to

 word totaloccurences number in should read number inshould ignore on 6 4 2 linux 3 3 0 the 3 2 1 hollywood 2 0 2 9 5

from this table we have some various word / class related probabilities including...

P('on' | read) = 4/9 = 44%
P('linux' | ignore) = 0/5 = 0%

we can use a multinomial distribution to determine the probability for a given test article ### should we read 'linux the linux'

so for test article, 'linux the linux', we have probabiltiy of should read which is 7.5%

and for the same article, 'linux the linux', we have probabiltiy of should ignore being which is strictly 0% but using a laplace estimator (as seen in last experiment) we have ### run against the big data set so how does this algorithm run against the 13,500 articles we have for theregister, perezhilton and autoblog then?
whereas naive bayes did worse than the simple word occurences, the multinomial bayes kicks ass!

the graph to the left shows the accuracy of the three classification algorithms we discussed so far
(thick lines denote the median performance of the algorithm over a number of runs
crosses denote a cross validation run)

well i've had enough of bayes, lets try a classifier based on markov chains!