<< real data example (with normalisation)
index
conclusions
- it's definitely possible to find interesting patterns in the data
- the challenge is working out what the patterns represent; there's no guarantee that a feature corresponds to a class of document
todos
- run the feature data through some classifiers (even weka) to see how well it performs
- try some different normalisation schemes; i tried tf/idf but the results didn't look much different to the unnormalised version
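
for reference, here's a minimal sketch of one common tf/idf weighting, in case it helps when comparing normalisation schemes. it's an illustration only and not the code used for the experiments in this write-up: the function name `tf_idf`, the toy documents, and the particular variant (term frequency divided by document length, times the natural log of the inverse document frequency) are all assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Hypothetical helper; uses tf = count / doc length and
    idf = ln(number of docs / number of docs containing the term).
    """
    n_docs = len(docs)
    # document frequency: how many documents contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted

# toy corpus, made up for illustration
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the dog barked".split(),
]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, w)
```

with this variant a term that appears in every document (like "the" above) gets a weight of zero, while terms unique to one document get the largest weights. other variants (smoothed idf, raw counts, length normalisation of the final vector) can behave quite differently, so they may be worth trying separately.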
all the code is on github
my other random stuff can be found at matpalm.com