this week i'm between jobs so i have (a little) more time than usual to hack.
i've got a list of pending things to do but can't decide what to tackle next. here's the list, in (sort of) priority order...
- fix the numerical underflow / overflow problems in my recent semi-supervised classification project.
- work through the exercises from the first few chapters of Introductory Statistics with R and All of Statistics. i'm particularly keen to write an intro stats blog post about statistical significance.
- do this mongodb tute i found; it shouldn't take too long.
- do a weka screencast. i gave a few short talks about weka at work recently and people seemed interested enough that a screencast might be worthwhile.
- do some work on modelling periodic functions. trending topics seem to be an interesting area at the moment, and this would be a good chance to learn some more R. fourier series look like a potential solution. there's also some interesting work to do in this area around majority evaluation from a stream of data.
- finish my work on detecting resemblance with hadoop. it's been hanging over my head for about 2 years, and it's the work that first led me to hadoop: a long-running project on resemblance that ended up with me writing a map/reduce framework in erlang (until i (re)discovered hadoop).
- revisit mahout, it's looking a bit more polished nowadays.
- redo and finish my project on latent semantic analysis; it needs some comparison work with probabilistic latent semantic analysis and latent dirichlet allocation (which is close to winning the scariest-formulas-on-a-wikipedia-page award).
- finish my twitter classifier; i haven't worked on it since lists were introduced, and i think they'd be an interesting addition to the algorithm.