prototyping in ruby was a great way to prove the concept, but my main motivation for this project was to play some more with pig.
the main approach will be:
- maintain a relation with one record per ngram we want to monitor for trending
- fold one hour's worth of new data at a time into the model
- check the entries for the latest hour for any trends
- check the entries for the latest hour for any trends
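
to make the three steps concrete before getting into pig, here is a minimal sketch in plain python. it is not the pig script: the names (`fold_hour`, `threshold`, `min_hours`), the record layout of (hours seen, running mean, sum of squared deviations) and the trend test itself (latest count well above the running mean) are all assumptions made for illustration. it also checks each new count against the history before folding it in, which is one possible ordering.

```python
import math

def fold_hour(model, hour_counts, threshold=3.0, min_hours=3):
    """Fold one hour of ngram counts into `model` and return trending ngrams.

    `model` maps ngram -> (hours_seen, mean, m2), one record per ngram.
    The trend test below is a hypothetical stand-in, not the original method.
    """
    trending = []
    # ngrams already in the model but absent this hour count as 0
    for ngram in set(model) | set(hour_counts):
        count = hour_counts.get(ngram, 0)
        n, mean, m2 = model.get(ngram, (0, 0.0, 0.0))

        # compare the latest hour against the history seen so far
        if n >= min_hours:
            std = math.sqrt(m2 / n)
            # assumed rule: floor the deviation at 1 so flat histories don't fire on noise
            if count > mean + threshold * max(std, 1.0):
                trending.append(ngram)

        # fold the new observation in (Welford's online mean/variance update)
        n += 1
        delta = count - mean
        mean += delta / n
        m2 += delta * (count - mean)
        model[ngram] = (n, mean, m2)
    return sorted(trending)

model = {}
for _ in range(3):
    fold_hour(model, {"a b": 2, "c d": 5})        # three quiet hours
print(fold_hour(model, {"a b": 50, "c d": 5}))    # prints ['a b']
```

the point of keeping only a small fixed-size record per ngram is that each hour can be folded in without rereading the earlier hours.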
the full pig script is on github. read on for a line-by-line walkthrough.