Posts Tagged ‘statistics’

e12.1 statistical synonyms

Saturday, January 23rd, 2010

i’ve had an idea brewing in my head for a while now, seeded by a great talk by peter norvig about statistical approaches to finding patterns in data.

one thing he alludes to is generating synonyms from n-gram models.

the basic intuition is this: if a corpus contains occurrences of the phrases ‘a x b’ and ‘a y b’, then to some degree x and y are synonymous.
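the intuition above can be sketched in R (the language from the other post on this page). this is a minimal sketch under my own assumptions: a hypothetical toy corpus split on single spaces, every trigram ‘a x b’ turned into a context label "a _ b", and two words counted as candidate synonyms when they share at least one context. the names `ctx_by_word` and `shared` are made up for illustration.

```r
# hypothetical toy corpus, split on single spaces
corpus <- "the big dog ran . the large dog ran . a big house stood . a large house stood . the small cat sat"
tokens <- strsplit(corpus, " ", fixed = TRUE)[[1]]
n <- length(tokens)

# every trigram 'a x b' becomes a context label "a _ b" plus its middle word x
contexts <- paste(tokens[1:(n - 2)], "_", tokens[3:n])
middles  <- tokens[2:(n - 1)]

# for each middle word, the distinct contexts it occurs in
ctx_by_word <- lapply(split(contexts, middles), unique)

# contexts shared by words x and y
shared <- function(x, y) intersect(ctx_by_word[[x]], ctx_by_word[[y]])

print(shared("big", "large"))  # "the _ dog" "a _ house"

# every pair of words that shares at least one context
pairs <- combn(names(ctx_by_word), 2)
n_shared <- apply(pairs, 2, function(p) length(shared(p[1], p[2])))
print(pairs[, n_shared > 0, drop = FALSE])
```

on this toy corpus the pairs found are big/large and also a/the (both appear in ". _ large"), which is a hint that a bare "shares a context" test is too blunt and needs some kind of weighting.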

the question becomes: how do we calculate the strength of the relationship? how is it a function of the frequencies of a, b, x, y, ‘a x b’, ‘a y b’ and ‘a ? b’ in the corpus? what else can we take into account?
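one possible answer, sketched in R: count how often each ‘a x b’ occurs, treat each word’s counts over all ‘a ? b’ contexts as a vector, and score a pair of words by the cosine similarity of their vectors. cosine similarity is my own choice here (a common one in distributional semantics), not something taken from norvig’s talk; the corpus and the names `counts` and `cosine` are hypothetical.

```r
# hypothetical toy corpus: 'the big dog' occurs twice, 'the large dog' once
corpus <- "the big dog ran . the large dog ran . the big dog sat . a big house stood . a large house stood"
tokens <- strsplit(corpus, " ", fixed = TRUE)[[1]]
n <- length(tokens)

contexts <- paste(tokens[1:(n - 2)], "_", tokens[3:n])  # the 'a ? b' slot
middles  <- tokens[2:(n - 1)]                            # the x in 'a x b'

# counts["x", "a _ b"] = number of times 'a x b' occurs in the corpus
counts <- unclass(table(middles, contexts))

# cosine similarity of two words' context-count vectors: contexts that are
# frequent for both words push the score towards 1, contexts seen with
# only one of the words pull it towards 0
cosine <- function(x, y) {
  u <- counts[x, ]
  v <- counts[y, ]
  sum(u * v) / sqrt(sum(u^2) * sum(v^2))
}

print(cosine("big", "large"))  # 3 / sqrt(10), about 0.949
print(cosine("big", "ran"))    # 0: no shared contexts
```

this only uses the ‘a x b’ counts; it ignores the unigram frequencies of a, b, x and y, so very common words like "the" would still dominate. reweighting the counts (for example with pointwise mutual information) is one way to fold those in, again an assumption rather than part of the original idea.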

simple statistics with R

Saturday, October 3rd, 2009

i’m learning R, a statistics language that’s new to me, and it’s pretty cool.

make a vector …

> c(3,1,4,1,5,9,2,6,5,3,5,8)
 [1] 3 1 4 1 5 9 2 6 5 3 5 8

turn it into a frequency table …

> table(c(3,1,4,1,5,9,2,6,5,3,5,8))
1 2 3 4 5 6 8 9
2 1 2 1 3 1 1 1

sort by frequency …

> sort(table(c(3,1,4,1,5,9,2,6,5,3,5,8)))
2 4 6 8 9 1 3 5
1 1 1 1 1 2 2 3

and plot!

> barplot(sort(table(c(3,1,4,1,5,9,2,6,5,3,5,8))))
[image: Rplot, bar chart of the sorted frequency table]

so simple!
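the same steps with the vector kept in a variable, plus two small extras that go beyond the session above: `decreasing = TRUE` (a standard argument of `sort`) puts the most frequent value first, and `prop.table` turns the counts into proportions. the variable names `x` and `freq` are just illustrative.

```r
# the same vector as in the session above, kept in a variable
x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)

# frequency table, most frequent value first
freq <- sort(table(x), decreasing = TRUE)
print(freq)

# proportions: each count divided by length(x), so they sum to 1
print(round(prop.table(freq), 2))

# the most common value is the first name in the sorted table
print(names(freq)[1])  # "5"
```

`barplot(freq)` would then draw the same chart as before, just with the tallest bar on the left.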

do a degree via youtube

Thursday, October 1st, 2009

i’m amazed by how much great content is on youtube, how could you NOT learn something!?

13 x 1hr Statistical Aspects of Data Mining (Stats 202)

20 x 1hr Machine Learning