example 1: two clear features

here's another trivial example to make sure we know exactly what's going on

consider the set of, again, extremely contrived documents

d1: c a a b c b c
d2: a b c a b c c
d3: d e f f d
d4: f d e d f

which is represented as the 6x4 document term matrix.

d1 d2 d3 d4

a 2 2 0 0

b 2 2 0 0

c 3 3 0 0

d 0 0 2 2

e 0 0 1 1

f 0 0 2 2

once more we can see two clear clusterings of the documents; d1 with d2 and d3 with d4

singular value decomposition

the singular value decomposition is a matrix decomposition algorithm (ie it breaks a matrix up into a series of products of matrices)

in particular the SVD decomposes a matrix A into the produce of three specially formed matrices (mysteriously named) U, S and V

each of these three matrices represents a different interpretation of the original data.

here is a decomposition of A performed using SVDLIBC

A

	d1	d2	d3	d4
a	2	2	0	0
b	2	2	0	0
c	3	3	0	0
d	0	0	2	2
e	0	0	1	1
f	0	0	2	2

=

U

	f1	f2	f3	f4
a	0.48	0	0	0
b	0.48	0	0	0
c	0.72	0	0	0
d	0	-0.66	0	0
e	0	-0.33	0	0
f	0	-0.66	0	0

x

S

	f1	f2	f3	f4
f1	5.83	0	0	0
f2	0	4.24	0	0
f3	0	0	0	0
f4	0	0	0	0

x

Vt

	d1	d2	d3	d4
f1	0.70	0.38	0	0
f2	0	0	-0.70	-0.70
f3	0	0	0	0
f4	0	0	0	0

S describes the relative strengths of the features
U describes the relationship between terms (rows) and features (columns)
Vt describes the relationshop between features (rows) and documents (columns)
even though the decomposition is expressed in terms of V transpose we'll usually talk about V so that the features are the columns in both U and V

interpretation of S

the matrix S is always a diagonal matrix with non-negative decending values, formally known as the singular values

each non zero value represents a feature and in this example we have two. (there can't be more features than there are documents)
the magnitude of the values describes how much variance each feature describes in the data
for this example we can see that each feature is roughly the same strength; there are roughly the same number of documents for each feature

interpretation of VS

the matrix product VS describes the relation between documents (VS's rows) and the features (VS's columns)

plotting f1 vs f2 we can see the expected seperation of doc1 and doc2 from doc3 and doc4

f1 f2 f3 f4

d1 4.123 0.000 0.000 0.000

d2 4.123 0.000 0.000 0.000

d3 0.000 -3.000 0.000 0.000

d4 0.000 -3.000 0.000 0.000

interpretation of US

the matrix product US describes the relation between terms (US's rows) and the features (US's columns)

again we see what we expected; terms a, b & c (only present in the first two documents) are aligned with feature 1
whereas terms d, e & (only present in the second two documents) are aligned with feature 2
we can also see that c has a stronger association with feature 1 than a or b does denoting that c occured more frequently
likewise e has a weaker association with feature 2 than d or e does denoting that e occured less frequently

f1 f2 f3 f4

a 2.82 0 0 0

b 2.82 0 0 0

c 4.24 0 0 0

d 0 -2.82 0 0

e 0 -1.41 0 0

f 0 -2.82 0 0

let's now look at slighty more complex example

<< intro index example 2 >>

Aug 2009
me on twitter
me on google+

	f1	f2	f3	f4
d1	4.123	0.000	0.000	0.000
d2	4.123	0.000	0.000	0.000
d3	0.000	-3.000	0.000	0.000
d4	0.000	-3.000	0.000	0.000