usually brute force, ie try every combo, is cause for a heartattack but it's actually not so bad in this case... there are only 16 folders to consider on the backup and each is either included or it's not gives a total of 216 or 65,536 combos to try. well it's 216-1 really since the case of nothing on the disk is not that exciting
anyways, here's an impl of a brute_force.rb in ruby and even though ruby is known for it's lack of performance it still gets the global optimum in under 1.5 seconds on my machine
it makes use of the binary representation of a number to denote whether a folder is on the disk or not eg for 3 items or varying sizes...
item | size |
1 | 4 |
2 | 6 |
3 | 3 |
...we can try all 23 combos
n | binary | includeitem | size |
0 | 000 | { } | 0 |
1 | 001 | { 1 } | 4 |
2 | 010 | { 2 } | 6 |
3 | 011 | { 2 1 } | 6+4=10 |
4 | 100 | { 3 } | 3 |
5 | 101 | { 3 1 } | 3+4=7 |
6 | 110 | { 3 2 } | 3+6=9 |
7 | 111 | { 3 2 1 } | 3+6+4=13 |
but if we're going to resort to brute force we might as well add a little bit of elegance...
consider the order we try the combos and what calculations are actually done
n | binary | includeitem | description |
0 | 000 | { } | empty the disk and add nothing |
1 | 001 | { 1 } | empty the disk and add item 1 |
2 | 010 | { 2 } | empty the disk and add item 2 |
3 | 011 | { 2 1 } | empty the disk and add item 2 and item 1 |
4 | 100 | { 3 } | empty the disk and add item 3 |
5 | 101 | { 3 1 } | empty the disk and add item 3 and item 1 |
6 | 110 | { 3 2 } | empty the disk and add item 3 and item 2 |
7 | 111 | { 3 2 1 } | empty the disk and add item 3 and item 2 and item 1 |
if we were doing this manually we'd be too lazy to test the combos like this; instead consider if we tried the combos in a slightly different order by changing only one bit at time (analogous to removing or adding just one item at time) this type of order is called a grey encoding and can be calculated easily using g(n)=n^(n>>1)
n | g(n) | binary | includeitem | size |
0 | 0 | 000 | { } | 0 |
1 | 1 | 001 | { 1 } | 4 |
2 | 3 | 011 | { 2 1 } | 6+4 = 10 |
3 | 2 | 010 | { 2 } | 6 |
4 | 6 | 110 | { 3 2 } | 3+6 = 9 |
5 | 7 | 111 | { 3 2 1 } | 3+6+4 = 13 |
6 | 5 | 101 | { 3 1 } | 3+4 = 7 |
7 | 4 | 100 | { 3 } | 3 |
using this order we can calculate the total size using a delta from the last size rather than doing the sum every time
g(n) | binary | includeitem | description | size |
0 | 000 | { } | empty the disk | 0 |
1 | 001 | { 1 } | add item 1 | 0+4 = 4 |
3 | 011 | { 2 1 } | add item 2 | 4+6 = 10 |
2 | 010 | { 2 } | take item 1 out | 10-4 = 6 |
6 | 110 | { 3 2 } | add item 3 | 6+3 = 9 |
7 | 111 | { 3 2 1 } | add item 1 | 9+4 = 13 |
5 | 101 | { 3 1 } | take item 2 out | 13-6 = 7 |
4 | 100 | { 3 } | take item 1 out | 7-4 = 3 |
this guarantee of only one add or subtract per combo really pays off when there are more items. here's some code for brute_force_2.rb using grey coding and an implementation in c too brute_force2.c
so brute force is awesome, until we add a few more folders...
# folders | # combos | brute_force.rbtime | brute_force2.rbtime | brute_force2.ctime |
16 | 216 = 65,536 | 1.5 sec | 0.5 sec | 0.01 sec |
20 | 220 = 1,048,576 | 25 sec | 5 sec | 0.1 sec |
30 | 230 = 1,073,741,824 | ? | 129 min | 2 min 19 sec |
40 | 240 = 1,099,511,627,776 | ? | ~ 3 months | ~ 1.5 days |
50 | 250 = 1,125,899,906,842,624 | ? | ? | ~~ 4.5 years |
yikes! it's easy to see then why algorithms that are exponential in the input size are not the best. in fact 270 folders is 1,897,137,590,064,188,545,819,787,018,382,342,682,267,975,428,761,855,001,222,473,056,385,648,716,020,711,424 combos more combos than the number of atoms in the universe (i've always wanted to say that)
let's try the other extreme of naive algorithms, the random walk!
Dec 2008