<< distributing index ruby implementation >>
i've built two scripts to help in generating test data
the first generates numbers between two values with a specific median value, it's cryptically called generate_test_data.rb
bash> ./generate_test_data.rb min_value median_value max_value number_of_values (optional_seed)
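the script itself isn't shown here, so what follows is only a minimal sketch of one way to hit a target median, not the actual generate_test_data.rb. it assumes integer values and draws half of them uniformly from min..median and the other half from median..max, which puts the sample median at or very near the requested one. the method name generate_with_median and its parameters are made up for illustration.

```ruby
# hypothetical sketch, not the original generate_test_data.rb
# assumes integer bounds with min_value <= median_value <= max_value
def generate_with_median(min_value, median_value, max_value, number_of_values, seed = nil)
  rng = seed ? Random.new(seed) : Random.new
  values = Array.new(number_of_values) do |i|
    if i.even?
      rng.rand(min_value..median_value)   # lower half: at or below the median
    else
      rng.rand(median_value..max_value)   # upper half: at or above the median
    end
  end
  # shuffle so the output doesn't alternate low/high
  values.shuffle(random: rng)
end
```

passing the same seed twice gives the same sequence, which is handy for repeatable tests. note the two halves are each uniform, so the overall distribution is lopsided whenever the median isn't the midpoint of the range.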
the second spreads values from stdin evenly over a number of files, it's even more cryptically called spread_across_files.rb
bash> ./spread_across_files.rb file_prefix number_of_files
bash> ./generate_test_data.rb 200 275 300 5e6 | ./spread_across_files.rb num 4
generates files num.0, num.1, num.2 and num.3, each with about 1,250,000 numbers with values between 200 and 300 and an overall median of 275
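for the spreading step, here's a rough sketch of how it could work, assuming a simple round-robin: line 0 goes to file 0, line 1 to file 1, and so on, wrapping around. this is a guess at the approach rather than the actual spread_across_files.rb, and the method name and parameters are made up for illustration.

```ruby
# hypothetical sketch, not the original spread_across_files.rb
# assumes round-robin distribution of input lines over the output files
def spread_across_files(input, file_prefix, number_of_files)
  # one output file per index: prefix.0, prefix.1, ...
  files = Array.new(number_of_files) { |i| File.open("#{file_prefix}.#{i}", "w") }
  input.each_line.with_index do |line, index|
    files[index % number_of_files].write(line)
  end
ensure
  files&.each(&:close)
end
```

calling spread_across_files($stdin, "num", 4) would read from stdin and write num.0 through num.3. round-robin keeps the file sizes within one line of each other without needing to know the input length up front.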
to be honest though, spread_across_files.rb is a bit of reinventing the wheel; the standard split command would do just as well.
nov 2008