let's do some testing on a quad core box with 2gb ram. we'll generate 10 million numbers into a ramdisk (/dev/shm):
bash> ./generate_test_data.rb 0 31415 1e5 10e6 | ./spread_across_files.rb /dev/shm/numbers 4
first up is the simple ruby impl
bash> time cat /dev/shm/numbers.[0-3] | ./median.rb
2m 38s

this version goes memory nuts and quickly consumes pretty much all the resident memory it can get (just under 2gb). i wonder if this is usual ruby behaviour? it's not being a good citizen in terms of garbage collection!!! the process sits at a cpu user / system / idle split of around 25 / 0 / 75 (ie a single core); once the swapping starts it drops to something a bit more erratic like 20 / 5 / 75.
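that appetite makes sense given the shape of the naive approach: hold every number in memory, sort, and take the middle element. here's a minimal erlang sketch of the same shape (hypothetical, not median.rb itself; parse_file:to_list/1 is the list parser timed further down):

% naive median: the entire dataset must be resident at once,
% which is exactly why this shape eats ram
naive_median(Filename) ->
    Sorted = lists:sort(parse_file:to_list(Filename)),
    % lower middle element for even-length lists
    lists:nth((length(Sorted) + 1) div 2, Sorted).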
next is the multi-process erlang impl that doesn't use frequency dictionaries
bash> time erl -sname bob -noshell -run controller init worker /dev/shm/numbers.[0-3]
47s

it sits at a user / system / idle cpu usage of about 70 / 15 / 15 and uses much less ram: about 50mb resident and another 150mb or so for data.
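the shape of that impl, roughly: the controller spawns one worker per file, collects the parsed lists, and takes the median of the merged result. a hypothetical sketch of that shape (not the actual controller.erl):

median_of_files(Filenames) ->
    Self = self(),
    % one worker per file keeps all four cores busy while parsing
    Pids = [spawn(fun() -> Self ! {self(), parse_file:to_list(F)} end)
            || F <- Filenames],
    Lists = [receive {Pid, L} -> L end || Pid <- Pids],
    Sorted = lists:sort(lists:append(Lists)),
    lists:nth((length(Sorted) + 1) div 2, Sorted).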
let's have a crack at the frequency dictionary version. it's an unfair comparison to the ruby one though, since it uses a different data structure.
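the idea: instead of keeping every number, keep a count per distinct value, then walk the values in order accumulating counts until the halfway point. a hypothetical sketch of what the worker_freq processing (used below) might look like, not the actual code:

median_from_freqs(Freqs) ->
    Total = dict:fold(fun(_Value, Count, Acc) -> Count + Acc end, 0, Freqs),
    walk(lists:sort(dict:to_list(Freqs)), (Total + 1) div 2, 0).

% walk the sorted value/count pairs until the running count
% crosses the halfway mark; that value is the median
walk([{Value, Count} | _Rest], Target, Seen) when Seen + Count >= Target ->
    Value;
walk([{_Value, Count} | Rest], Target, Seen) ->
    walk(Rest, Target, Seen + Count).

since the dict holds one entry per distinct value, the walk is tiny compared to 10 million raw numbers, which is presumably why the processing step is so quick.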
bash> time erl -noshell -run generate_binary_dicts main /dev/shm/numbers.[0-3]
1m 16s

bash> time erl -sname bob -noshell -run controller init worker_freq /dev/shm/numbers.[0-3].dict
2s

what's going on? though the processing using the dictionary is wildly fast, the parsing to generate the dictionary takes longer than the version that parses and processes the list.
in fact, just testing the parsing time, we see quite a variation between two runs that differ only in whether a dictionary is built. dictionaries in erlang must be slower than i thought!
mat@ubishop:~/dev/median$ erl -noshell -run parse_file test to_list /dev/shm/numbers.1e6
to_list time 8.098419 sec
mat@ubishop:~/dev/median$ erl -noshell -run parse_file test to_dict /dev/shm/numbers.1e6
to_dict time 13.150428 sec
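(a harness like that test entry point can be as small as the following; a hypothetical sketch, not the actual parse_file:test/1. erl -run passes the trailing command line arguments in as a list of strings.)

test([Which, Filename]) ->
    Fun = list_to_atom(Which),
    % timer:tc returns elapsed wall time in microseconds
    {Micros, _Result} = timer:tc(parse_file, Fun, [Filename]),
    io:format("~s time ~f sec~n", [Which, Micros / 1000000]),
    init:stop().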
poking around, it turns out to be the io for reading the file from disk in parse.erl
we can try changing from stream based io (io:get_line on an opened file) ...
to_dict(Filename) ->
    {ok, File} = file:open(Filename, read),
    to_dict(File, dict:new(), next_line(File)).

to_dict(File, Freqs, eof) ->
    file:close(File),
    Freqs;
to_dict(File, Freqs, Line) ->
    Value = line_to_int(Line),
    to_dict(File, dict:update_counter(Value, 1, Freqs), next_line(File)).

% read one line at a time through the io server
next_line(File) ->
    io:get_line(File, '').

line_to_int(Line) ->
    % drop the trailing newline before converting
    Line_without_cr = lists:sublist(Line, length(Line) - 1),
    list_to_integer(Line_without_cr).
... to parsing the whole file as one binary
to_dict_b(InFilename) ->
    % slurp the whole file into a binary in one go
    {ok, B} = file:read_file(InFilename),
    to_dict_b(InFilename, B, dict:new()).

to_dict_b(_InFilename, <<>>, Dict) ->
    Dict;
to_dict_b(InFilename, Binary, Dict) ->
    {ReducedBinary, Line} = next_line_b(Binary),
    Int = list_to_integer(Line),
    NewDict = dict:update_counter(Int, 1, Dict),
    to_dict_b(InFilename, ReducedBinary, NewDict).

next_line_b(Binary) ->
    next_line_b(Binary, []).

next_line_b(<<>>, _Collected) ->
    ignore_last_line_if_didnt_end_in_newline;
next_line_b(<<"\n", Rest/binary>>, Collected) ->
    % black magic voodoo line
    {Rest, binary_to_list(list_to_binary(lists:reverse(Collected)))};
next_line_b(<<C:1/binary, Rest/binary>>, Collected) ->
    next_line_b(Rest, [C | Collected]).
this gives a speed up from 1m 16s (with cpu user / system / idle of 75 / 10 / 15) to 40s (with a much better user / system / idle of 94 / 5 / 1). furthermore the number of context switches is about 20 times lower (!!)
but what is the trade off? apart from the binary voodoo black magic required (in particular the binary-to-list-to-binary-list-reverse-to-wtf?!?! line), it is more expensive in memory: whereas the slower version occupies about 40mb of resident memory, the binary version is more like 100mb.
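one way to exorcise that particular line, for what it's worth: accumulate the integer directly while scanning, so no intermediate character list is ever collected or reversed. a hypothetical variation (untested against the numbers above) that assumes non-negative integers, one per newline-terminated line:

% fold each digit into a running value instead of collecting
% characters; callers start with Acc = 0 and get the parsed
% integer back alongside the rest of the binary
next_int_b(<<"\n", Rest/binary>>, Acc) ->
    {Rest, Acc};
next_int_b(<<Digit, Rest/binary>>, Acc) when Digit >= $0, Digit =< $9 ->
    next_int_b(Rest, Acc * 10 + (Digit - $0)).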
this binary approach doesn't feel right though, quite apart from the memory cost... there must be a better way to get a similar efficiency using standard libraries
nov 2008