MapReduce for Ruby: Ridiculously Easy Distributed Programming
MapReduce is now available for Ruby through Starfish (install it with gem install starfish). MapReduce is the technique Google uses for massive distributed programming over files as large as 30 terabytes.
Here is the basic code that will get you up and running with MapReduce in Starfish.
# item.rb
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :host     => "localhost",
  :username => "root",
  :password => "",
  :database => "some_database"
)

class Item < ActiveRecord::Base; end

server do |map_reduce|
  map_reduce.type = Item
end

client do |item|
  logger.info item.id
end
Now just run:
starfish item.rb
and Starfish takes care of the rest. The code above does the following:
- The server grabs all the items via: Item.find(:all)
- Each client grabs an item from the collection
- When there are no more items to be grabbed, everything shuts down
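To make the three steps above concrete, here is a rough single-process sketch of the same hand-out pattern in plain Ruby. It is not how Starfish is implemented: threads stand in for the distributed clients, an in-memory Queue stands in for the server's collection, and the Item struct and the distribute method are hypothetical names made up for this illustration.

```ruby
# Hypothetical stand-in for the ActiveRecord Item model: it only carries an id.
Item = Struct.new(:id)

# Illustrative helper (not part of Starfish): hands each item to exactly one
# of `client_count` worker threads and returns the ids each client handled.
def distribute(items, client_count)
  # "Server" step: load the whole collection up front, like Item.find(:all).
  queue = Queue.new
  items.each { |item| queue << item }

  processed = Array.new(client_count) { [] }

  clients = client_count.times.map do |n|
    Thread.new do
      loop do
        begin
          item = queue.pop(true)   # non-blocking pop; raises when empty
        rescue ThreadError
          break                    # nothing left to grab: this client stops
        end
        processed[n] << item.id    # the "work"; the original logs item.id
      end
    end
  end

  clients.each(&:join)             # everything shuts down once the queue is drained
  processed
end

items  = (1..10).map { |i| Item.new(i) }
result = distribute(items, 3)
result.each_with_index do |ids, n|
  puts "client #{n} handled #{ids.inspect}"
end
```

Which client ends up with which ids varies from run to run, but every item is handled exactly once, which is the property the server/client split is meant to give you.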
Just add REST (which comes by default with Edge Rails) and you'll have your own S3 for free ;)