I recently was working on a Ruby project where I had to query an SQL server, and the connection setup/query/response/close latency was about 5 seconds. I needed to generate many unique data slices for a reporting project, and the naive implementation generated some 6,000 database queries, which would take about eight hours to run.

I could fetch all the data in one query that had a large group by clause, which would return all the data I needed in 60 seconds instead of 8 hours.

But then I had a pile of data in rows. That’s not very handy when you have to slice it by seven different characteristics.

So I wrote a recursive Ruby hash structure with semantics that allow me to retrieve by arbitrary attributes:

h=RecursiveHash.new([:business, :product_line, :format, :country, :state, :count], 0)

h.insert [‘ConsumerBusiness’, ‘Product A’, ‘HTML’, “UnitedStates”, 1, 10]

h.retrieve { :product_line => ‘ConsumerBusiness’ }
h.retrieve { :product_line => ‘ConsumerBusiness’, :doc_format => ‘HTML’ }

The default mode is to sum the responses, but it can also return an array of matches.
It’s really fast, and has a simple interface that, to me, is highly readable.

Here it is – RecursiveHash.rb:

class Array; def sum; inject( nil ) { |sum,x| sum ? sum+x : x }; end; end

class RecursiveHash
def initialize(attributes, default_value, mode=:sum)
@attribute_name = attributes[0]
@child_attributes=attributes[1..-1]
@default_value=default_value
@master=Hash.new
@mode=mode
end

def insert(values)
if values.size > 2
#puts "Inserting child hash at key #{values[0]}, child: #{values[1..-1].join(',')}"
if @master[values[0]]==nil
@master[values[0]]=RecursiveHash.new(@child_attributes, @default_value)
end
@master[values[0]].insert(values[1..-1])
else
puts "Inserting value at key #{values[0]}, value: #{values[1]}"
@master[values[0]]=values[1]
end
end

def return_val(obj, attributes)
if obj.is_a? RecursiveHash
return obj.retrieve(attributes)
elsif obj==nil
return @default_value
else
return obj
end
end

def retrieve(attributes)
if attributes[@attribute_name]==nil or attributes[@attribute_name]=='*' or attributes[@attribute_name].is_a? Array
keys=nil
if attributes[@attribute_name].is_a? Array
keys=attributes[@attribute_name]
else
keys=@master.keys
end

v=keys.collect { |key| return_val(@master[key], attributes) }
#puts "v: #{v.join(',')}"
return @mode==:sum ? v.sum : v
else
return return_val(@master[attributes[@attribute_name]], attributes)
end
end

def pprint(n=0, parent_key="N/A")
indent = " " * n
puts "#{indent}#{parent_key} (holds #{@attribute_name})"
@master.each_key { |key|
if @master[key].is_a? RecursiveHash
@master[key].pprint(n+1, key)
else
puts "#{indent} #{key}: #{@master[key]}"
end
}
end
end