reduce, each_with_object, performance and you
I was debugging an oddly slow bit of code today at work. The code in question was fine for smaller collections but egregiously slow as soon as the collection was even medium-sized.
After a bit of digging, I was able to pin down one of the bits of code (at least) that was contributing to our performance issues.
It looked like this:
```ruby
class MyThing
  def call
    data = list.reduce({}) { |data, option| data.merge(option.id.to_s => option.value) }
  end
end
```
Which, on the surface, looks pretty harmless, but after some more digging and some measuring, it turns out this `reduce` call allocates a new object (and then some) on every iteration!
Let's take a look:
```ruby
require "ostruct"

def measure
  start = Time.now
  x = GC.stat(:total_allocated_objects)
  yield
  end_time = Time.now
  [GC.stat(:total_allocated_objects) - x, end_time - start]
end

class MyTest
  def call
    big_list = 20_000.times.map do
      OpenStruct.new(id: Random.rand(10_000), value: "Option #{Random.rand(10_000)}")
    end

    measure { data = big_list.reduce({}) { |data, option| data.merge(option.id => option.value) } }
  end
end
# => [60004, 1.134663]
```
Watch as we turn 20,000 items into 60,004 allocated objects, roughly three allocations per iteration!
Looking at the source for `reduce` / `inject`, we can see that towards the bottom we're allocating once to keep track of the current iteration, and potentially again when `rb_block_call` is called (if the block allocates, which ours does, since `Hash#merge` returns a new copy of the hash).
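You can see the `Hash#merge` half of that in isolation. This is just an illustrative sketch, not part of the code above:

```ruby
acc = {}

# merge returns a brand new hash on every call...
merged = acc.merge("a" => 1)
merged.equal?(acc) # => false

# ...whereas merge! (like the []= we'll use with each_with_object below) mutates the receiver.
acc.merge!("a" => 1)
acc # => {"a"=>1}
```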
Compare that to `each_with_object`, which mutates the same object:
```ruby
class MyTest
  def call
    big_list = 20_000.times.map do
      OpenStruct.new(id: Random.rand(10_000), value: "Option #{Random.rand(10_000)}")
    end

    measure { data = big_list.each_with_object({}) { |option, data| data[option.id] = option.value } }
  end
end
# => [3, 0.029746]
```
Here we allocate only 3 objects and finish roughly 38 times faster than the `reduce` version.
So, given we're not trying to reduce our list to a single value, but rather map it into a different shape, `each_with_object` is definitely the way to go, both semantically and for performance.
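As a quick sketch (assuming `list` and the option objects behave as in the snippet at the top), the original method rewritten with `each_with_object` would look something like this:

```ruby
class MyThing
  def call
    # Build one hash up front and fill it in place,
    # instead of merging a fresh copy per option.
    list.each_with_object({}) do |option, data|
      data[option.id.to_s] = option.value
    end
  end
end
```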
Tags: ruby