reduce, each_with_object, performance and you
I was debugging an oddly slow bit of code today at work. The code in question was fine for smaller collections but egregiously slow as soon as the collection was even medium-sized.
After a bit of digging, I was able to pin down one of the bits of code (at least) that was contributing to our performance issues.
It looked like this:
```ruby
class MyThing
  def call
    data = list.reduce({}) { |data, option| data.merge(option.id.to_s => option.value) }
  end
end
```
Which, on the surface, looks pretty harmless, but after some more digging and some measuring, it turns out this `reduce` call allocates a new object (and then some) on every iteration!
Let's take a look:
```ruby
require "ostruct"

def measure
  start = Time.now
  x = GC.stat(:total_allocated_objects)
  yield
  end_time = Time.now
  [GC.stat(:total_allocated_objects) - x, end_time - start]
end

class MyTest
  def call
    big_list = 20_000.times.map do
      OpenStruct.new(id: Random.rand(10_000), value: "Option #{Random.rand(10_000)}")
    end

    measure { data = big_list.reduce({}) { |data, option| data.merge(option.id => option.value) } }
  end
end
# => [60004, 1.134663]
```
Watch as we turn 20,000 items into 60,004 allocated objects, roughly three allocations per iteration!
Looking at the source for `reduce` / `inject`, we can see that towards the bottom we're allocating once to keep track of the current iteration, and potentially again when `rb_block_call` is called (if the block allocates, which ours does, since `Hash#merge` returns a new copy of the hash).
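You can see the `Hash#merge` half of that in isolation. This is just an illustrative sketch, not part of the code above:

```ruby
acc = {}

# merge returns a brand new hash on every call...
merged = acc.merge("a" => 1)
merged.equal?(acc) # => false

# ...whereas merge! (like the []= we'll use with each_with_object below) mutates the receiver.
acc.merge!("a" => 1)
acc # => {"a"=>1}
```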
Compare that to `each_with_object`, which mutates the same object:
```ruby
class MyTest
  def call
    big_list = 20_000.times.map do
      OpenStruct.new(id: Random.rand(10_000), value: "Option #{Random.rand(10_000)}")
    end

    measure { data = big_list.each_with_object({}) { |option, data| data[option.id] = option.value } }
  end
end
# => [3, 0.029746]
```
Here we allocate only 3 objects and finish roughly 38 times faster than the `reduce` version.
So, given we're not trying to reduce our list to a single value, but rather map it into a different shape, `each_with_object` is definitely the way to go, both semantically and for performance.
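As a quick sketch (assuming `list` and the option objects behave as in the snippet at the top), the original method rewritten with `each_with_object` would look something like this:

```ruby
class MyThing
  def call
    # Build one hash up front and fill it in place,
    # instead of merging a fresh copy per option.
    list.each_with_object({}) do |option, data|
      data[option.id.to_s] = option.value
    end
  end
end
```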
Tags: ruby