RSpec bisect saves the day!

Over the last few days, our test suite at work had a pretty gnarly run of test failures. We run almost all of our Ruby tests through RSpec, and group our feature specs to be run separately to our unit specs.

We expect some flake in our suite: it's gigantic, and it's easy enough to write a test that doesn't completely isolate its state. Usually, though, this presents as one spec failing sporadically, so it's easy to know which file is the culprit and thus easy to fix the problem.

The last few days were different, though. Instead of the same spec failing, we saw multiple different and unrelated specs failing. Well, unrelated in that they were all in different parts of our application, but they all failed in the same way! Some string value we expected to be there was nil!

But why? What was causing this behaviour in our specs? We also observed that sometimes we just got lucky, and our suite passed without issue. So it was looking more and more likely that the order of our specs had a big part to play in the flake.

Enter rspec --bisect.

RSpec can be run with an additional flag which, when given a failing group of specs, will work out the smallest set of specs needed to reproduce the failure. That set can then be run locally and, with some educated guessing or trial and error, used to narrow down what state may be leaking between specs.
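To make the idea concrete, here is a minimal sketch of bisection in plain Ruby. It is not RSpec's actual implementation: it assumes exactly one earlier spec is to blame, and `find_culprit`, the `fails` lambda, and the baz_spec.rb IDs are all made up for illustration.

```ruby
# Simplified bisection: find the one earlier spec that makes `target` fail.
# `fails` is a hypothetical callable that "runs" the given specs in order
# and returns true when the target fails.
def find_culprit(candidates, target, fails)
  # Nothing to find if the target passes even with every candidate before it.
  return nil unless fails.call(candidates + [target])

  while candidates.size > 1
    first_half = candidates.first(candidates.size / 2)
    second_half = candidates.drop(candidates.size / 2)
    # Keep whichever half still reproduces the failure.
    candidates = fails.call(first_half + [target]) ? first_half : second_half
  end
  candidates.first
end

# Toy stand-in for a real spec run: the target fails whenever
# "bar_spec.rb[1:2]" ran before it.
fails = ->(specs) { specs.include?("bar_spec.rb[1:2]") }

candidates = [
  "baz_spec.rb[1:1]",
  "bar_spec.rb[1:1]",
  "bar_spec.rb[1:2]",
  "baz_spec.rb[1:2]"
]
culprit = find_culprit(candidates, "foo_spec.rb[1:1]", fails)
puts culprit
```

Each round halves the list of suspects, which is why bisecting a huge suite takes far fewer runs than trying specs one at a time.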

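If you notice flake in your CI/CD pipeline, you should be able to take the exact specs that were run and retry those with --bisect. If your suite runs in random order, pass the same --seed value that the CI run used as well, so that the failing order is reproduced.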

For example, if something like this is run on CI: bin/rspec foo_spec.rb bar_spec.rb baz_spec.rb

It can be updated to the following to run in bisect mode: bin/rspec --bisect foo_spec.rb bar_spec.rb baz_spec.rb

If successful, something like the following should be printed: bar_spec.rb[1:2] foo_spec.rb[1:1]

If the specs above are run in the order printed, the failure should present itself consistently. The numbers in the square brackets are the path to each spec within its file, so [1:2] would be the second spec in the first group.

In our case, the failing tests were all Rails controller specs. We rely on some I18n magic to persist a user's preferred language between requests (once set). A new controller spec had been introduced that must have been the first to exercise this logic, or at least the first to exercise it without cleaning up its state. And so, where we were expecting some strings to exist in one language, they were coming back as nil, because they were not present for the language being leaked between specs. 🙌🏽
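The shape of that leak can be shown without Rails at all. The sketch below is not our application code: the `Locale` module, its translations, and the `with` helper are hypothetical stand-ins for a process-wide setting like the current I18n language.

```ruby
# Hypothetical stand-in for a global like the current I18n language:
# process-wide state that survives from one spec to the next unless reset.
module Locale
  DEFAULT = :en
  TRANSLATIONS = { en: { greeting: "Hello" }, fr: {} }.freeze

  class << self
    attr_writer :current

    def current
      @current ||= DEFAULT
    end

    # Returns nil when the current language has no entry for the key.
    def translate(key)
      TRANSLATIONS.fetch(current, {})[key]
    end

    # Switches language for the block only, restoring the old value afterwards.
    def with(locale)
      previous = current
      self.current = locale
      yield
    ensure
      self.current = previous
    end
  end
end

# A "spec" that sets the language and never cleans up...
Locale.current = :fr
leaked = Locale.translate(:greeting) # nil, though a later spec expects "Hello"

# ...versus one that scopes the change and restores the previous value.
Locale.current = Locale::DEFAULT
Locale.with(:fr) { Locale.translate(:greeting) }
restored = Locale.translate(:greeting) # "Hello" again
```

In a real suite, the equivalent cleanup might live in an RSpec around hook that wraps each example in I18n.with_locale, or in an after hook that resets I18n.locale to I18n.default_locale. Which fits best depends on how the application persists the language.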

Tags: ruby testing