Retry brittle tests until they fail

Brittle tests are tests that only fail some of the time. Because of this, it can be difficult to reproduce the scenario that's causing them to fail.

These brittle tests can fail because of all sorts of things, some of which are:

Randomness in tests, or lack thereof. For example: the same numbers are generated twice, causing conflicts for object IDs in the database or memory.
Test execution order. For example: tests that run first affect the behavior of later tests, because test state leaks into them. For this scenario I love using RSpec's bisect feature (rspec --bisect).
App bugs. Actual app-specific bugs that only occur some of the time. What's causing them? The only way to find out is to get a failing test and start debugging.

We want to fix those brittle tests, but the problem with debugging brittle tests is that they never fail when you want them to. A test can run without failure 100 times, and only fail the 101st time. Manually retrying it 101 times is very time-consuming and not a lot of fun.

What I found myself doing was wrapping a while-statement around the command I wanted to keep retrying, but that got cumbersome and it was easy to make a typo. Instead, let's use a small executable for that.
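For reference, the hand-written loop looks roughly like the sketch below. The brittle_test function is a hypothetical stand-in that passes twice and fails on its third run, so the example finishes on its own; in practice the loop condition would be the real test command, such as ruby tests/brittle_test.rb.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a brittle test: passes twice, fails on the third run.
runs=0
brittle_test() {
  runs=$((runs + 1))
  [ "$runs" -lt 3 ]
}

# Keep re-running the command for as long as it exits with status 0.
while brittle_test; do
  echo "Run #$runs passed, trying again"
done

# The loop only ends once the command returns a non-zero exit status.
echo "Run #$runs failed, stopping"
```

Typing this out for every command is where the typos creep in: forget the do or the done and the shell sits waiting for more input instead of running anything.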

Retry executable

When I'm not sure what the exact reason for the "brittleness" of a test is, I grab my "retry until fail" helper executable, until-fail for short. This is a very small wrapper around a given command that keeps repeating until it encounters a failure.

Retry a command until it fails

Usage:

  $ until-fail true
  # Will repeat forever

  $ until-fail false
  # Fails at the first iteration and breaks out of the retry loop

  $ until-fail ruby -e "rand(0..1) == 1 ? (puts 'failed'; exit(1)) : (puts 'success')"
  # Fails randomly and breaks out of the retry loop when it fails

The until-fail executable repeats the given command, in our case a test, until it runs into the failing scenario that we want to investigate further. This allows us to debug the brittle test while it's in a broken scenario. With that new information we can hopefully fix that test.
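To make the idea concrete, here is a minimal sketch of how such a wrapper could work. It is not the script from the gist: the function name until_fail, the flaky stand-in command, and the exact messages are assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Sketch of a retry-until-failure wrapper (an assumption, not the gist's script).
# Runs the given command repeatedly and returns its exit status on first failure.
until_fail() {
  local attempt=0
  local status=0
  while true; do
    attempt=$((attempt + 1))
    echo "Retry #$attempt"
    # "$@" runs the command exactly as given, keeping quoted arguments intact.
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "Command failed with exit status $status on attempt $attempt"
      return "$status"
    fi
  done
}

# Hypothetical brittle command: succeeds twice, then fails with status 1.
calls=0
flaky() {
  calls=$((calls + 1))
  [ "$calls" -lt 3 ]
}

# Capture the exit status of the failing run.
final_status=0
until_fail flaky || final_status=$?
echo "until_fail stopped with status $final_status after $calls calls"
```

Passing the failing command's exit status along means the wrapper itself can be used in other scripts that check whether the command eventually failed.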

Combined with logging or print statements to provide more information about the context of the test failure, it should now be easier to debug the brittle test.

$ until-fail ruby tests/brittle_test.rb
Retry #1
DEBUG: User#accepted_terms_and_conditions == true
Success!

Retry #2
DEBUG: User#accepted_terms_and_conditions == true
Success!

...

Retry #100
DEBUG: User#accepted_terms_and_conditions == true
Success!

Retry #101
DEBUG: User#accepted_terms_and_conditions == false
Failure/Error: expect(signed_up_user_names).to include("Tom")
  expected signed up users to include "Tom"

Where to find it

The script is available in this gist.

It's a small Bash executable you can run in your shell, tested with Bash and ZSH.

Download the file and run chmod +x until-fail on it to make it executable. Move it to a location specified in your $PATH so it can be run from any location by calling until-fail and passing in the command to retry.
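The install steps could look something like the sketch below. To keep it runnable on its own, it uses a temporary directory in place of a real install location and creates a placeholder file in place of the downloaded script; both are assumptions. On a real machine you would use the file from the gist and a directory such as a personal bin directory that is already in your $PATH.

```shell
#!/usr/bin/env bash

# Assumption: a temporary directory stands in for your download location.
workdir=$(mktemp -d)

# Placeholder for the downloaded script, so this example runs on its own.
# Use the real until-fail file from the gist instead.
download="$workdir/until-fail"
printf '#!/usr/bin/env bash\necho "placeholder until-fail"\n' > "$download"

# Assumption: "$workdir/bin" stands in for a directory like ~/bin.
install_dir="$workdir/bin"
mkdir -p "$install_dir"

# Make the file executable and move it to the install directory.
chmod +x "$download"
mv "$download" "$install_dir/"

# Add the directory to PATH for this shell session only.
export PATH="$install_dir:$PATH"

# command -v prints the resolved path when the shell can find the executable.
resolved=$(command -v until-fail)
echo "$resolved"
```

To keep the PATH change across sessions, the export line would go in your shell's startup file instead.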

$ until-fail ruby some_file.rb
$ until-fail ruby tests/brittle_test.rb

See also this post for more information on how to create your own executables.

In combination with pry

I use this automatic retry method in combination with a well-placed pry statement in Ruby to open a console when the error occurs.

$ until-fail ruby tests/brittle_test.rb
...

Retry #101
From: /path/to/project/tests/brittle_test.rb:11 :

     7: it "includes newly signed up user" do
     8:   begin
     9:     expect(signed_up_user_names).to include("Tom")
    10:   rescue Exception => e
 => 11:     binding.pry
    12:     raise e # Reraise the failure so the retry script stops
    13:   end
    14: end

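I use rescue Exception because RSpec signals a failed expectation by raising RSpec::Expectations::ExpectationNotMetError, which inherits directly from Exception rather than StandardError, so a bare rescue would not catch it. The rescue is only temporary; we should remove it when the brittle test has been fixed.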

Now I can run the test command wrapped in the until-fail helper, and walk away from the computer. When I come back some time later, hopefully a pry console is ready for me to start debugging the scenario that fails.

I use this little until-fail helper a lot while debugging brittle tests. It has certainly made it easier to reproduce brittle tests locally. With it I've removed randomness in tests, cleaned up state between tests, and fixed a variety of app-specific scenarios that were the cause of the brittle test.

Try it out and let me know if it has helped you!