
Faker? More like "Flaker," Amirite?

My apologies for the snarky heading. I just couldn't pass up the joke.

For what it's worth, I've greatly enjoyed using Faker over the years, and I think it's great for generating semi-realistic looking data for demo purposes.

Unfortunately, after literal years spent trying to find and eliminate flaky tests in mature Rails apps, I am now of the opinion that Faker is fundamentally unsuitable for generating test data.

If I've overlooked something, I'd love to be wrong!

Jargon Alert

I assume that most readers are familiar with pseudorandom number generators (usually abbreviated to PRNG). If you're not, check out the appendix. (Bonus: that document contains a fun link.)

TL;DR

Using Faker to generate data for tests can create test flakiness that is extremely difficult to track down. To its credit, Faker does have a feature it calls "Deterministic Random", which can mitigate -- but not eliminate -- one cause of test flakiness.

Unfortunately, Faker's use of a PRNG creates unavoidable order dependencies between tests, because the number of times a PRNG has been called is global state.
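Here's the mechanism in miniature, using Faker's documented seeding API (the seed value and the generator are arbitrary choices for illustration):

```ruby
require "faker"

# Seed Faker's PRNG, then simulate three tests that each ask it for a value.
Faker::Config.random = Random.new(42)
full_run = Array.new(3) { Faker::Name.first_name }

# Re-seed with the same seed, but skip the middle "test" -- which is exactly
# the kind of thing `rspec --bisect` does while hunting for a minimal repro.
Faker::Config.random = Random.new(42)
reduced_run = Array.new(2) { Faker::Name.first_name }

# The third test now receives the value the skipped test would have consumed:
reduced_run.last == full_run[1] # => true
reduced_run.last == full_run[2] # => false (barring an unlucky collision)
```

Which value any given test sees is purely a function of how many Faker calls came before it.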

Worse, this can horribly break RSpec's "bisect" feature (which is specifically designed to make it easier to isolate flaky tests!) by giving it pathological data.

Demonstrating the Problem

The file ./order_dependent_specs.rb contains a suite of ten tests that always run in the same order, with the same PRNG seed. Each test uses Faker to generate a value and asserts that that value is the next expected value in the sequence.

For example, if Faker produces the sequence foo bar [...], test #1 checks for foo, test #2 checks for bar, and so on.

All but one of the tests in this suite should always pass. The last test checks the environment variable FAIL_ON_LAST: if it is not blank, the last test asserts that the value it was given is NOT equal to the last value in the expected sequence. The upshot is that this test fails if and only if ENV["FAIL_ON_LAST"] is non-blank and all nine other tests have run before it (so that it receives the final value in the sequence).
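I won't reproduce the file here, but the shape of such a suite is roughly this (an illustrative sketch, not the actual contents of ./order_dependent_specs.rb; the seed and generator are placeholders):

```ruby
require "faker"

# Capture the sequence a freshly seeded PRNG will produce...
Faker::Config.random = Random.new(1234)
EXPECTED = Array.new(10) { Faker::Name.first_name }

# ...then re-seed, so each example consumes the next value in turn
# (assuming the examples run in the order they are defined).
Faker::Config.random = Random.new(1234)

RSpec.describe "consuming a seeded Faker sequence in order" do
  10.times do |i|
    it "example ##{i + 1} sees expected value ##{i + 1}" do
      value = Faker::Name.first_name
      if i == 9 && !ENV["FAIL_ON_LAST"].to_s.strip.empty?
        # The final example inverts its expectation when FAIL_ON_LAST is set.
        expect(value).not_to eq(EXPECTED[i])
      else
        expect(value).to eq(EXPECTED[i])
      end
    end
  end
end
```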

This may seem trivial and/or contrived. I assure you, reader: I have personally seen equivalent behavior on a real-world codebase. (See "Final Thoughts", below.)

I've provided some Rake tasks to quickly demonstrate the behavior:

  1. Clone this repository and run bundle install.
  2. Run bundle exec rake spec:pass.
  • Expected output: ten specs, all of which pass.
  3. Run bundle exec rake spec:fail.
  • Expected output: the same ten specs, the last of which fails.
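
For the curious, tasks like these could be wired up along the following lines (a sketch of one plausible Rakefile, not necessarily how this repository actually implements them):

```ruby
# Sketch of a Rakefile for the demo tasks described above.
namespace :spec do
  desc "Run the suite so that every example passes"
  task :pass do
    sh "bundle exec rspec order_dependent_specs.rb"
  end

  desc "Run the suite so that the final example fails"
  task :fail do
    sh "FAIL_ON_LAST=1 bundle exec rspec order_dependent_specs.rb"
  end

  desc "Ask RSpec to bisect the order-dependent failure"
  task :bisect do
    sh "FAIL_ON_LAST=1 bundle exec rspec order_dependent_specs.rb --bisect"
  end
end
```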

So far, so good. Now for the "fun" part: Running RSpec bisect! (More on this feature below.)

  1. Run bundle exec rake spec:bisect.
  • Expected output: verbose output from RSpec's --bisect feature, which concludes with a line like the following (reported times will vary):
    • Bisect complete! Reduced necessary non-failing examples from 9 to 9 in 11.73 seconds.

There are two important things to notice:

  1. The total time it takes for RSpec to run using the --bisect flag is approximately 2 orders of magnitude longer than the time it takes to run the entire test suite once.
  2. When it does finally finish, RSpec reports that it was unable to remove any tests from the set it was given. (Note the "Reduced [...] from 9 to 9" in its output.)

Further Discussion

Bisect

RSpec's "bisect" feature is designed to make it easier to find order dependencies between tests. RSpec bisect attempts to identify a "minimal reproduction command" that will run the fewest tests required to reproduce the failure.

RSpec bisect works by:

  1. Taking a command that should reproduce a failure;
  2. Confirming that running that command does, in fact, cause at least one test to fail;
  3. Confirming that the failure is, in fact, order-dependent;
  4. Trying to figure out which tests can be skipped without causing the failing test to pass.

(If you're curious about this, I'll refer you to RSpec's documentation, which is -- if you'll pardon the pun -- exemplary.)

NOTE: Minitest also has a bisect tool. I've never used it, but I'd expect it to behave similarly.

Faker + RSpec bisect = PAIN (So. Much. Pain.)

Unfortunately, if a particular flaky test is flaky because its pass/fail behavior depends on test data generated by Faker, this creates a worst-case performance scenario for RSpec bisect.

Unless you are very, very, very lucky, the only thing you will accomplish by running rspec --bisect on such a test will be to waste time and electricity.

This is because skipping tests that also use Faker to generate their data will cause the PRNG to be called a different number of times. When this happens, the flaky test will get a different value from Faker -- and, if that change causes the flaky test to pass, RSpec concludes that the skipped tests cannot be excluded from the "minimal reproduction set."

Unfortunately, when this happens, RSpec bisect prints out a message reading Multiple culprits detected - splitting candidates and will then attempt to split up the same set of tests in a different way. It will do this again and again, trying to find even one test it can remove, and only when it has completely exhausted the search space will it finally terminate.

Of course, on a real-world test suite where the set of tests being bisected takes longer than a second or two to run, the user will probably abort the process. However, even if bisect is allowed to run to completion, it will report that the set of tests you gave it was already minimal.

Alternatives

I honestly don't think it's possible to fix this behavior in the general case.

I've been thinking about this off and on for a year or two, and the only workaround I've come up with is to capture the values returned by Faker in the failing test, and then replay those specific values when that specific test is run.

Obviously, this would require explicit integration with the test framework -- knowing not only that RSpec (or Minitest) is running, but also that it's running bisect, and whether the failing test is being run in the initial pass or in one of the subsequent rounds.
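To make that concrete, the workaround might take a shape like this -- stressing that this is purely hypothetical, and that nothing resembling it exists in Faker or RSpec today:

```ruby
# Purely hypothetical sketch. The idea: record the values a flaky example
# consumed during a failing run, then feed exactly those values back when
# that one example is re-run (e.g. during a bisect).
module FakerReplay
  def self.install(recorded_values)
    queue = recorded_values.dup
    # Shadow one Faker generator with a canned sequence. A real implementation
    # would have to cover every generator the example touches.
    Faker::Name.define_singleton_method(:first_name) { queue.shift }
  end
end

# e.g. in spec_helper.rb, guarded by some signal that a bisect re-run is happening:
# FakerReplay.install(%w[Alice Bob Carol]) if ENV["REPLAY_FAKER_VALUES"]
```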

Furthermore, that workaround only helps if the developer who's running bisect understands the problems of using pseudorandomness in the first place. I may be biased, but I rather suspect that experienced practitioners of TDD have a healthy fear of any kind of randomness in a test suite (unless, of course, they're explicitly doing fuzz testing) -- and thus would be unlikely to use Faker in tests in the first place.

Final Thoughts

I first realized this problem when I was working on a Rails app that had some very simple validation on names. (Yes, I know: validating names is a terrible idea.)

Most names generated by Faker were fine, but around 10-20% of the time, Faker would generate a name that failed validation -- almost always in a test that had nothing to do with name validation. The usual solution was to "just" re-run the build... which typically took 15 minutes, and might need to be repeated more than once if we were unlucky.

Once I figured out what the problem was (after several attempts at RSpec bisect that ran overnight to no avail), I grepped the codebase to find all uses of Faker, replaced them with FactoryBot sequences, and the tests became MUCH more reliable.
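If you haven't used FactoryBot sequences, the swap looked roughly like the following (the factory and attribute names here are invented; only FactoryBot's sequence DSL is real):

```ruby
require "factory_bot"

FactoryBot.define do
  # Hypothetical factory -- the real app's factories obviously looked different.
  factory :user do
    # Before (flaky): the generated name depended on how many times Faker's
    # PRNG had already been called by whichever tests happened to run earlier.
    #   name { Faker::Name.name }

    # After: every generated name is well-formed, so no test's outcome hinges
    # on which values earlier tests happened to draw from a shared PRNG.
    sequence(:name) { |n| "Test User #{n}" }
  end
end
```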

I could have figured out how to get Faker not to generate the specific data that was causing problems... but that would only have addressed the symptom. Future developers might have encountered similar flakiness in other scenarios, and might not have realized what the underlying problem was.

(Maybe I'm slow, but it took me a few years to figure this out... and that was only after I fixed a more obvious source of test flakiness in the same suite, which was a before_action on all controllers that looked something like Time.zone = current_user.time_zone. Turns out Ruby constants are another great way to leak state between tests... )
