Skip to content

Ensure test threads share a DB connection#28083

Merged
matthewd merged 1 commit intorails:masterfrom
eileencodes:ensure-test-threads-shared-db-conn
Feb 20, 2017
Merged

Ensure test threads share a DB connection#28083
matthewd merged 1 commit intorails:masterfrom
eileencodes:ensure-test-threads-shared-db-conn

Conversation

@eileencodes
Copy link
Member

This ensures multiple threads inside a transactional test to see consistent
database state.

When a system test starts Puma spins up one thread and Capybara spins up
another thread. Because of this when tests are run the database cannot
see what was inserted into the database on teardown. This is because
there are two threads using two different connections.

This change uses the statement cache to lock the threads to using a
single connection ID instead of each not being able to see each other.
This code only runs in the fixture setup and teardown so it does not
affect real production databases.

When a transaction is opened we set lock_thread to Thread.current so
we can keep track of which connection the thread is using. When we
rollback the transaction we unlock the thread and then there will be no
left-over data in the database because the transaction will roll back
the correct connections.

[ Eileen M. Uchitelle, Matthew Draper ]

cc/ @matthewd

This ensures multiple threads inside a transactional test to see consistent
database state.

When a system test starts Puma spins up one thread and Capybara spins up
another thread. Because of this when tests are run the database cannot
see what was inserted into the database on teardown. This is because
there are two threads using two different connections.

This change uses the statement cache to lock the threads to using a
single connection ID instead of each not being able to see each other.
This code only runs in the fixture setup and teardown so it does not
affect real production databases.

When a transaction is opened we set `lock_thread` to `Thread.current` so
we can keep track of which connection the thread is using. When we
rollback the transaction we unlock the thread and then there will be no
left-over data in the database because the transaction will roll back
the correct connections.

[ Eileen M. Uchitelle, Matthew Draper ]
@sgrif
Copy link
Contributor

sgrif commented Feb 24, 2017

Sorry for dropping in with little context, but this seems like it needs to do some additional work to actually ensure the connections aren't used concurrently. While usually the server is only executing if the test is blocked on a request, that's not always the case. Particularly if Javascript is involved, it seems like we could very easily end up with a race condition between the threads. The underlying C structures that the sqlite and pg gems are wrapping are very specifically not thread safe (I'm unsure about mysql2)

(But also I've not been following super closely since I'm away on leave so feel free to just tell me I'm wrong)

@matthewd
Copy link
Member

@sgrif https://github.com/rails/rails/pull/28083/files#diff-c226a4680f86689c3c170d4bc5911e96R610

Could do with some rearrangement to make it clearer what's going on, but this was the easiest option for a simple drop-in solution for [almost] all actual-adapter interaction in one go.

@matthewd
Copy link
Member

(higher level concurrency issues, like one thread working in a transaction, or some other long term statefulness on the connection, are consciously out of scope)

@sgrif
Copy link
Contributor

sgrif commented Feb 24, 2017

Oh, I get it now. I parsed that line wrong mentally before. Yeah, seems fine then. I do think it would be a good idea to add some more explicit locking throughout the methods higher up in the future. We should also document that the result returned needs to either be thread safe or eagerly buffer since a lazy cursor that isn't thread safe would cause issues.

mtsmfm added a commit to mtsmfm/rails that referenced this pull request Mar 15, 2017
…ldren`)

If we run only following tests:

- test/cases/scoping/default_scoping_test.rb
- test/cases/associations_test.rb

```
$ cat Rakefile.test
require "rake/testtask"

ENV["ARCONN"] = "postgresql"

Rake::TestTask.new do |t|
  t.libs << "test"
  t.test_files = %w(
    test/cases/scoping/default_scoping_test.rb
    test/cases/associations_test.rb
  )
end
```

a test will fail:

```
$ bundle exec rake test -f Rakefile.test
/app/activesupport/lib/active_support/core_ext/enumerable.rb:20: warning: method redefined; discarding old sum
Using postgresql
Run options: --seed 11830

# Running:

.........................................................................................F................

Finished in 6.939055s, 15.2759 runs/s, 27.9577 assertions/s.

  1) Failure:
AssociationProxyTest#test_save_on_parent_saves_children [/app/activerecord/test/cases/associations_test.rb:185]:
Expected: 1
  Actual: 2

106 runs, 194 assertions, 1 failures, 0 errors, 0 skips
rake aborted!
Command failed with status (1)
/usr/local/bin/bundle:22:in `load'
/usr/local/bin/bundle:22:in `<main>'
Tasks: TOP => test
(See full trace by running task with --trace)
```

In rails#28083, change `self.use_transactional_tests` to `false`
but we forget to clean-up fixture.
However we don't have to disable transaction except a few tests.
dsander added a commit to dsander/huginn that referenced this pull request Apr 28, 2017
In Rails 5.1 transactional tests share the same connection id between
the webserver and test runner. This removes the need for special cleanup
strategies.

This speeds up the tests significantly, before:
```
Finished in 3 minutes 30.3 seconds (files took 5.46 seconds to load)
```

After:
```
Finished in 1 minute 41.61 seconds (files took 5.45 seconds to load)
```

rails/rails#28083
dsander added a commit to dsander/huginn that referenced this pull request Apr 29, 2017
In Rails 5.1 transactional tests share the same connection id between
the webserver and test runner. This removes the need for special cleanup
strategies.

This speeds up the tests significantly, before:
```
Finished in 3 minutes 30.3 seconds (files took 5.46 seconds to load)
```

After:
```
Finished in 1 minute 41.61 seconds (files took 5.45 seconds to load)
```

rails/rails#28083
@bf4
Copy link
Contributor

bf4 commented Apr 30, 2017

I'd be willing to make a PR to backport this to earlier versions of Rails if there's an interest.

@eileencodes
Copy link
Member Author

@bf4 I consider this a feature, not a bug fix, so it won't be back ported. It changes expected behavior too much. Sometimes if a bug lives long enough it becomes a feature. That's the case for this change.

dsander added a commit to dsander/huginn that referenced this pull request May 15, 2017
In Rails 5.1 transactional tests share the same connection id between
the webserver and test runner. This removes the need for special cleanup
strategies.

This speeds up the tests significantly, before:
```
Finished in 3 minutes 30.3 seconds (files took 5.46 seconds to load)
```

After:
```
Finished in 1 minute 41.61 seconds (files took 5.45 seconds to load)
```

rails/rails#28083
dsander added a commit to dsander/huginn that referenced this pull request May 16, 2017
In Rails 5.1 transactional tests share the same connection id between
the webserver and test runner. This removes the need for special cleanup
strategies.

This speeds up the tests significantly, before:
```
Finished in 3 minutes 30.3 seconds (files took 5.46 seconds to load)
```

After:
```
Finished in 1 minute 41.61 seconds (files took 5.45 seconds to load)
```

rails/rails#28083
dsander added a commit to dsander/huginn that referenced this pull request May 19, 2017
In Rails 5.1 transactional tests share the same connection id between
the webserver and test runner. This removes the need for special cleanup
strategies.

This speeds up the tests significantly, before:
```
Finished in 3 minutes 30.3 seconds (files took 5.46 seconds to load)
```

After:
```
Finished in 1 minute 41.61 seconds (files took 5.45 seconds to load)
```

rails/rails#28083
@query_cache[sql][binds] = yield
end
result.dup
@lock.synchronize do
Copy link
Contributor

@tgxworld tgxworld Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code only runs in the fixture setup and teardown so it does not
affect real production databases.

@eileencodes @matthewd Is this lock and the lock in abstract_adapter necessary outside of the test environment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.. my theory was that acquiring an uncontended lock wouldn't be noticeably slower than checking whether the lock was needed (given that we're about to perform IO anyway). I guess the fact you're asking suggests I was wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I started noticing it in our flamegraphs but the overhead doesn't contribute significantly when I tried to benchmark it.

@moveson
Copy link

moveson commented Oct 19, 2017

I'm using Rails 5.1/Devise/RSpec/Capybara, and the test thread does not appear to be sharing a connection when running a selenium browser. Reference the following test:

RSpec.describe 'User logs in' do
  let!(:user) { create(:user, email: email, password: password, password_confirmation: password) }
  let(:email) { 'jane@example.com' }
  let(:password) { '12345678' }

  scenario 'with valid email and password' do
    visit new_user_session_path
    fill_in 'Email', with: email
    fill_in 'Password', with: password
    click_button 'Sign in'
    expect(page).to have_content('You are signed in.')
  end
end

This passes when driven_by :rack_test, but when driven_by :selenium it results in "Invalid email or password". It fails in the same way when using selenium-chrome and selenium-chrome-headless.

@eileencodes
Copy link
Member Author

@moveson please open a new issue with a way to reproduce it and demonstrates the failure you're seeing.

@moveson
Copy link

moveson commented Oct 19, 2017

@eileencodes This turned out to be a problem with puma running in cluster mode, which made the Capybara thread unable to see the database. The problem has been addressed in a Rails PR here and will hopefully see the light of day in Rails 5.1.5.

@maschwenk
Copy link
Contributor

Glad to see it helped someone else!

@moveson
Copy link

moveson commented Oct 19, 2017

It's a good fix. Also thanks to @twalpole for diagnosing the problem and pointing me to the solution.

jonathanhefner added a commit to jonathanhefner/rails that referenced this pull request Nov 15, 2020
Before rails#34953, when using the `:async` Active Job queue adapter, jobs
enqueued in `db/seeds.rb`, such as Active Storage analysis jobs, would
cause a hang (see rails#34939).  Therefore, rails#34953 changed all jobs enqueued
in `db/seeds.rb` to use the `:inline` queue adapter instead.  (This
behavior was later limited to only take effect when the `:async` adapter
was configured, see rails#35905.)  However, inline jobs in `db/seeds.rb`
cleared `CurrentAttributes` values (see rails#37526).  Therefore, rails#37568
changed the `:inline` adapter to wrap each job in its own thread, for
isolation.  However, wrapping a job in its own thread affects which
database connection it uses.  Thus inline jobs can no longer execute
within the calling thread's database transaction, including seeing any
uncommitted changes.  Additionally, if the calling thread is not wrapped
with the executor, the inline job thread (which is wrapped with the
executor) can deadlock on the load interlock.  And when testing (with
`connection_pool.lock_thread = true`), the inline job thread can
deadlock on one of the locks added by rails#28083.

Therefore, this commit reverts the solutions of rails#34953 and rails#37568, and
instead wraps evaluation of `db/seeds.rb` with the executor.  This
eliminates the original hang from rails#34939, which was also due to running
multiple threads and not wrapping all of them with the executor.  And,
because nested calls to `executor.wrap` are ignored, any inline jobs in
`db/seeds.rb` will not clear `CurrentAttributes` values.

Alternative fix for rails#34939.
Reverts rails#34953.
Reverts rails#35905.
Partially reverts rails#35896.

Alternative fix for rails#37526.
Reverts rails#37568.

Fixes rails#40552.
jonathanhefner added a commit to jonathanhefner/rails that referenced this pull request Nov 18, 2020
Before rails#34953, when using the `:async` Active Job queue adapter, jobs
enqueued in `db/seeds.rb`, such as Active Storage analysis jobs, would
cause a hang (see rails#34939).  Therefore, rails#34953 changed all jobs enqueued
in `db/seeds.rb` to use the `:inline` queue adapter instead.  (This
behavior was later limited to only take effect when the `:async` adapter
was configured, see rails#35905.)  However, inline jobs in `db/seeds.rb`
cleared `CurrentAttributes` values (see rails#37526).  Therefore, rails#37568
changed the `:inline` adapter to wrap each job in its own thread, for
isolation.  However, wrapping a job in its own thread affects which
database connection it uses.  Thus inline jobs can no longer execute
within the calling thread's database transaction, including seeing any
uncommitted changes.  Additionally, if the calling thread is not wrapped
with the executor, the inline job thread (which is wrapped with the
executor) can deadlock on the load interlock.  And when testing (with
`connection_pool.lock_thread = true`), the inline job thread can
deadlock on one of the locks added by rails#28083.

Therefore, this commit reverts the solutions of rails#34953 and rails#37568, and
instead wraps evaluation of `db/seeds.rb` with the executor.  This
eliminates the original hang from rails#34939, which was also due to running
multiple threads and not wrapping all of them with the executor.  And,
because nested calls to `executor.wrap` are ignored, any inline jobs in
`db/seeds.rb` will not clear `CurrentAttributes` values.

Alternative fix for rails#34939.
Reverts rails#34953.
Reverts rails#35905.
Partially reverts rails#35896.

Alternative fix for rails#37526.
Reverts rails#37568.

Fixes rails#40552.
jonathanhefner added a commit to jonathanhefner/rails that referenced this pull request Dec 2, 2020
Before rails#34953, when using the `:async` Active Job queue adapter, jobs
enqueued in `db/seeds.rb`, such as Active Storage analysis jobs, would
cause a hang (see rails#34939).  Therefore, rails#34953 changed all jobs enqueued
in `db/seeds.rb` to use the `:inline` queue adapter instead.  (This
behavior was later limited to only take effect when the `:async` adapter
was configured, see rails#35905.)  However, inline jobs in `db/seeds.rb`
cleared `CurrentAttributes` values (see rails#37526).  Therefore, rails#37568
changed the `:inline` adapter to wrap each job in its own thread, for
isolation.  However, wrapping a job in its own thread affects which
database connection it uses.  Thus inline jobs can no longer execute
within the calling thread's database transaction, including seeing any
uncommitted changes.  Additionally, if the calling thread is not wrapped
with the executor, the inline job thread (which is wrapped with the
executor) can deadlock on the load interlock.  And when testing (with
`connection_pool.lock_thread = true`), the inline job thread can
deadlock on one of the locks added by rails#28083.

Therefore, this commit reverts the solutions of rails#34953 and rails#37568, and
instead wraps evaluation of `db/seeds.rb` with the executor.  This
eliminates the original hang from rails#34939, which was also due to running
multiple threads and not wrapping all of them with the executor.  And,
because nested calls to `executor.wrap` are ignored, any inline jobs in
`db/seeds.rb` will not clear `CurrentAttributes` values.

Alternative fix for rails#34939.
Reverts rails#34953.
Reverts rails#35905.
Partially reverts rails#35896.

Alternative fix for rails#37526.
Reverts rails#37568.

Fixes rails#40552.

(cherry picked from commit 648da12)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants