Skip to content

Don't apply locking around basic #load / #require#20989

Merged
matthewd merged 1 commit into
rails:masterfrom
matthewd:no-lock-basic-require
Jul 23, 2015
Merged

Don't apply locking around basic #load / #require#20989
matthewd merged 1 commit into
rails:masterfrom
matthewd:no-lock-basic-require

Conversation

@matthewd

Copy link
Copy Markdown
Member

That's outside our remit, and dangerous... if a caller has their own locking to protect against the natural race danger, we'll deadlock against it.

@mperham reported an issue whereby one thread would end up waiting at:

monitor.rb:110:in `sleep'
monitor.rb:110:in `wait'
monitor.rb:110:in `wait'
monitor.rb:122:in `wait_while'
activesupport/lib/active_support/concurrency/share_lock.rb:58:in `block in start_exclusive'
monitor.rb:211:in `mon_synchronize'
activesupport/lib/active_support/concurrency/share_lock.rb:49:in `start_exclusive'
activesupport/lib/active_support/concurrency/share_lock.rb:113:in `exclusive'
activesupport/lib/active_support/dependencies/interlock.rb:11:in `loading'
activesupport/lib/active_support/dependencies.rb:37:in `load_interlock'
activesupport/lib/active_support/dependencies.rb:265:in `load_dependency'
activesupport/lib/active_support/dependencies.rb:304:in `require'
redis-3.2.1/lib/redis/client.rb:465:in `_parse_driver'
redis-3.2.1/lib/redis/client.rb:433:in `_parse_options'
redis-3.2.1/lib/redis/client.rb:74:in `initialize'
redis-3.2.1/lib/redis.rb:31:in `new'
redis-3.2.1/lib/redis.rb:31:in `initialize'
sidekiq/lib/sidekiq/redis_connection.rb:28:in `new'
sidekiq/lib/sidekiq/redis_connection.rb:28:in `build_client'
sidekiq/lib/sidekiq/redis_connection.rb:19:in `block in create'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:169:in `call'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:169:in `try_create'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:81:in `block (2 levels) in pop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:77:in `loop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:77:in `block in pop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `synchronize'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `pop'
connection_pool-2.2.0/lib/connection_pool.rb:89:in `checkout'
connection_pool-2.2.0/lib/connection_pool.rb:61:in `block in with'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `handle_interrupt'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `with'

with others stuck at:

connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `synchronize'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `pop'
connection_pool-2.2.0/lib/connection_pool.rb:89:in `checkout'
connection_pool-2.2.0/lib/connection_pool.rb:61:in `block in with'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `handle_interrupt'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `with'

So, thread A acquires the connection_pool lock, and is able to proceed. Meanwhile thread B enters app space. A now attempts to make an explicit require call (which would be naturally safe; protected by ruby's internal loading tracking), but we intervene and force it to first acquire our load lock -- it must now wait until thread B either completes, or hits a load-lock acquisition of its own. B now follows the same path, and blocks on its attempt to acquire the connection_pool lock = deadlock.

Change the above scenario from an explicit require to a mention of a missing constant, and we'll still deadlock after this PR -- but that one is more defensible: we have no other choice. It's also (IMO) a lower risk exposure: "user" code, which is where auto-loadable constants are likely to be mentioned, is unlikely to be casually executing inside an exclusive lock... require inside an exclusive lock is a not-unreasonable prospect for library code.

.. it's also fixable, by just mentioning the constant before you acquire the other lock. That doesn't apply to require, because we were acquiring the load lock before we even checked whether the file had already been loaded.


Moreover, the previous implementation was Philosophically Wrong: AS::Dep isn't supposed to be enhancing or in any way interfering with the operation of load/require. It only defines them in order to exclude them from its normal behaviour.

To do so would also give a false sense of protection, against a set of dangers that would persist/return when the load interlock is removed (e.g., in a production environment where eager_load and cache_classes are both true).

That's outside our remit, and dangerous... if a caller has their own
locking to protect against the natural race danger, we'll deadlock
against it.
matthewd added a commit that referenced this pull request Jul 23, 2015
Don't apply locking around basic #load / #require
@matthewd matthewd merged commit d517239 into rails:master Jul 23, 2015
@matthewd

Copy link
Copy Markdown
Member Author

cc @thedarkone @tenderlove @fxn

@matthewd

Copy link
Copy Markdown
Member Author

See also #20928 (comment).

@thedarkone's conclusion there holds for anything that's triggering "actual" constant autoloading inside some 3rd party lock... this PR reflects the fact that this particular instance was instead caused more by my overreach when adding locking in #17102.

@thedarkone

Copy link
Copy Markdown
Contributor

@matthewd you brought this on yourself :bowtie:, that's what you get for attempting to make const loading thread safe 😮.

Now with d517239 I see a problem: we're accessing Dependencies.constant_watch_stack without a lock (as in, a thread outside of "app space" attempts to require something while concurrently another thread in "app space" is busy loading constants).

@matthewd

Copy link
Copy Markdown
Member Author

@thedarkone ah, good catch! 😕

I think we can add a:

def exclusive?
  @exclusive_thread == Thread.current
end

(because constant_watch_stack is only watching? when we're inside load_file via require_or_load)

Or do you think we should just switch the watch stack to a TS::Array?

Actually... given that it's only mutated when the load lock is held... are we already safe? I'll defer entirely to your expertise on thread safety of ruby primitives... but if the only thing we're doing concurrently is calling empty? on an empty array... that sounds like it could be okay... maybe?

@thedarkone

Copy link
Copy Markdown
Contributor

No no no, I sidestepped the thread-safety WatchStack (and its @watching.empty?), that wasn't what I what I was thinking about.

(because constant_watch_stack is only watching? when we're inside load_file via require_or_load)

True, constant_watch_stack is only watching? whenever a thread is in load_file or require_or_load, but since constant_watch_stack is a global, other threads that are not under interlock control might also see watching? # => true.

What I mean is, for example, a normal Rails request dispatch thread is in an exclusive section and starts loading a missing constant, at this point watching? # => true for all threads on a VM. Some other thread, that doesn't venture into interlock protected app space (an example of this are sidekiq threads before entering Processor#executor block), now requires something, sees watching? # => true and proceeds to make a mess of it by calling Dependencies.new_constants_in(Object).

I think we can fix this with your suggested exclusive? check:

if Dependencies.load? && ActiveSupport::Dependencies.constant_watch_stack.watching? && Dependencies.interlock.exclusive?
  Dependencies.new_constants_in(Object) { yield }
else
  yield
end

@matthewd

Copy link
Copy Markdown
Member Author

Ah, right.. because this is up in require/load, we can't necessarily assume that all relevant code is protected by the "app space" share lock.

On the other hand, I guess that's technically true even today: any code running outside the protection of Rack::Lock is in danger of interfering...

Hmm... maybe that's by design? Or we technically need another separate lock? 😱 If we just changed that conditional to ensure we hold the exclusive lock, the result would be whatever constants we load, getting misattributed to whichever file was being simultaneously autoloaded. (Note, the only purpose of this one is to ensure they don't show up in the other one.)

So, I think it's correct that we go down the first branch there, even in other threads... but it does seem separately race-prone. 😕

mperham added a commit to sidekiq/sidekiq that referenced this pull request Jul 23, 2015
@mperham

mperham commented Jul 23, 2015

Copy link
Copy Markdown
Contributor

Progress! 🎉

No more lockups but I can't get the code to reload. I fire off a bunch of jobs, I see the expected time output. I edit hard_worker.rb to change the time output and fire off another set of jobs. The time output remains as the original. Should app/workers/hard_worker.rb auto-reload now?

@thedarkone

Copy link
Copy Markdown
Contributor

On the other hand, I guess that's technically true even today: any code running outside the protection of Rack::Lock is in danger of interfering...

That is technically true, but on the other hand if anyone opens an issue reporting that const loading is broken when multiple threads are used, it is swiftly closed with an explanation that const loading is single threaded only.

You're right that an additional interlock.exclusive? isn't going to solve anything... 😟

@mperham as per original discussion #17102 (comment), it is expected that sidekiq is dispatching through Rails middleware stack. Namely through ActionDispatch::Reloader: https://github.com/rails/rails/blob/master/railties/lib/rails/application/default_middleware_stack.rb#L61. I assumed that Sidekiq.server_middleware was exactly that.

@mperham

mperham commented Jul 23, 2015

Copy link
Copy Markdown
Contributor

How do I recreate the code reloading middleware without needing to build an entire fake Rack stack?

@matthewd

Copy link
Copy Markdown
Member Author

@mperham for now, I think you'll need to do something equivalent to

callback = lambda do
ActiveSupport::Dependencies.interlock.attempt_loading do
ActiveSupport::DescendantsTracker.clear
ActiveSupport::Dependencies.clear
end
end
if config.reload_classes_only_on_change
reloader = config.file_watcher.new(*watchable_args, &callback)
self.reloaders << reloader
# Prepend this callback to have autoloaded constants cleared before
# any other possible reloading, in case they need to autoload fresh
# constants.
ActionDispatch::Reloader.to_prepare(prepend: true) do
# In addition to changes detected by the file watcher, if routes
# or i18n have been updated we also need to clear constants,
# that's why we run #execute rather than #execute_if_updated, this
# callback has to clear autoloaded constants after any update.
reloader.execute
end
else
ActionDispatch::Reloader.to_cleanup(&callback)
end
... which for initial testing purposes, probably means "arrange to call
ActiveSupport::Dependencies.interlock.attempt_loading do
ActiveSupport::DescendantsTracker.clear
ActiveSupport::Dependencies.clear
end
at the end of each job"

There's obviously still some API work to be done, to make this process nicer, and avoid you duplicating details you shouldn't need to know about.

@mperham

mperham commented Jul 23, 2015

Copy link
Copy Markdown
Contributor

It's alive and working! This is great, thank you so much for your help. Does this look reasonable and optimal to you, to reload the code after every job has executed?

https://github.com/mperham/sidekiq/blob/eb3a5cbc8660e863f75ab307939ff30097f85c53/lib/sidekiq/processor.rb#L78

@thedarkone

Copy link
Copy Markdown
Contributor

@mperham that would work, but it is going to attempt to unload/reload the whole app on each job, this also ignores changes to routes.rb, i18n etc. You can try to leverage Rails reloading machinery like this: https://gist.github.com/thedarkone/c996c4230dd794699615.

Note that my code won't work under concurrent load (yet) unless #21006 is fixed.

@mperham

mperham commented Jul 24, 2015

Copy link
Copy Markdown
Contributor

Thank you so much, I have an updated version working in Sidekiq now. routes don't matter to Sidekiq but most everything else does.

I think the 99% use case is little to no load in development mode. I can't think of a time where I had Sidekiq processing something while editing code. It's definitely a price I can bear.

@mperham

mperham commented Jul 24, 2015

Copy link
Copy Markdown
Contributor

sidekiq/sidekiq#2457

@matthewd matthewd deleted the no-lock-basic-require branch August 22, 2015 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants