Don't apply locking around basic #load / #require by matthewd · Pull Request #20989 · rails/rails

matthewd · 2015-07-23T00:40:48Z

That's outside our remit, and dangerous... if a caller has their own locking to protect against the natural race danger, we'll deadlock against it.

@mperham reported an issue whereby one thread would end up waiting at:

monitor.rb:110:in `sleep'
monitor.rb:110:in `wait'
monitor.rb:110:in `wait'
monitor.rb:122:in `wait_while'
activesupport/lib/active_support/concurrency/share_lock.rb:58:in `block in start_exclusive'
monitor.rb:211:in `mon_synchronize'
activesupport/lib/active_support/concurrency/share_lock.rb:49:in `start_exclusive'
activesupport/lib/active_support/concurrency/share_lock.rb:113:in `exclusive'
activesupport/lib/active_support/dependencies/interlock.rb:11:in `loading'
activesupport/lib/active_support/dependencies.rb:37:in `load_interlock'
activesupport/lib/active_support/dependencies.rb:265:in `load_dependency'
activesupport/lib/active_support/dependencies.rb:304:in `require'
redis-3.2.1/lib/redis/client.rb:465:in `_parse_driver'
redis-3.2.1/lib/redis/client.rb:433:in `_parse_options'
redis-3.2.1/lib/redis/client.rb:74:in `initialize'
redis-3.2.1/lib/redis.rb:31:in `new'
redis-3.2.1/lib/redis.rb:31:in `initialize'
sidekiq/lib/sidekiq/redis_connection.rb:28:in `new'
sidekiq/lib/sidekiq/redis_connection.rb:28:in `build_client'
sidekiq/lib/sidekiq/redis_connection.rb:19:in `block in create'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:169:in `call'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:169:in `try_create'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:81:in `block (2 levels) in pop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:77:in `loop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:77:in `block in pop'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `synchronize'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `pop'
connection_pool-2.2.0/lib/connection_pool.rb:89:in `checkout'
connection_pool-2.2.0/lib/connection_pool.rb:61:in `block in with'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `handle_interrupt'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `with'

with others stuck at:

connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `synchronize'
connection_pool-2.2.0/lib/connection_pool/timed_stack.rb:76:in `pop'
connection_pool-2.2.0/lib/connection_pool.rb:89:in `checkout'
connection_pool-2.2.0/lib/connection_pool.rb:61:in `block in with'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `handle_interrupt'
connection_pool-2.2.0/lib/connection_pool.rb:60:in `with'

So, thread A acquires the connection_pool lock, and is able to proceed. Meanwhile thread B enters app space. A now attempts to make an explicit require call (which would be naturally safe; protected by ruby's internal loading tracking), but we intervene and force it to first acquire our load lock -- it must now wait until thread B either completes, or hits a load-lock acquisition of its own. B now follows the same path, and blocks on its attempt to acquire the connection_pool lock = deadlock.

Change the above scenario from an explicit require to a mention of a missing constant, and we'll still deadlock after this PR -- but that one is more defensible: we have no other choice. It's also (IMO) a lower risk exposure: "user" code, which is where auto-loadable constants are likely to be mentioned, is unlikely to be casually executing inside an exclusive lock... require inside an exclusive lock is a not-unreasonable prospect for library code.

.. it's also fixable, by just mentioning the constant before you acquire the other lock. That doesn't apply to require, because we were acquiring the load lock before we even checked whether the file had already been loaded.

Moreover, the previous implementation was Philosophically Wrong: AS::Dep isn't supposed to be enhancing or in any way interfering with the operation of load/require. It only defines them in order to exclude them from its normal behaviour.

To do so would also give a false sense of protection, against a set of dangers that would persist/return when the load interlock is removed (e.g., in a production environment where eager_load and cache_classes are both true).

That's outside our remit, and dangerous... if a caller has their own locking to protect against the natural race danger, we'll deadlock against it.

Don't apply locking around basic #load / #require

matthewd · 2015-07-23T00:42:42Z

cc @thedarkone @tenderlove @fxn

matthewd · 2015-07-23T00:49:53Z

See also #20928 (comment).

@thedarkone's conclusion there holds for anything that's triggering "actual" constant autoloading inside some 3rd party lock... this PR reflects the fact that this particular instance was instead caused more by my overreach when adding locking in #17102.

thedarkone · 2015-07-23T01:17:33Z

@matthewd you brought this on yourself , that's what you get for attempting to make const loading thread safe 😮.

Now with d517239 I see a problem: we're accessing Dependencies.constant_watch_stack without a lock (as in, a thread outside of "app space" attempts to require something while concurrently another thread in "app space" is busy loading constants).

matthewd · 2015-07-23T01:44:29Z

@thedarkone ah, good catch! 😕

I think we can add a:

def exclusive?
  @exclusive_thread == Thread.current
end

(because constant_watch_stack is only watching? when we're inside load_file via require_or_load)

Or do you think we should just switch the watch stack to a TS::Array?

Actually... given that it's only mutated when the load lock is held... are we already safe? I'll defer entirely to your expertise on thread safety of ruby primitives... but if the only thing we're doing concurrently is calling empty? on an empty array... that sounds like it could be okay... maybe?

thedarkone · 2015-07-23T03:13:45Z

No no no, I sidestepped the thread-safety WatchStack (and its @watching.empty?), that wasn't what I what I was thinking about.

(because constant_watch_stack is only watching? when we're inside load_file via require_or_load)

True, constant_watch_stack is only watching? whenever a thread is in load_file or require_or_load, but since constant_watch_stack is a global, other threads that are not under interlock control might also see watching? # => true.

What I mean is, for example, a normal Rails request dispatch thread is in an exclusive section and starts loading a missing constant, at this point watching? # => true for all threads on a VM. Some other thread, that doesn't venture into interlock protected app space (an example of this are sidekiq threads before entering Processor#executor block), now requires something, sees watching? # => true and proceeds to make a mess of it by calling Dependencies.new_constants_in(Object).

I think we can fix this with your suggested exclusive? check:

if Dependencies.load? && ActiveSupport::Dependencies.constant_watch_stack.watching? && Dependencies.interlock.exclusive?
  Dependencies.new_constants_in(Object) { yield }
else
  yield
end

matthewd · 2015-07-23T03:31:43Z

Ah, right.. because this is up in require/load, we can't necessarily assume that all relevant code is protected by the "app space" share lock.

On the other hand, I guess that's technically true even today: any code running outside the protection of Rack::Lock is in danger of interfering...

Hmm... maybe that's by design? Or we technically need another separate lock? 😱 If we just changed that conditional to ensure we hold the exclusive lock, the result would be whatever constants we load, getting misattributed to whichever file was being simultaneously autoloaded. (Note, the only purpose of this one is to ensure they don't show up in the other one.)

So, I think it's correct that we go down the first branch there, even in other threads... but it does seem separately race-prone. 😕

mperham · 2015-07-23T04:17:25Z

Progress! 🎉

No more lockups but I can't get the code to reload. I fire off a bunch of jobs, I see the expected time output. I edit hard_worker.rb to change the time output and fire off another set of jobs. The time output remains as the original. Should app/workers/hard_worker.rb auto-reload now?

thedarkone · 2015-07-23T14:26:35Z

On the other hand, I guess that's technically true even today: any code running outside the protection of Rack::Lock is in danger of interfering...

That is technically true, but on the other hand if anyone opens an issue reporting that const loading is broken when multiple threads are used, it is swiftly closed with an explanation that const loading is single threaded only.

You're right that an additional interlock.exclusive? isn't going to solve anything... 😟

@mperham as per original discussion #17102 (comment), it is expected that sidekiq is dispatching through Rails middleware stack. Namely through ActionDispatch::Reloader: https://github.com/rails/rails/blob/master/railties/lib/rails/application/default_middleware_stack.rb#L61. I assumed that Sidekiq.server_middleware was exactly that.

mperham · 2015-07-23T16:12:53Z

How do I recreate the code reloading middleware without needing to build an entire fake Rack stack?

matthewd · 2015-07-23T16:25:48Z

@mperham for now, I think you'll need to do something equivalent to

rails/railties/lib/rails/application/finisher.rb

Lines 88 to 111 in 8f81f7a

    
           callback = lambda do 
        
             ActiveSupport::Dependencies.interlock.attempt_loading do 
        
               ActiveSupport::DescendantsTracker.clear 
        
               ActiveSupport::Dependencies.clear 
        
             end 
        
           end 
        
           if config.reload_classes_only_on_change 
        
             reloader = config.file_watcher.new(*watchable_args, &callback) 
        
             self.reloaders << reloader 
        
             # Prepend this callback to have autoloaded constants cleared before 
        
             # any other possible reloading, in case they need to autoload fresh 
        
             # constants. 
        
             ActionDispatch::Reloader.to_prepare(prepend: true) do 
        
               # In addition to changes detected by the file watcher, if routes 
        
               # or i18n have been updated we also need to clear constants, 
        
               # that's why we run #execute rather than #execute_if_updated, this 
        
               # callback has to clear autoloaded constants after any update. 
        
               reloader.execute 
        
             end 
        
           else 
        
             ActionDispatch::Reloader.to_cleanup(&callback) 
        
           end

... which for initial testing purposes, probably means "arrange to call

rails/railties/lib/rails/application/finisher.rb

Lines 89 to 92 in 8f81f7a

    
           ActiveSupport::Dependencies.interlock.attempt_loading do 
        
             ActiveSupport::DescendantsTracker.clear 
        
             ActiveSupport::Dependencies.clear 
        
           end

at the end of each job"

There's obviously still some API work to be done, to make this process nicer, and avoid you duplicating details you shouldn't need to know about.

mperham · 2015-07-23T16:39:49Z

It's alive and working! This is great, thank you so much for your help. Does this look reasonable and optimal to you, to reload the code after every job has executed?

https://github.com/mperham/sidekiq/blob/eb3a5cbc8660e863f75ab307939ff30097f85c53/lib/sidekiq/processor.rb#L78

thedarkone · 2015-07-23T19:36:20Z

@mperham that would work, but it is going to attempt to unload/reload the whole app on each job, this also ignores changes to routes.rb, i18n etc. You can try to leverage Rails reloading machinery like this: https://gist.github.com/thedarkone/c996c4230dd794699615.

Note that my code won't work under concurrent load (yet) unless #21006 is fixed.

mperham · 2015-07-24T19:40:59Z

Thank you so much, I have an updated version working in Sidekiq now. routes don't matter to Sidekiq but most everything else does.

I think the 99% use case is little to no load in development mode. I can't think of a time where I had Sidekiq processing something while editing code. It's definitely a price I can bear.

mperham · 2015-07-24T20:34:24Z

sidekiq/sidekiq#2457

Don't apply locking around basic #load / #require

10c7f54

That's outside our remit, and dangerous... if a caller has their own locking to protect against the natural race danger, we'll deadlock against it.

matthewd added a commit that referenced this pull request Jul 23, 2015

Merge pull request #20989 from matthewd/no-lock-basic-require

d517239

Don't apply locking around basic #load / #require

matthewd merged commit d517239 into rails:master Jul 23, 2015

mperham added a commit to sidekiq/sidekiq that referenced this pull request Jul 23, 2015

More code reloading work, see rails/rails#20989

a6732d0

thedarkone mentioned this pull request Jul 23, 2015

Under concurrent load Dependencies.interlock might miss file changes #21006

Closed

matthewd deleted the no-lock-basic-require branch August 22, 2015 19:33

Conversation

matthewd commented Jul 23, 2015

Uh oh!

matthewd commented Jul 23, 2015

Uh oh!

matthewd commented Jul 23, 2015

Uh oh!

thedarkone commented Jul 23, 2015

Uh oh!

matthewd commented Jul 23, 2015

Uh oh!

thedarkone commented Jul 23, 2015

Uh oh!

matthewd commented Jul 23, 2015

Uh oh!

mperham commented Jul 23, 2015

Uh oh!

thedarkone commented Jul 23, 2015

Uh oh!

mperham commented Jul 23, 2015

Uh oh!

matthewd commented Jul 23, 2015

Uh oh!

mperham commented Jul 23, 2015

Uh oh!

thedarkone commented Jul 23, 2015

Uh oh!

mperham commented Jul 24, 2015

Uh oh!

mperham commented Jul 24, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants