
Avoid GC-induced reentrant calls to add_callback()#1875

Closed
pitrou wants to merge 1 commit into tornadoweb:master from pitrou:add_callback_gc_deadlock

Conversation

@pitrou
Contributor

@pitrou pitrou commented Nov 2, 2016

Fixes #1874.

@bdarnell
Member

bdarnell commented Nov 3, 2016

My initial reaction is the same as @ajdavis: it doesn't seem sustainable to disable GC around all uses of a lock that might be acquired in a destructor. There's a second use of callback_lock in ioloop.py; don't we need to disable GC around it, too?

The disable_gc context manager is not thread-safe - one thread may finish and re-enable GC just after another thread has disabled it.
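The race described here is easy to reproduce even single-threaded, via nesting. The sketch below is a hypothetical reconstruction of a naive `disable_gc` context manager (the PR's actual implementation may differ); the bug is that exit unconditionally re-enables GC:

```python
import gc
from contextlib import contextmanager

@contextmanager
def disable_gc():
    gc.disable()
    try:
        yield
    finally:
        # BUG: unconditionally re-enables GC, even if another thread
        # (or an enclosing with-block) still expects it to be disabled.
        gc.enable()

with disable_gc():
    with disable_gc():
        pass
    # The inner exit already turned GC back on, while the outer
    # block believes it is still disabled.
    assert gc.isenabled()
```

A correct version would need a counter or a per-process lock around the enable/disable pair.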

How exactly is the destructor getting run during GC? I understand that when an object's refcount goes to zero, its destructor is run synchronously. I would assume (but haven't verified for Python) that when the cycle GC runs, all destructors it triggers run on a separate thread dedicated to this purpose.

This also looks like a substantial performance penalty on the critical path. The background thread proposed in the other thread may end up needing the same GC-disabling dance, but at least it would localize the impact to applications that need complex destructors, instead of affecting everything that uses Tornado.

@ajdavis
Contributor

ajdavis commented Nov 3, 2016

Agreed. And the background-thread technique, if you're very, very careful, is safe without disabling the GC. (We have __del__ methods in PyMongo that defer their cleanup to a periodic background thread, and that works without disabling the GC.)

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

Hi Ben,

it doesn't seem sustainable to disable GC around all uses of a lock that might be acquired in a destructor.

I agree in principle. But add_callback has a special status: it is the very method that would allow work to be scheduled safely on the IO loop from a destructor, regardless of where the latter executes.

I would assume (but haven't verified for python) that when the cycle GC runs, all destructors triggered by this process will run on another thread dedicated to this purpose.

Unfortunately not. The cycle GC can run basically whenever it wants; more precisely, on CPython it runs whenever the number of GC-tracked allocations exceeds a certain threshold (the exact computation is in gcmodule.c). So allocating a simple tuple can trigger a GC run. Needless to say, this can happen in any non-trivial piece of code.
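This is observable with the standard `gc.get_threshold()` and `gc.callbacks` APIs: merely allocating enough container objects makes a collection fire in the middle of ordinary code, as this sketch shows.

```python
import gc

print(gc.get_threshold())  # e.g. (700, 10, 10), the CPython defaults

events = []
gc.enable()
gc.callbacks.append(lambda phase, info: events.append(phase))
try:
    gc.collect()    # start from a clean slate
    events.clear()
    objs = []
    for _ in range(10_000):
        objs.append(([],))  # each tuple and list is a GC-tracked allocation
finally:
    gc.callbacks.pop()

# A collection ran "by itself" somewhere inside the loop.
assert "start" in events
```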

The disable_gc context manager is not thread-safe - one thread may finish and re-enable GC just after another thread has disabled it.

You're right, that's a real bug here :-/

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

And the background-thread technique, if you're very very careful, is safe without disabling the GC. (We have __del__ methods in PyMongo that defer their cleanup to a periodic background thread, that works without disabling the GC.)

How do you ensure the __del__ methods themselves aren't called from the background thread? Or are you careful enough that it doesn't harm?

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

@bdarnell, I have a question: since list.append is atomic, is the lock actually needed? I see self._closing is "protected" by it, but that's a boolean attribute, so setting or getting it is already thread-safe... Worst case, you append a callback spuriously, which doesn't sound very critical.
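The atomicity claim is easy to check empirically: under CPython's GIL, `list.append` executes atomically, so concurrent appends never lose updates. A minimal sketch (not Tornado code):

```python
import threading

items = []

def worker():
    for i in range(10_000):
        items.append(i)  # atomic under the GIL: no lock required

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(items) == 40_000  # no appends were lost
```

Note this only covers the append itself; compound operations like check-then-append still race.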

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

By the way, I agree that the fact this PR introduces a race condition in disable_gc() makes it pretty much unacceptable.

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

Ok, the lock was introduced in 2ee58f1f.
Given the way the callbacks are consumed (the callback list is swapped out for a fresh one), I agree a lock is needed. Would you be open to a lockless implementation?

Edit: #875 makes a lockless implementation problematic.

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

Ok, I've submitted a lockless implementation in #1876.

@ajdavis
Contributor

ajdavis commented Nov 3, 2016

@pitrou you asked me,

How do you ensure the __del__ methods themselves aren't called from the background thread? Or are you careful enough that it doesn't harm?

My technique is a lot like the one you showed in your PR. Mine's a little less sophisticated; it's just:

tasks_copy = []
while True:
    try:
        # Consume from global "tasks" list.
        tasks_copy.append(tasks.pop())
    except IndexError:
        break

for t in tasks_copy:
    do_cleanup_task(t)

# Now sleep one second and repeat

This works for small numbers of tasks. If the cyclic GC runs on this thread, that's OK: more tasks will appear in the list and the "while" loop will run a little longer. As long as that happens rarely, the "while" loop will always finish eventually.

Your deque implementation is much more appropriate for Tornado, since it only pops the callbacks that were present in the queue at the beginning of the consumer loop, and defers new callbacks for later.
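The behaviour described here can be sketched with a plain `collections.deque` (an illustration of the idea, not the exact code from #1876): the consumer snapshots the queue length first, so callbacks added during the pass wait for the next one.

```python
from collections import deque

callbacks = deque()  # append/popleft are atomic under the GIL

def run_callbacks():
    # Only run what was queued when this pass started; anything
    # appended while it runs is deferred to the next pass.
    ncallbacks = len(callbacks)
    for _ in range(ncallbacks):
        callbacks.popleft()()

ran = []
callbacks.append(lambda: ran.append("a"))
callbacks.append(lambda: callbacks.append(lambda: ran.append("late")))
run_callbacks()

assert ran == ["a"]         # "late" was queued during the pass...
assert len(callbacks) == 1  # ...and still waits for the next one
```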

@pitrou
Contributor Author

pitrou commented Nov 3, 2016

Thanks. I'm closing this PR now, as disabling the GC is clearly not a good solution.
