Skip to content

Fix #2069: Fix race condition between accept() handler and TCPServer.stop()#2075

Merged
bdarnell merged 3 commits intotornadoweb:masterfrom
pitrou:accept_ebadf_race
Jul 4, 2017
Merged

Fix #2069: Fix race condition between accept() handler and TCPServer.stop()#2075
bdarnell merged 3 commits intotornadoweb:masterfrom
pitrou:accept_ebadf_race

Conversation

@pitrou
Copy link
Copy Markdown
Contributor

@pitrou pitrou commented Jun 6, 2017

No description provided.

Copy link
Copy Markdown
Member

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IOLoop isn't supposed to call handlers that have been removed, so you shouldn't be reaching this point. The default IOLoop removes any pending callbacks from self._events, and while I don't think AsyncIOLoop handles this as gracefully it doesn't look like it could call into a removed callback.

That means that the sequence of events is not "queue accept handler, stop server, call accept handler", it's "call accept handler, which synchronously stops the server while add_accept_handler's callback is still on the stack". (this happens sporadically depending on whether the packets arrive fast enough for all the recv calls to be satisfied on the fast synchronous path)

This could have a simpler solution: have add_accept_handler schedule the callbacks with IOLoop.current().add_callback instead of running them directly. This would ensure that you can't close the listening socket while the accept handler is running. However, it would be a slight behavior change in multi-process mode that could result in worse load balancing (since one process will be able to pull multiple connections off the backlog faster. This is less of a concern with SO_REUSEPORT).

A callable is returned (``None`` was returned before).
"""
io_loop = IOLoop.current()
removed = []
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred idiom for this is removed = [False] (and then use if removed[0] and removed[0] = True.

@pitrou
Copy link
Copy Markdown
Contributor Author

pitrou commented Jun 12, 2017

I'm not sure why the Python 2.7 builds seem to have get stuck on Travis-CI. Perhaps a Heisenbug?

@bdarnell
Copy link
Copy Markdown
Member

It doesn't look like a heisenbug - both 2.7 and 2.7.8 runs stopped at the same point in the TwistedIOLoop configuration. But I just re-ran those tests and they passed. Maybe something weird happened with the Twisted 17.5.0 release which was around this time.

Any response to my comments above? I can't decide whether it's better to change the interface to this function by adding the returned callable, or keep the interface the same but potentially change the performance.

@pitrou
Copy link
Copy Markdown
Contributor Author

pitrou commented Jun 20, 2017

The interface change doesn't break compatibility, as people can still use the old idiom of calling remove_handler by hand. They will just be subject to the race condition that is fixed by using the returned callback.

The suggestion of using IOLoop.add_callback to schedule accept handlers sounds like it would fix only one possible cause of race condition. What if it's a read handler for an established connection that closes the listening socket?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants