
Investigate worker overhead #2156

@mrocklin

Motivated by a desire to reduce latency on the workers for Actors (we found that 1 ms operations were taking 5 ms), we added a thread that statistically profiles the event loop. This showed overhead from a few surprising sources:

  1. psutil and the SystemMonitor
  2. Tornado's write_to_fd, which apparently isn't entirely non-blocking; see this Stack Overflow question
  3. Tornado's add_callback overhead; see this Stack Overflow question
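
For context, a statistical profiler of this kind can be sketched as a background thread that periodically grabs the event-loop thread's current frame via `sys._current_frames()` (a CPython-specific function) and counts which function is executing. This is a minimal sketch under those assumptions, not the profiler distributed actually uses. `EventLoopSampler` and `busy` are hypothetical names, and the sketch records only the innermost function rather than full stacks.

```python
import collections
import sys
import threading
import time


class EventLoopSampler:
    """Hypothetical sketch: periodically sample another thread's innermost frame."""

    def __init__(self, target_thread_id, interval=0.001):
        self.target = target_thread_id
        self.interval = interval
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            # Map of thread id -> topmost Python frame for every live thread
            frame = sys._current_frames().get(self.target)
            if frame is not None:
                code = frame.f_code
                self.counts[(code.co_filename, code.co_name)] += 1
            time.sleep(self.interval)

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()
        self._thread.join()
        return self.counts


def busy(seconds):
    """Stand-in for work running on the event-loop thread."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


# Sample the current (main) thread while it spins inside busy()
sampler = EventLoopSampler(threading.get_ident(), interval=0.001).start()
busy(0.3)
counts = sampler.stop()
for (filename, name), n in counts.most_common(3):
    print(name, n)
```

The sampling interval trades resolution against the profiler's own overhead. In CPython the sampler thread only runs when the main thread releases the GIL, so the effective sample rate may be lower than `interval` suggests.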

I'm not sure how best to address these. There are probably a few approaches:

  1. Check that we're using psutil appropriately, and whether there is a better way to poll system use regularly at a high-ish frequency (currently we poll every 500 ms)
  2. Quantify the cost of add_callback, and see whether there are occasions where we can reduce our use of Tornado
  3. Investigate other concurrency frameworks, like asyncio + uvloop. This sounds neat, but is likely expensive for many reasons. I did try running Tornado on asyncio + uvloop, but it wasn't very effective: the overhead appears to sit higher up in this stack, so uvloop doesn't seem to do much good.
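
To put a rough number on the callback-scheduling cost, one could time how long it takes to schedule and run many no-op callbacks. The sketch below uses plain asyncio's `loop.call_soon` and `loop.call_soon_threadsafe` as a stand-in; it does not measure Tornado's `add_callback` itself, and `measure_callback_overhead` is a hypothetical helper.

```python
import asyncio
import time


async def measure_callback_overhead(n=20_000, threadsafe=False):
    """Return the average wall-clock seconds per scheduled no-op callback."""
    loop = asyncio.get_running_loop()
    schedule = loop.call_soon_threadsafe if threadsafe else loop.call_soon
    done = loop.create_future()
    remaining = n

    def tick():
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            done.set_result(None)

    start = time.perf_counter()
    for _ in range(n):
        schedule(tick)
    await done  # wait until every callback has actually run
    return (time.perf_counter() - start) / n


results = {}
for threadsafe in (False, True):
    results[threadsafe] = asyncio.run(measure_callback_overhead(threadsafe=threadsafe))
    print(f"threadsafe={threadsafe}: {results[threadsafe] * 1e6:.2f} us per callback")
```

`call_soon_threadsafe` also writes to the loop's self-pipe to wake it, so comparing the two numbers gives a rough sense of what the thread-safe path costs. Running the same coroutine on a different loop implementation (for example uvloop, if installed) is one way to compare loops on equal terms.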
