feat(taskworker) Add concurrent worker by markstory · Pull Request #83254 · getsentry/sentry

markstory · 2025-01-10T19:57:17Z

Move the taskworker process to be a multiprocess concurrent worker. This will help enable higher CPU usage in worker pods, as we can pack more concurrent CPU operations into each pod (at the cost of memory).

The main process is responsible for:

Spawning children
Making RPC requests to fill child queues and submit results.

Each child process handles:

Resolving task names
Checking at_most_once keys
Enforcing processing deadlines
Executing task functions

Instead of using more child processes to enforce timeouts, I've used SIGALRM. I've verified that tasks like

@exampletasks.register(name="examples.infinite", retry=Retry(times=2))
def infinite_task() -> None:
    try:
        while True:
            pass
    except Exception as e:
        print("haha caught exception", e)

Do not paralyze workers with infinite loops.

When a worker is terminated, it uses an Event to have children exit, and then drains any results. If there are tasks in the _child_tasks queue will not be completed, and instead will sent to another worker when the processing_deadline on the activations expires.

Still really rough and not working.

codecov · 2025-01-10T20:35:02Z

Codecov Report

Attention: Patch coverage is 87.78626% with 32 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/sentry/taskworker/worker.py	80.25%	31 Missing ⚠️
tests/sentry/taskworker/test_worker.py	99.04%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #83254      +/-   ##
==========================================
+ Coverage   87.49%   87.56%   +0.06%     
==========================================
  Files        9404     9393      -11     
  Lines      537180   536973     -207     
  Branches    21133    21048      -85     
==========================================
+ Hits       470024   470203     +179     
+ Misses      66798    66414     -384     
+ Partials      358      356       -2

src/sentry/taskworker/worker.py

evanh · 2025-01-14T15:50:08Z

src/sentry/taskworker/worker.py

-        task = self._get_known_task(activation)
-        if not task:
+        try:
+            activation = child_tasks.get_nowait()


I don't think get_nowait is necessarily correct here. I understand this is ensuring that the process doesn't block while waiting for a task before checking for the shutdown, but I think some kind of timeout/delay would good to here to avoid spiking the CPU. Maybe like 100ms or something like that?

Good point about the potential CPU burn on an empty queue. I'll put a blocking get with a timeout in.

src/sentry/taskworker/worker.py

Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>

Wait on the empty queue to reduce CPU burn.

sentry · 2025-01-17T19:39:16Z

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

‼️ AssertionError: expected call not found. pytest.runtest.protocol tests/sentry/taskworker... View Issue

_{Did you find this useful? React with a 👍 or 👎}

Move the taskworker process to be a multiprocess concurrent worker. This will help enable higher CPU usage in worker pods, as we can pack more concurrent CPU operations into each pod (at the cost of memory). The main process is responsible for: - Spawning children - Making RPC requests to fill child queues and submit results. Each child process handles: - Resolving task names - Checking at_most_once keys - Enforcing processing deadlines - Executing task functions Instead of using more child processes to enforce timeouts, I've used SIGALRM. I've verified that tasks like ```python @exampletasks.register(name="examples.infinite", retry=Retry(times=2)) def infinite_task() -> None: try: while True: pass except Exception as e: print("haha caught exception", e) ``` Do not paralyze workers with infinite loops. When a worker is terminated, it uses an `Event` to have children exit, and then drains any results. If there are tasks in the `_child_tasks` queue will not be completed, and instead will sent to another worker when the `processing_deadline` on the activations expires. --------- Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>

markstory added 6 commits January 10, 2025 10:30

Start roughing in a multiprocess worker.

447caf2

Still really rough and not working.

Roughly implemented but not working yet.

a104160

Fix mistakes

01c3cbd

Fix more mistakes

13d169a

Fix mypy

4ec94bf

Fix some problems and update tests.

f467b08

markstory requested a review from a team January 10, 2025 19:57

markstory linked an issue Jan 10, 2025 that may be closed by this pull request

Make taskworker worker support concurrent work #80369

Closed

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 10, 2025

More coverage for fetching next task

b6b8352

vercel bot deployed to Preview January 11, 2025 05:21 View deployment

evanh reviewed Jan 14, 2025

View reviewed changes

Update src/sentry/taskworker/worker.py

c284d50

Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>

vercel bot deployed to Preview January 14, 2025 16:12 View deployment

markstory added 2 commits January 14, 2025 11:23

Merge branch 'master' into feat-taskworker-concurrent

40c121e

Add get() with timeout.

9cd9cb7

Wait on the empty queue to reduce CPU burn.

vercel bot deployed to Preview January 14, 2025 17:02 View deployment

Fix child_task queue overflows by conditionally fetching next task

bd5654c

vercel bot deployed to Preview January 14, 2025 22:14 View deployment

evanh approved these changes Jan 16, 2025

View reviewed changes

markstory merged commit 2a26aed into master Jan 17, 2025
48 checks passed

markstory deleted the feat-taskworker-concurrent branch January 17, 2025 14:28

github-actions bot locked and limited conversation to collaborators Feb 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(taskworker) Add concurrent worker#83254

feat(taskworker) Add concurrent worker#83254
markstory merged 11 commits intomasterfrom
feat-taskworker-concurrent

markstory commented Jan 10, 2025

Uh oh!

codecov bot commented Jan 10, 2025 •

edited

Loading

Uh oh!

Uh oh!

evanh Jan 14, 2025

Uh oh!

markstory Jan 14, 2025

Uh oh!

Uh oh!

Uh oh!

sentry bot commented Jan 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

markstory commented Jan 10, 2025

Uh oh!

codecov bot commented Jan 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

evanh Jan 14, 2025

Choose a reason for hiding this comment

Uh oh!

markstory Jan 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sentry bot commented Jan 17, 2025

Suspect Issues

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Jan 10, 2025 •

edited

Loading