Conversation

@Dkhodos (Contributor) commented Aug 14, 2025

Note: Before submitting this pull request, please review our contributing guidelines.

Description

Introduce an option to completely disable broker prefetching without requiring acks_late. When enabled, the worker only fetches a new task when an execution slot is available. This prevents head‑of‑line blocking and starvation caused by long‑running tasks.

  • Real‑world need: we run long‑lasting tasks (hours) across multiple workers/pods. Prefetch caused busy workers to hold tasks in reserve while idle workers sat empty, leading to severe starvation. Disabling prefetch resolves this by fetching only when a slot is free.
  • This work revives and supersedes the stalled discussion and implementation in PR #9818. All reviewer suggestions from that PR have been applied here.

What’s included

  • New config: worker_disable_prefetch (default: False)
  • New CLI flag: --disable-prefetch
  • Core logic: when enabled, wrap channel.qos.can_consume to check reserved_requests against the effective concurrency (see the sketch after this list):
    • use autoscaler max_concurrency if present, otherwise pool size
  • Documentation updates:
    • user guide (optimizing, configuration) and FAQ explain usage and impact
  • Tests:
    • unit tests for consumer/autoscale behavior (including autoscaling edge cases)
    • pytest‑click based CLI test covering --disable-prefetch
  • Minor nit applied: compute limit via getattr(c.controller, "max_concurrency", None) or c.pool.num_processes
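
For illustration, here is a minimal sketch of that guard. It is not the exact merged code: the helper name and the way the qos object is passed in are illustrative, and it assumes a transport whose channel exposes qos.can_consume (true for Kombu's virtual transports such as Redis); `c` stands for the worker's consumer blueprint step.

```python
# Sketch only, not the exact merged implementation.
from celery.worker import state as worker_state


def wrap_can_consume(c, qos):
    """Only let the broker hand over a new message when an execution slot is free."""
    original_can_consume = qos.can_consume

    def can_consume():
        # Effective concurrency: autoscaler ceiling if present, otherwise pool size.
        limit = getattr(c.controller, "max_concurrency", None) or c.pool.num_processes
        if len(worker_state.reserved_requests) >= limit:
            return False  # every slot is reserved: do not prefetch another task
        return original_can_consume()

    qos.can_consume = can_consume
```

Because the guard runs before each fetch, a busy worker stops pulling messages instead of holding them in reserve, which is exactly the starvation scenario described above.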

Backward compatibility

  • Off by default; existing behavior unchanged.
  • When enabled, only the reservation timing changes (a task is fetched only when an execution slot frees up); no protocol or API changes. See the usage sketch below.
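
For reference, a minimal way to opt in. The setting and flag names are the ones introduced by this PR; the app name, broker URL, and concurrency value are placeholders.

```python
from celery import Celery

# Placeholder app name and broker URL.
app = Celery("proj", broker="redis://localhost:6379/0")

# New setting added by this PR; defaults to False, so nothing changes unless you opt in.
app.conf.worker_disable_prefetch = True

# Equivalent CLI invocation (placeholder app module and concurrency):
#   celery -A proj worker --disable-prefetch --concurrency=4
```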

Implementation notes

  • This is a pragmatic Celery‑side QoS guard. Longer‑term, the ideal place for this behavior would be in Kombu’s QoS, with Celery surfacing the option.

Testing

  • Unit suites for consumer/autoscale/CLI pass locally.
  • Docs build and configuration references updated.

References

  • Prior discussion/implementation: PR #9818

codecov bot commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.70%. Comparing base (6804ea8) to head (d8c7d2b).
⚠️ Report is 65 commits behind head on main.

Files with missing lines    Patch %    Lines
celery/bin/worker.py        66.66%     0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9863      +/-   ##
==========================================
+ Coverage   78.69%   78.70%   +0.01%     
==========================================
  Files         153      153              
  Lines       19243    19257      +14     
  Branches     2557     2561       +4     
==========================================
+ Hits        15143    15156      +13     
  Misses       3803     3803              
- Partials      297      298       +1     
Flag         Coverage           Δ
unittests    78.68% <92.85%>    (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Add dedicated tests for disable_prefetch flag handling in worker.py to improve test coverage. This addresses the coverage issue identified by Codecov in PR #9863.
@Dkhodos (Contributor, Author) commented Aug 16, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review. ✅ Project coverage is 78.64%. Comparing base (33eb148) to head (48b254b).

Files with missing lines    Patch %    Lines
celery/bin/worker.py        66.66%     0 Missing and 1 partial ⚠️
Additional details and impacted files

should be covered well now

Add comprehensive tests for the worker's disable_prefetch flag handling to improve test coverage. Refactored tests to be more concise, focused, and pass all lint checks. This addresses the coverage issue identified by Codecov in PR #9863.
@Nusnus (Member) left a comment

Please note the integration tests are currently broken in main, preventing merges of non-trivial changes.

A fix is in progress 🙏

@auvipy added this to the 5.7.0 milestone Aug 21, 2025
@bakwc commented Aug 25, 2025

Thanks a lot for working on this! Any chances to get it merged in the nearest future?

@Dkhodos (Contributor, Author) commented Aug 25, 2025

Thanks a lot for working on this! Any chances to get it merged in the nearest future?

@Nusnus is CI fixed?

@Nusnus (Member) commented Aug 25, 2025

Thanks a lot for working on this! Any chances to get it merged in the nearest future?

@Nusnus is CI fixed?

Hopefully by (my) EOD today.
I'm going to try a different approach, let's see if it works 🙏

@auvipy modified the milestones: 5.7.0, 5.6.0 Aug 26, 2025
@auvipy self-requested a review August 28, 2025 13:12
@auvipy (Member) left a comment

Also, do we have a relevant issue open in Kombu for the following note:

This is a pragmatic Celery‑side QoS guard. Longer‑term, the ideal place for this behavior would be in Kombu’s QoS, with Celery surfacing the option.

@nasis commented Aug 31, 2025

Needed.
Looking forward to this being merged.

@auvipy merged commit 246bca1 into celery:main Aug 31, 2025
105 of 108 checks passed
@Dkhodos (Contributor, Author) commented Aug 31, 2025

Also, do we have a relevant issue open in Kombu for the following note:

This is a pragmatic Celery‑side QoS guard. Longer‑term, the ideal place for this behavior would be in Kombu’s QoS, with Celery surfacing the option.

Added: celery/kombu#2353

Might even have a PR for this later today :)

Side question:
When can we expect a package release with this fix? What number would it be, 4.6.?

@auvipy (Member) commented Aug 31, 2025

v5.6

@auvipy (Member) commented Sep 28, 2025

We got a regression report for this (#9913). Can you please check it? We might need to revert it for a while if needed.

auvipy added a commit that referenced this pull request Sep 28, 2025
@bakwc commented Sep 28, 2025

Maybe you could keep it at least for Redis?

@auvipy (Member) commented Sep 29, 2025

yes, that is possible
