-
-
Notifications
You must be signed in to change notification settings - Fork 4.9k
fix: (#8786) time out when chord header fails with group body #9788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: (#8786) time out when chord header fails with group body #9788
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9788 +/- ##
==========================================
+ Coverage 78.52% 78.61% +0.09%
==========================================
Files 153 153
Lines 19130 19172 +42
Branches 2533 2539 +6
==========================================
+ Hits 15021 15073 +52
+ Misses 3819 3809 -10
Partials 290 290
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
auvipy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pypy is catching a typing error
)
E TypeError: argument after ** must be a mapping, not Mock
celery/backends/base.py:325: TypeError
______ test_BaseBackend_dict.test_chord_error_from_stack_backend_fallback ______
self = <t.unit.backends.test_base.test_BaseBackend_dict object at 0x0000000007cb6d78>
auvipy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue where chord bodies that are group callbacks could hang when the header fails, ensuring timeouts propagate correctly by preserving original exceptions and notifying all group tasks.
- Introduces
_create_chord_error_with_causehelper to wrap and chain original exceptions. - Adds
_handle_group_chord_errorinBaseBackendto revoke and fail all tasks in a group body. - Updates Redis, GCS, and builtins to use the new helper; extends unit and integration tests and removes previously skipped tests.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| t/unit/backends/test_gcs.py | Added tests for exception flows in GCSBackend’s on_chord_part_return |
| t/unit/backends/test_base.py | New tests for _create_chord_error_with_cause, _create_fake_task_request, and _handle_group_chord_error |
| t/smoke/tests/test_canvas.py | Removed skip marks so canvas smoke tests now run on group bodies |
| t/integration/test_canvas.py | Removed skip marks for integration canvas tests |
| celery/backends/redis.py | Wrapped chord exceptions using the new helper to preserve __cause__ |
| celery/backends/gcs.py | Replaced inline ChordError with helper-based wrapping |
| celery/backends/base.py | Added helper functions and special group error handling in chord_error_from_stack |
| celery/app/builtins.py | Updated unlock chord task to use the new exception helper |
| CONTRIBUTORS.txt | Added a new contributor |
Comments suppressed due to low confidence (1)
celery/backends/base.py:318
- The symbol
groupis not imported in this module, leading to a NameError. You should import the correct group signature type, for example:
from celery.canvas import group if isinstance(callback, group):
Note: Before submitting this pull request, please review our contributing
guidelines.
Description
(Fixes #8786)
This PR resolves an issue with chord timeout behavior that happens when chord bodies are groups and encounter failures in the header.
TL;DR when the body of a chord is a group, not all the tasks in the group were notified of the error happened in the header, so they remained hanged.
Changes
The issue was in the
chord_error_from_stackmethod called byon_chord_part_returnof the various backends (actually the involved ones are theBaseBackendand all those that override this methodon_chord_part_return, henceBaseKeyValueStoreBackend,GCSBackendBaseandRedisBackend.Solution
chord_error_from_stackto detect when the callback is a group and dispatch to specialized handling_handle_group_chord_errormethod that:__cause__chaining, in fact without this step, it will return a genericChordErrorwrapper and not the original exception raised in the header.Testing
**.test_chord_error_propagation_with_different_body_typestemporary skipped for timeout in fix: (#9773) task_id must not be empty with chain as body of a chord #9774._handle_group_chord_errormethod and theon_chord_part_returnof redis and gcs backends.