Skip to content

Conversation

@dfm88
Copy link
Contributor

@dfm88 dfm88 commented Jul 1, 2025

Note: Before submitting this pull request, please review our contributing
guidelines
.

Description

(Fixes #8786)

This PR resolves an issue with chord timeout behavior that happens when chord bodies are groups and encounter failures in the header.

TL;DR when the body of a chord is a group, not all the tasks in the group were notified of the error happened in the header, so they remained hanged.

Changes

The issue was in the chord_error_from_stack method called by on_chord_part_return of the various backends (actually the involved ones are the BaseBackend and all those that override this method on_chord_part_return, hence BaseKeyValueStoreBackend, GCSBackendBase and RedisBackend.

Solution

  1. Group Callback Detection: Modified chord_error_from_stack to detect when the callback is a group and dispatch to specialized handling
  2. Specialized Group Error Handling: Added _handle_group_chord_error method that:
    • Properly extracts original exceptions from ChordError chains
    • Revokes all tasks in the group to prevent hanging
    • Calls error callbacks for each task in the group
    • Ensures cleanup even when exceptions occur during error handling
  3. Exception Preservation: Maintains original exception information through __cause__ chaining, in fact without this step, it will return a generic ChordError wrapper and not the original exception raised in the header.

Testing

  • integration and smoke
  • unit
    • covered all new utility functions and all the path of the new backend _handle_group_chord_error method and the on_chord_part_return of redis and gcs backends.

@codecov
Copy link

codecov bot commented Jul 1, 2025

Codecov Report

❌ Patch coverage is 96.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.61%. Comparing base (6d8bfd1) to head (2438bd7).
⚠️ Report is 45 commits behind head on main.

Files with missing lines Patch % Lines
celery/backends/base.py 94.87% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9788      +/-   ##
==========================================
+ Coverage   78.52%   78.61%   +0.09%     
==========================================
  Files         153      153              
  Lines       19130    19172      +42     
  Branches     2533     2539       +6     
==========================================
+ Hits        15021    15073      +52     
+ Misses       3819     3809      -10     
  Partials      290      290              
Flag Coverage Δ
unittests 78.59% <96.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@auvipy auvipy requested review from auvipy and Copilot July 1, 2025 14:51

This comment was marked as outdated.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pypy is catching a typing error

)
E TypeError: argument after ** must be a mapping, not Mock

celery/backends/base.py:325: TypeError
______ test_BaseBackend_dict.test_chord_error_from_stack_backend_fallback ______

self = <t.unit.backends.test_base.test_BaseBackend_dict object at 0x0000000007cb6d78>

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@auvipy auvipy requested a review from Copilot July 2, 2025 04:41
@auvipy auvipy added this to the 5.5.4 milestone Jul 2, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where chord bodies that are group callbacks could hang when the header fails, ensuring timeouts propagate correctly by preserving original exceptions and notifying all group tasks.

  • Introduces _create_chord_error_with_cause helper to wrap and chain original exceptions.
  • Adds _handle_group_chord_error in BaseBackend to revoke and fail all tasks in a group body.
  • Updates Redis, GCS, and builtins to use the new helper; extends unit and integration tests and removes previously skipped tests.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
t/unit/backends/test_gcs.py Added tests for exception flows in GCSBackend’s on_chord_part_return
t/unit/backends/test_base.py New tests for _create_chord_error_with_cause, _create_fake_task_request, and _handle_group_chord_error
t/smoke/tests/test_canvas.py Removed skip marks so canvas smoke tests now run on group bodies
t/integration/test_canvas.py Removed skip marks for integration canvas tests
celery/backends/redis.py Wrapped chord exceptions using the new helper to preserve __cause__
celery/backends/gcs.py Replaced inline ChordError with helper-based wrapping
celery/backends/base.py Added helper functions and special group error handling in chord_error_from_stack
celery/app/builtins.py Updated unlock chord task to use the new exception helper
CONTRIBUTORS.txt Added a new contributor
Comments suppressed due to low confidence (1)

celery/backends/base.py:318

  • The symbol group is not imported in this module, leading to a NameError. You should import the correct group signature type, for example:
from celery.canvas import group
        if isinstance(callback, group):

@auvipy auvipy merged commit 9cb389d into celery:main Jul 2, 2025
123 of 124 checks passed
@auvipy auvipy modified the milestones: 5.5.4, 5.6.0 Jul 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected Behaviors for a Chain of Groups, when Groups contain failing tasks

2 participants