
fix: race condition with error coalescing#5591

Merged
ruben-arts merged 3 commits into prefix-dev:main from
baszalmstra:feature/pix-1593-race-condition-with-error-coalescing-when-multiple-pixi
Mar 3, 2026

@baszalmstra
Contributor

Description

This PR should once and for all fix the cancellation issues that we have seen. The PR does two things:

  • Cancellation errors. If a task was cancelled, it means some other "sibling" task failed. This is now handled properly when solving environments for the lock-file: cancellations are ignored so that the real error from the failing sibling is the one that surfaces.
  • The SlotMap issue. This could happen when a task was cancelled and its metadata was removed while the underlying async task kept running. The rogue task could then spawn another task, which would crash. The code now checks whether the parent task should have been cancelled and, if so, reports the subtask as cancelled too.

Fixes #5588

How Has This Been Tested?

I have extensively tested this against the reproducer in #5588, and I can no longer reproduce the issue.

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude Code

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas

remove_cancellation_token now explicitly cancels the token before removing
it, ensuring child tokens are also cancelled.

Add is_parent_cancelled helper method and early returns at the top of all
task handlers. When a parent context has been cancelled or cleaned up,
child tasks are now dropped immediately, preventing reporter_context from
being called with stale parent contexts (the SlotMap panic).

Add the missing self.parent_contexts.remove(&context) call in
on_source_metadata_result, matching every other result handler.
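
The two changes described above can be illustrated with a minimal, std-only sketch. It is not pixi's code: `Token`, `Processor`, `ContextId`, `register`, and `on_spawn_child` are hypothetical stand-ins, and only the names `remove_cancellation_token` and `is_parent_cancelled` come from the commit message. The sketch assumes that a parent whose entry is already gone should be treated the same as a cancelled one.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a cancellation token (not pixi's real type).
#[derive(Clone, Default)]
struct Token(Arc<AtomicBool>);

impl Token {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

type ContextId = u32;

/// Hypothetical processor that tracks one token per task context.
#[derive(Default)]
struct Processor {
    tokens: HashMap<ContextId, Token>,
}

impl Processor {
    fn register(&mut self, id: ContextId) -> Token {
        let token = Token::default();
        self.tokens.insert(id, token.clone());
        token
    }

    /// Cancel the token first, then remove it, so that a task still
    /// holding a clone of the token observes the cancellation.
    fn remove_cancellation_token(&mut self, id: ContextId) {
        if let Some(token) = self.tokens.get(&id) {
            token.cancel();
        }
        self.tokens.remove(&id);
    }

    /// Assumption: a parent counts as cancelled if its token fired
    /// or if its entry has already been cleaned up.
    fn is_parent_cancelled(&self, parent: Option<ContextId>) -> bool {
        match parent {
            None => false,
            Some(id) => self.tokens.get(&id).map_or(true, Token::is_cancelled),
        }
    }

    /// Task handler with an early return: a child of a cancelled parent
    /// is dropped instead of touching the stale parent context.
    fn on_spawn_child(&mut self, parent: Option<ContextId>, child: ContextId) -> Option<Token> {
        if self.is_parent_cancelled(parent) {
            return None;
        }
        Some(self.register(child))
    }
}

fn main() {
    let mut processor = Processor::default();
    let parent_token = processor.register(1);

    // While the parent is alive, children are accepted.
    assert!(processor.on_spawn_child(Some(1), 2).is_some());

    // After cleanup, the rogue task sees the cancellation and a late
    // child request is dropped rather than causing a panic.
    processor.remove_cancellation_token(1);
    assert!(parent_token.is_cancelled());
    assert!(processor.on_spawn_child(Some(1), 3).is_none());
    println!("late child dropped");
}
```

The early return matters because a lookup keyed by a removed parent would otherwise fail; here the handler declines the work before any such lookup happens.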

fix: surface real solve errors instead of cancellation errors

Filter cancellation errors from conda solve results so that when one
platform solve fails, the real error surfaces instead of 'the operation
was cancelled' from sibling solves that were cancelled as a consequence.
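
As a rough illustration of this filtering, the sketch below collects per-platform results and prefers any real failure over a cancellation. `SolveError` and `coalesce` are hypothetical names chosen for the example, and the error message is made up; the actual types and logic in pixi may differ.

```rust
/// Hypothetical error type: a solve either fails for a real reason
/// or is cancelled because a sibling solve failed.
#[derive(Debug, Clone, PartialEq)]
enum SolveError {
    Cancelled,
    Failed(String),
}

/// Returns all solutions if every solve succeeded. Otherwise returns the
/// first real error, falling back to `Cancelled` only when no real error
/// is present among the results.
fn coalesce<T>(results: Vec<Result<T, SolveError>>) -> Result<Vec<T>, SolveError> {
    let mut solved = Vec::new();
    let mut saw_cancellation = false;
    for result in results {
        match result {
            Ok(value) => solved.push(value),
            // Remember the cancellation but keep looking for the real cause.
            Err(SolveError::Cancelled) => saw_cancellation = true,
            Err(real) => return Err(real),
        }
    }
    if saw_cancellation {
        Err(SolveError::Cancelled)
    } else {
        Ok(solved)
    }
}

fn main() {
    // One platform fails for a real reason; its siblings are cancelled.
    let results: Vec<Result<&str, SolveError>> = vec![
        Err(SolveError::Cancelled),
        Err(SolveError::Failed("no candidates found".into())),
        Err(SolveError::Cancelled),
    ];
    println!("{:?}", coalesce(results));
}
```

Because cancellations are only recorded and never returned early, the position of the real error in the result list does not matter.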

@baszalmstra requested a review from ruben-arts March 3, 2026 11:07
@GuillaumeQuenneville

GuillaumeQuenneville commented Mar 3, 2026

Thanks for the fix!

I don't know enough about pixi or tokio (or Rust, for that matter) to know whether this is a case you need to handle. From what I understand, this change doesn't recurse up to the root when checking cancellations, which could cause an issue with nested children.

e.g.

Parent (Cancelled)
   |_ child
        |_ child_child

If child_child runs first, it wouldn't detect the cancellation.

@baszalmstra
Contributor Author

The system we use propagates cancellation up and down the tree, so that should function correctly. I'll see whether I can come up with a test that demonstrates this.
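
To make the nested case concrete, here is a minimal std-only sketch of hierarchical cancellation, using the Parent / child / child_child tree from the question. It is not pixi's implementation: the `Node` and `Token` types are hypothetical, and the sketch models only the downward direction (an ancestor's cancellation being visible to every descendant), which is the one the question is about.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical tree node: its own flag plus a link to its parent.
struct Node {
    cancelled: AtomicBool,
    parent: Option<Arc<Node>>,
}

/// Hypothetical hierarchical cancellation token.
#[derive(Clone)]
struct Token(Arc<Node>);

impl Token {
    fn root() -> Self {
        Token(Arc::new(Node {
            cancelled: AtomicBool::new(false),
            parent: None,
        }))
    }

    /// Creates a token that is cancelled whenever any ancestor is.
    fn child(&self) -> Self {
        Token(Arc::new(Node {
            cancelled: AtomicBool::new(false),
            parent: Some(self.0.clone()),
        }))
    }

    fn cancel(&self) {
        self.0.cancelled.store(true, Ordering::SeqCst);
    }

    /// Walks from this node up to the root, so a grandchild sees a
    /// cancellation issued anywhere above it.
    fn is_cancelled(&self) -> bool {
        let mut node = Some(&self.0);
        while let Some(current) = node {
            if current.cancelled.load(Ordering::SeqCst) {
                return true;
            }
            node = current.parent.as_ref();
        }
        false
    }
}

fn main() {
    let parent = Token::root();
    let child = parent.child();
    let child_child = child.child();

    parent.cancel();

    // Even if child_child runs before child, it observes the cancellation.
    assert!(child_child.is_cancelled());
    assert!(child.is_cancelled());
    println!("grandchild observed the cancellation");
}
```

With a hierarchy like this, a direct-parent check is enough for the grandchild, because the check on `child` already reflects the state of `Parent`.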

@GuillaumeQuenneville

Great! Thank you

Contributor

@ruben-arts ruben-arts left a comment

Thank you, that works!

@ruben-arts ruben-arts merged commit 4482642 into prefix-dev:main Mar 3, 2026
37 checks passed
Development

Successfully merging this pull request may close these issues.

Race condition with error coalescing when multiple pixi packages in monorepo