Fix race condition during the tracing phase of automatic differentiation (vjp, value_and_grad) #338

Merged
davidkoski merged 2 commits into ml-explore:main from aleroot:main
Jan 21, 2026

@aleroot
Contributor

@aleroot aleroot commented Jan 16, 2026

This PR introduces thread safety to the graph construction phase of the function transformations (vjp, jvp, and valueAndGrad). When multiple threads attempt to compute gradients or vector-Jacobian products simultaneously, data races occur within the C++ layer, leading to std::bad_access crashes and heap corruption.

Proposed changes

The lock is only held during the graph tracing/construction phase. It does not block the actual evaluation or computation of the arrays (which remains lazy).
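
To illustrate the locking scope described above, here is a minimal, self-contained sketch. It does not use MLX: `Tracer`, `Results`, `traceGradientOfSquare`, and `runDemo` are hypothetical names invented for this example, and a plain `NSLock` stands in for the library's lock. The point is only that the lock covers construction of the deferred computation, while evaluation happens after the lock is released.

```swift
import Dispatch
import Foundation

/// Hypothetical stand-in for tracing machinery whose internal state is not thread-safe.
final class Tracer: @unchecked Sendable {
    private let lock = NSLock()      // plays the role of the shared lock
    private var activeTraces = 0     // state that must not be touched concurrently
    private var sawOverlap = false

    /// True if two traces were ever in progress at the same time.
    var overlapDetected: Bool {
        lock.lock(); defer { lock.unlock() }
        return sawOverlap
    }

    /// Builds (but does not run) the gradient of x * x while holding the lock.
    func traceGradientOfSquare(at x: Double) -> () -> Double {
        lock.lock()
        defer { lock.unlock() }
        activeTraces += 1
        if activeTraces > 1 { sawOverlap = true }
        let deferred: () -> Double = { 2 * x }   // lazy: nothing is computed yet
        activeTraces -= 1
        return deferred
    }
}

/// Thread-safe collector for the values computed by each worker.
final class Results: @unchecked Sendable {
    private let lock = NSLock()
    private var values: [Int: Double] = [:]

    func store(_ value: Double, at index: Int) {
        lock.lock(); defer { lock.unlock() }
        values[index] = value
    }

    func sorted() -> [Double] {
        lock.lock(); defer { lock.unlock() }
        return values.keys.sorted().map { values[$0]! }
    }
}

/// Traces concurrently from several threads; evaluation runs outside the tracing lock.
func runDemo(threads: Int) -> (values: [Double], overlap: Bool) {
    let tracer = Tracer()
    let results = Results()
    DispatchQueue.concurrentPerform(iterations: threads) { i in
        let deferred = tracer.traceGradientOfSquare(at: Double(i))  // serialized
        results.store(deferred(), at: i)                            // not serialized by the tracer
    }
    return (results.sorted(), tracer.overlapDetected)
}
```

Calling `runDemo(threads: 8)` should report no overlapping traces, since every trace runs under the lock; without the lock, the counter updates themselves would race.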

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

…ion (vjp, value_and_grad)

While the execution of the graph is thread-safe and lazy, the construction of that graph is not.
@davidkoski
Collaborator

Nice find! I just fixed some others here in #339 -- we should use this same evalLock as it is guarding the rest of the mlx global state. Please switch to using that lock.

@aleroot
Contributor Author

aleroot commented Jan 17, 2026

> Nice find! I just fixed some others here in #339 -- we should use this same evalLock as it is guarding the rest of the mlx global state. Please switch to using that lock.

Updated. Thanks.

Collaborator

@davidkoski davidkoski left a comment

Thank you for finding this and fixing!

@davidkoski davidkoski merged commit 3ed7d22 into ml-explore:main Jan 21, 2026
7 checks passed