Limit grad recursion depth by not recursing through non-grad inputs#1764
Merged
Conversation
Member
Author
@davidkoski this limits the stack-overflow issue with VJP that we were discussing offline. It does require a change in how VJP is called from Python -> C++. I'm not sure how it's done in Swift (whether you enclosed the non-grad inputs or not), but it may also require a change there to pass in the non-grad inputs to …
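For readers following along, here is a rough, hypothetical sketch (plain Python, not MLX's actual internals) of why enclosing a non-grad input in a closure makes VJP construction recurse through that input's entire graph: the graph walk cannot tell the captured input apart from nodes it actually needs to differentiate through.

```python
import sys

# Hypothetical toy graph: `Node` and `vjp_walk` stand in for the real arrays
# and the C++ VJP construction; none of these names exist in MLX.
class Node:
    def __init__(self, parents=()):
        self.parents = list(parents)

def vjp_walk(node, visited):
    """Recursive walk over everything reachable from the output."""
    if id(node) in visited:
        return 0
    visited.add(id(node))
    count = 1
    for p in node.parents:
        count += vjp_walk(p, visited)
    return count

# A non-grad input whose (lazy) graph is a deep chain of ops.
deep = Node()
for _ in range(3_000):
    deep = Node([deep])

grad_input = Node()
out = Node([grad_input, deep])

sys.setrecursionlimit(10_000)
# Recursing through the captured non-grad input visits the whole 3k chain;
# stopping at non-grad inputs would visit only a handful of nodes.
print(vjp_walk(out, set()))  # 3003
```

With the change in this PR, the walk can stop at inputs for which no gradient is requested, so the recursion depth no longer scales with the non-grad input's graph.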
angeloskath
approved these changes
Jan 14, 2025
Member
angeloskath
left a comment
A bit of a mind-bender, but it makes perfect sense afterwards. Nice!
```diff
     l[i] = recurse(l[i]);
   }
-  return nb::cast<nb::object>(subtree);
+  return nb::cast<nb::object>(nb::tuple(l));
```
Previously we enclosed non-gradient inputs in a lambda, which then got recursed through when we build the VJP graph.

This change allows the internal VJP to take an `argnums` list so that the VJP has access to all the inputs, including the ones for which no gradient is requested. An example case which would previously recurse to a depth of 10k+ and now recurses just a couple of times: