
compiled autograd: support accumulate_grad_ in DTensor sharding #133580

Closed
bdhirsh wants to merge 1 commit into gh/bdhirsh/606/base from gh/bdhirsh/606/head

@bdhirsh
Collaborator

@bdhirsh bdhirsh commented Aug 15, 2024

I can add a unit test if someone wants, although I'm planning for this stack to culminate in the DTensor + compiled autograd case from issue #127797 passing.

accumulate_grad_ should (I think?) have a trivially pointwise sharding rule. We also need to tweak DTensor to handle ops that return nothing.
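A minimal pure-Python sketch of why a pointwise rule should be enough, assuming a 1-D gradient split evenly along dim 0 (in the spirit of a Shard(0) placement) and modeling tensors as plain lists. The helpers shard and check_pointwise_rule are hypothetical, and accumulate_grad_ below is a stand-in that mutates its buffer and returns nothing; it is not the real op. The check compares "accumulate on the full tensor, then shard" against "each rank accumulates only its local shard".

```python
from typing import List

# Hypothetical model: a 1-D tensor is a Python list, and a Shard(0)-style
# placement splits it into equal contiguous chunks, one per rank.


def shard(values: List[float], world_size: int) -> List[List[float]]:
    """Split values along dim 0 into world_size equal contiguous chunks."""
    if len(values) % world_size != 0:
        raise ValueError("sketch assumes len(values) is divisible by world_size")
    size = len(values) // world_size
    return [values[r * size:(r + 1) * size] for r in range(world_size)]


def accumulate_grad_(grad_buffer: List[float], new_grad: List[float]) -> None:
    """Stand-in for an in-place, elementwise grad accumulation that returns nothing."""
    for i, g in enumerate(new_grad):
        grad_buffer[i] += g


def check_pointwise_rule(param_grad: List[float], incoming: List[float], world_size: int) -> bool:
    # Reference: accumulate on the full tensor, then shard the result.
    full = list(param_grad)
    accumulate_grad_(full, incoming)
    expected = shard(full, world_size)

    # Sharded: each rank accumulates only its local chunk, with no communication.
    local_bufs = shard(list(param_grad), world_size)
    local_incoming = shard(incoming, world_size)
    for buf, g in zip(local_bufs, local_incoming):
        out = accumulate_grad_(buf, g)
        # The op returns nothing, so there is no per-rank output to wrap.
        assert out is None
    return local_bufs == expected


if __name__ == "__main__":
    grads = [1.0] * 8
    new = [float(i) for i in range(8)]
    print(check_pointwise_rule(grads, new, world_size=4))
```

Because each output element depends only on the same-index input elements, the sharded result matches the reference without any cross-rank communication, so the output can keep the input's placement. The out is None check reflects the second point: there is no return value to wrap back into a sharded tensor.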

Stack from ghstack (oldest at bottom):

  • (to be filled)

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

@pytorch-bot

pytorch-bot bot commented Aug 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133580

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures, 2 Unrelated Failures

As of commit d319c96 with merge base 454713f:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Oct 15, 2024
@albanD albanD removed their request for review October 15, 2024 15:29
@github-actions github-actions bot closed this Nov 14, 2024
@github-actions github-actions bot deleted the gh/bdhirsh/606/head branch December 15, 2024 02:15
Esquains pushed a commit to Esquains/study1 that referenced this pull request Dec 15, 2024

Labels

ciflow/inductor, oncall: distributed, Stale

Projects

None yet


1 participant