Skip to content

[async-compile] add progressive compile mode#157305

Closed
bobrenjc93 wants to merge 6 commits intogh/bobrenjc93/504/basefrom
gh/bobrenjc93/504/head
Closed

[async-compile] add progressive compile mode#157305
bobrenjc93 wants to merge 6 commits intogh/bobrenjc93/504/basefrom
gh/bobrenjc93/504/head

Conversation

@bobrenjc93
Copy link
Contributor

@bobrenjc93 bobrenjc93 commented Jun 30, 2025

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jun 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157305

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 5392190 with merge base c808af5 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Jul 1, 2025
ghstack-source-id: 03ce1e8
Pull-Request-resolved: #157305
[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Jul 1, 2025
ghstack-source-id: b4e227e
Pull-Request-resolved: #157305
@bobrenjc93 bobrenjc93 changed the title [br][pc] attempt 1 [async-compile] add progressive compile mode Jul 1, 2025
@bobrenjc93 bobrenjc93 added the topic: not user facing topic category label Jul 1, 2025
@bobrenjc93 bobrenjc93 marked this pull request as ready for review July 1, 2025 20:36
@bobrenjc93 bobrenjc93 requested a review from aorenste July 1, 2025 20:48
[ghstack-poisoned]
Copy link
Contributor

@aorenste aorenste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future test needed: We need to ensure that if an optimization is running it doesn't block the app from exiting - I think I saw issues with this w/ subprocess and also RE

):
future = self._progression_futures[i]
if future and future.done():
self._switch_to_progression_stage(i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future nit: if we used a deque for _progression_futures then we could pop the finished ones off the front and then we wouldn't even have to check for None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean we are forced to linearize the progressions though right? Especially in a world with RE and action cache it seems like it'd be ideal to "skip" some futures if they are slow.

eg. imagine

compile 1 (expected 10 min, actual 10 min)
compile 2 (expected 20 min, actual 60 min due to queuing)
compile 3 (expected 20 min, actual 20s due to action caching)

We would want compile 3 to kick in before compile 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think there should be no difference between using a deque and setting earlier ones to None (I'm assuming you can scan the deque like a list and not just pop_front)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will factor out in separate pr

if pcd := self._post_compile_data:
# Only clear post_compile_data if this is the final progression stage
if stage_index == len(self._progression_futures) - 1:
self._post_compile_data = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could clear _callback too, I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll factor out the refactor into a dataclass as discussed offline in a separate pr

graph_kwargs: _CompileFxKwargs,
) -> None:
if self._optimized_output_code is not None:
self._optimized_output_code.post_compile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. What if there's still a future pending? This should probably be based on pending futures and not fast vs optimized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I fixed this to be easier to reason about

  1. First post compile will always be on fast output code, that will populate _post_compile_data
  2. only after _post_compile_data is populated will we begin the progressions

[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Jul 3, 2025
ghstack-source-id: 3bf6895
Pull-Request-resolved: #157305
[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Jul 3, 2025
ghstack-source-id: abbc65b
Pull-Request-resolved: #157305
constants: CompiledFxGraphConstants,
graph_kwargs: _CompileFxKwargs,
) -> None:
assert self._fast_output_code is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me - but with an assert at least it will be obvious if it fails...

@bobrenjc93 bobrenjc93 added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 3, 2025
@bobrenjc93
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Jul 5, 2025
pytorchmergebot pushed a commit that referenced this pull request Jul 5, 2025
followup from #157305 where
@aorenste correctly suggested clearing callback. this refactor
introduces a new dataclass so we don't need to check nullability for
each field

Pull Request resolved: #157619
Approved by: https://github.com/aorenste
ghstack dependencies: #157305, #157614
pytorchmergebot pushed a commit that referenced this pull request Jul 5, 2025
@github-actions github-actions bot deleted the gh/bobrenjc93/504/head branch August 4, 2025 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants