[async-compile] add progressive compile mode#157305
[async-compile] add progressive compile mode#157305bobrenjc93 wants to merge 6 commits intogh/bobrenjc93/504/basefrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157305
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit 5392190 with merge base c808af5 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
aorenste
left a comment
There was a problem hiding this comment.
Future test needed: We need to ensure that if an optimization is running it doesn't block the app from exiting - I think I saw issues with this w/ subprocess and also RE
| ): | ||
| future = self._progression_futures[i] | ||
| if future and future.done(): | ||
| self._switch_to_progression_stage(i) |
There was a problem hiding this comment.
future nit: if we used a deque for _progression_futures then we could pop the finished ones off the front and then we wouldn't even have to check for None.
There was a problem hiding this comment.
That would mean we are forced to linearize the progressions though right? Especially in a world with RE and action cache it seems like it'd be ideal to "skip" some futures if they are slow.
eg. imagine
compile 1 (expected 10 min, actual 10 min)
compile 2 (expected 20 min, actual 60 min due to queuing)
compile 3 (expected 20 min, actual 20s due to action caching)
We would want compile 3 to kick in before compile 2.
There was a problem hiding this comment.
I would think there should be no difference between using a deque and setting earlier ones to None (I'm assuming you can scan the deque like a list and not just pop_front)
There was a problem hiding this comment.
will factor out in separate pr
| if pcd := self._post_compile_data: | ||
| # Only clear post_compile_data if this is the final progression stage | ||
| if stage_index == len(self._progression_futures) - 1: | ||
| self._post_compile_data = None |
There was a problem hiding this comment.
nit: could clear _callback too, I think
There was a problem hiding this comment.
i'll factor out the refactor into a dataclass as discussed offline in a separate pr
torch/_inductor/compile_fx_async.py
Outdated
| graph_kwargs: _CompileFxKwargs, | ||
| ) -> None: | ||
| if self._optimized_output_code is not None: | ||
| self._optimized_output_code.post_compile( |
There was a problem hiding this comment.
Hm. What if there's still a future pending? This should probably be based on pending futures and not fast vs optimized.
There was a problem hiding this comment.
Actually I fixed this to be easier to reason about
- First post compile will always be on fast output code, that will populate
_post_compile_data - only after _post_compile_data is populated will we begin the progressions
| constants: CompiledFxGraphConstants, | ||
| graph_kwargs: _CompileFxKwargs, | ||
| ) -> None: | ||
| assert self._fast_output_code is not None |
There was a problem hiding this comment.
This worries me - but with an assert at least it will be obvious if it fails...
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Pull Request resolved: #157614 Approved by: https://github.com/aorenste ghstack dependencies: #157305
Pull Request resolved: #157650 Approved by: https://github.com/aorenste ghstack dependencies: #157305, #157614, #157619
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov