Make SAC save_list configurable#1675
Conversation
This PR moves save_list out from activation_checkpoint.py. save_list is now an argument of apply_ac(). The behaviors of all the models shouldn't change as this PR doesn't change what ops are going to be saved.
wwwjn
left a comment
There was a problem hiding this comment.
LGTM! I think it's reasonable for each model to specify which ops to be used in AC. I have a follow up question: Does it make sense to have a default set for SAC, as most of the _save_list now are the same and some models share the same patten of commutation? If user specify a _save_list, we could override the default list
|
@wwwjn That's a good question. I was thinking the same question. I decided not to have a default list because I want this to be explicit -- each model author should be responsible to fine tune the save_list or the performance will not be good. Without the default value, the performance should be bad. The is the explicit part. But I don't have a strong preference here. If we believe there are some ops that we will never want to recompute regardless the model type, then having a default one seems to be reasonable. |
I don't have a strong opinion as well |
| model_compile_enabled: bool, | ||
| use_flex_attn: bool, | ||
| ): | ||
| save_list: set[torch._ops.OpOverload] | None = None, |
There was a problem hiding this comment.
Maybe I should comment this on the previous PR --
I'd recommend we make these into non positional args (after *), similar to _apply_ac_to_transformer_block. Otherwise it'll break BC without good reason.
There was a problem hiding this comment.
lol, I was thinking the same thing when I did this factor. Let me add this.
I haven't looked the pipeline parallelism part. But I'm going to review @H-Huang PRs and future PRs, will address this question at that time. |
Yes, this sounds good to me, can do it in a follow up PR |
| ac_config: ACConfig, | ||
| *, | ||
| base_fqn: str | None = None, | ||
| save_list: set[torch._ops.OpOverload] | None = None, |
There was a problem hiding this comment.
Can save_list default value just be set()?
There was a problem hiding this comment.
Using a mutable data structure as the default argument can cause unexpected bugs. So I would still use None.
This PR refactors the `apply_ac` implementation by moving `save_list` out of `activation_checkpoint.py` and making it an argument of `apply_ac()`. This change improves configurability without altering the behavior of any models, as the set of operations saved remains unchanged after this PR.
This PR refactors the `apply_ac` implementation by moving `save_list` out of `activation_checkpoint.py` and making it an argument of `apply_ac()`. This change improves configurability without altering the behavior of any models, as the set of operations saved remains unchanged after this PR.
This PR refactors the
apply_acimplementation by movingsave_listout ofactivation_checkpoint.pyand making it an argument ofapply_ac(). This change improves configurability without altering the behavior of any models, as the set of operations saved remains unchanged after this PR.