[IR] Creation-time constant fold for constant expressions#209
Merged
yaoyaoding merged 5 commits into hidet-org:main on May 5, 2023
Conversation
vadiklyutiy pushed a commit that referenced this pull request on Jul 22, 2024

The `steal_weight` option works as expected after removing all references to torch tensors. `HidetModule` has `torch_params` and `hidet_params` attributes. In modules such as `HidetLinear` and `HidetMultiheadAttention` we don't need to keep the original weight tensors in `torch_params` and `hidet_params`; instead, their transposed copies are stored in those modules. Removing the original weight tensors frees additional GPU memory. We are now able to compile the LLama2-7b model, with 12 GiB of weights, on a 24 GiB RTX 3090 GPU.

This is the state of GPU memory after compiling with hidet:
- Allocated by hidet: 21179 MiB
- Allocated by torch: 314 MiB

For reference, `torch.compile` with `backend=inductor` and `mode=max-autotune`:
- Allocated by hidet: 0 MiB
- Allocated by torch: 12925 MiB

This script is used to test the llama2 model: https://drive.google.com/file/d/1Baz5MrC9wWg9ceirmKZtkMPbv4u3CAuc/view?usp=sharing

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>
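The idea described above — keeping only the transposed copy of each weight and dropping every reference to the original torch tensor so its storage can be reclaimed — can be sketched as follows. This is an illustrative sketch, not the real hidet code: `FakeTensor` and `HidetLinearSketch` are stand-ins invented here (the actual classes are `HidetLinear` etc. operating on real `torch.Tensor` objects).

```python
import weakref

class FakeTensor:
    """Stand-in for a torch weight tensor (assumption: real code uses torch)."""
    def __init__(self, rows, cols):
        self.data = [[0.0] * cols for _ in range(rows)]

    def t(self):
        # Return a transposed *copy*, independent of the original storage.
        rows, cols = len(self.data), len(self.data[0])
        out = FakeTensor(cols, rows)
        out.data = [[self.data[r][c] for r in range(rows)] for c in range(cols)]
        return out

class HidetLinearSketch:
    """Hypothetical module: stores only the transposed copy of the weight."""
    def __init__(self, weight):
        # Deliberately do NOT keep `weight` itself in any attribute; once the
        # caller releases it, the original storage has no remaining references.
        self.transposed_weight = weight.t()

w = FakeTensor(8, 4)
ref = weakref.ref(w)          # observe the original tensor's lifetime
mod = HidetLinearSketch(w)
del w                          # no strong references remain
assert ref() is None           # original weight storage was reclaimed
```

With real torch tensors the same pattern lets the GPU allocator free the original weight as soon as the last Python reference is dropped, which is where the memory savings reported above come from.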
Previously, when we performed arithmetic operations on constants, the resulting expressions were kept in our IR. With this PR, we do the computation when we create the `Add` expression and replace it with a `hidet.ir.Constant`.

This PR also cleans up `hidet.ir.dialects.pattern.py`; we no longer need much fancy pattern matching.
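The behavior can be sketched with a tiny stand-in IR. This is a minimal illustration of creation-time folding, not the actual hidet implementation: the `Expr`/`Constant`/`Var`/`Add` classes and the `add` factory below are simplified stand-ins for the richer `hidet.ir` classes.

```python
class Expr:
    """Base class of the toy expression IR."""

class Constant(Expr):
    def __init__(self, value):
        self.value = value

class Var(Expr):
    def __init__(self, name):
        self.name = name

class Add(Expr):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

def add(lhs, rhs):
    """Build lhs + rhs, folding at creation time when both sides are constants."""
    if isinstance(lhs, Constant) and isinstance(rhs, Constant):
        # Fold now: the IR never contains an Add node for constant operands.
        return Constant(lhs.value + rhs.value)
    return Add(lhs, rhs)

folded = add(Constant(1), Constant(2))
assert isinstance(folded, Constant) and folded.value == 3

kept = add(Var("x"), Constant(2))
assert isinstance(kept, Add)   # non-constant operands still build an Add node
```

Because the fold happens in the factory that constructs the expression, no separate simplification pass is needed to eliminate constant subexpressions later.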