Fix PyTorch tied weights being duplicated in the exported ONNX models #1326

fxmarty merged 5 commits into huggingface:main
Conversation
Awesome! 🚀

I also tried using `optimum-cli onnxruntime quantize --avx512 --onnx_model bloom_onnx --output ./quantized_bloom_onnx` but got the same results (1.48 GiB in size).

@xenova This should be fixed. Could you give it a second try? I'm now getting a quantized bloom-560m that is 536 MiB.
michaelbenayoun left a comment

LGTM, thanks a lot @fxmarty!
```python
if is_torch_available() and isinstance(models_and_onnx_configs[first_key][0], nn.Module):
    if is_accelerate_available():
```

nit: import `torch.nn` here?

I rather put it at the top of the file, under `if is_torch_available()`.
```python
if save_path:
    # Overwrite.
    save_path = str(save_path)
    if save_path.endswith(".onnx") and os.path.isfile(save_path):
        os.remove(save_path)
    save_path = Path(save_path).as_posix()
    onnx.save(model, save_path)
```

So no saving is done if the model is small enough and we don't provide a save_path, only the check?
```python
elif save_path is not None:
    # path/to/model.onnx
    save_path = Path(save_path).as_posix()
    external_file_name = os.path.basename(save_path) + "_data"
    # path/to/model.onnx_data
    external_path = os.path.join(os.path.dirname(save_path), external_file_name)
    if save_path.endswith(".onnx") and os.path.isfile(save_path):
        os.remove(save_path)
    if os.path.isfile(external_path):
        os.remove(external_path)
    onnx.save(
        model,
        save_path,
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location=external_file_name,
    )
    try:
        onnx.checker.check_model(save_path)
    except Exception as e:
        if "No Op registered for" in str(e):
            pass
        else:
            raise e
```

In this case we save the model first, meaning that it is possible that an ONNX file is saved with the check failing afterwards? It's the opposite of the first case, so just checking.
Yes, but that is intended. The ONNX checker can take either a `ModelProto` or a path, and for a model larger than 2 GB, a path needs to be used.
```python
def _get_weights_to_tie(tied_params, torch_model: "nn.Module") -> Tuple[List[List[str]]]:
    """
    Find tied weights in the PyTorch model `model`, and separate them in tied weights
    for which an untying strategy exists and do not exist.
```
nit:

```diff
-    for which an untying strategy exists and do not exist.
+    for which an untying strategy exists and does not exist.
```

Also, I did not really understand the docstring.
The implementation can only re-tie Gather and MatMul weights on the ONNX side, while there may be other tied weights on the torch side. We use the PyTorch model to split the tied weight groups into one group for which we have an implementation available to tie on the ONNX side, and another for which we don't (we will detect that weights are duplicated, but won't fix them). I updated the docstring.
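That splitting can be sketched roughly as follows (a hypothetical helper, not the actual implementation; it assumes only `nn.Embedding` and `nn.Linear` weights are tieable on the ONNX side, via Gather and MatMul):

```python
import torch
import torch.nn as nn

# Module types for which an ONNX-side untying strategy is assumed to exist.
SUPPORTED = (nn.Embedding, nn.Linear)

def split_tied_groups(model: nn.Module):
    # Group parameter names by underlying storage pointer to find tied weights.
    by_ptr = {}
    for name, param in model.named_parameters(remove_duplicate=False):
        by_ptr.setdefault(param.data_ptr(), []).append(name)
    tied = [names for names in by_ptr.values() if len(names) > 1]

    # Split groups by whether every owning module has a supported type.
    tieable, not_tieable = [], []
    for group in tied:
        modules = [model.get_submodule(n.rsplit(".", 1)[0]) for n in group]
        if all(isinstance(m, SUPPORTED) for m in modules):
            tieable.append(group)
        else:
            not_tieable.append(group)
    return tieable, not_tieable

# Example: an embedding tied to a language modeling head.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)
        self.lm_head = nn.Linear(4, 10, bias=False)
        self.lm_head.weight = self.embed.weight  # tie the weights

tieable, not_tieable = split_tied_groups(Tiny())
print(tieable)      # [['embed.weight', 'lm_head.weight']]
print(not_tieable)  # []
```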
```python
for params in tied_params:
    skip_group = False
    for param_name in params:
        module_name = ".".join(param_name.split(".")[:-1])
```

```diff
-        module_name = ".".join(param_name.split(".")[:-1])
+        module_name = param_name.rsplit(".", maxsplit=1)[0]
```

Not sure it is clearer though.
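For what it's worth, the two variants agree whenever the name contains a dot, but diverge on a top-level parameter name (a quick check, with a made-up parameter name):

```python
param_name = "transformer.h.0.attn.c_attn.weight"

# Both expressions drop the final component to get the owning module's name.
a = ".".join(param_name.split(".")[:-1])
b = param_name.rsplit(".", maxsplit=1)[0]
assert a == b == "transformer.h.0.attn.c_attn"

# Edge case: a top-level parameter like "weight" has no dot, and the two
# variants differ: split/join yields "", while rsplit returns the name unchanged.
assert ".".join("weight".split(".")[:-1]) == ""
assert "weight".rsplit(".", maxsplit=1)[0] == "weight"
```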
```python
if TYPE_CHECKING:
    import torch.nn as nn
```

Should this also be under `is_torch_available()`, or can we assume it's available here? I see this pattern in other places, so it's most likely not necessary - just wanted to check.

We can probably assume that a dev has all the dependencies installed, so probably not needed.
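The guard is also safe at runtime regardless: `typing.TYPE_CHECKING` is `False` when the program runs, so the import is only seen by static type checkers, and quoted annotations are never evaluated. A minimal sketch (the `count_parameters` function is made up for illustration):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers (mypy, pyright); never at runtime,
    # so this module imports fine even when torch is not installed.
    import torch.nn as nn

def count_parameters(model: "nn.Module") -> int:
    # The quoted annotation is resolved lazily by type checkers only.
    return sum(p.numel() for p in model.parameters())

print(TYPE_CHECKING)  # False at runtime
```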
```python
# If not found (e.g. "lm_head.weight"), we greedily search for all initializers
# from potentially matching node names (e.g. "lm_head").
# This greedy approach may find more initializers than wanted.
if not identical_initializer:
    module_name = ".".join(param_name.split(".")[:-1])
```

Co-authored-by: Joshua Lochner <admin@xenova.com>
This is the case, for example, when the embedding weight and the language modeling head share the same weight.

Although PyTorch has a pass to deduplicate identical initializers (https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/passes/onnx/deduplicate_initializers.cpp), one issue arises when `torch.onnx.export` does constant folding, which may e.g. transpose a weight (in the `nn.Linear` case) and later mess up the initializer deduplication. See details: pytorch/pytorch#108342

For small models, the duplication can result in ONNX weights >30% larger than the PyTorch weights. The issue is less severe for large models. Given that constant folding is generally good, just passing `do_constant_folding=False` is not really a satisfying solution, hence this PR. E.g. bloom-560m goes from 3.0 GiB to 2.1 GiB.

Fixes pytorch/pytorch#108342 as a post-processing step for `nn.Embedding` & `nn.Linear`.