MinMax based observers: respect device affinity for state_dict #44537

Closed

vkuzo wants to merge 2 commits into gh/vkuzo/144/base
Conversation
Summary:

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug:

* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load model A's state_dict
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load across different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:

* model A is on CPU, its state_dict is saved
* model B is created and moved to GPU, the state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special case `PerChannelMinMaxObserver` and its children to allow for loading buffers of a different size, which is normal.

There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
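To make the failure mode concrete, here is a minimal sketch of the cross-device scenario using a standalone `MinMaxObserver`. This is an illustration under assumptions, not code from this PR: it uses the `torch.quantization` namespace of this era and assumes a CUDA device is available.

```
import torch
from torch.quantization import MinMaxObserver

# "Model A": an observer runs on CPU, records min/max, and its state is saved.
obs_a = MinMaxObserver()
obs_a(torch.randn(16))                   # populates the min_val / max_val buffers
state = obs_a.state_dict()

# "Model B": a fresh observer moved to GPU, then loaded with A's state.
obs_b = MinMaxObserver().cuda()
obs_b(torch.randn(16, device="cuda"))    # materialize the buffers on the GPU
obs_b.load_state_dict(state)

# With the default nn.Module load path (this PR), the buffers keep the
# destination module's device; with the old custom load code they came back
# on the device they were saved from, and later ops could assert.
print(obs_b.min_val.device, obs_b.max_val.device)   # expected: cuda:0 cuda:0
```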
vkuzo added a commit that referenced this pull request on Sep 11, 2020
ghstack-source-id: b3af9e8
Pull Request resolved: #44537
💊 CI failures summary and remediations

As of commit f1c41b7 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:
raghuramank100 approved these changes on Sep 11, 2020
raghuramank100 (Contributor) left a comment:
Good catch! Curious as to what was wrong with the original code, though?
vkuzo (Author) replied:
The original code did the equivalent of assigning the loaded tensor directly to the attribute, which keeps the device the tensor was saved with; the expected behavior is to copy the loaded values into the existing buffer, so it stays on the destination module's device (which is what the default `nn.Module` load logic does).
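For illustration, a simplified sketch (not the actual PR diff) of that difference, using plain tensors to stand in for a CPU-saved state_dict entry and a buffer of a model that was moved to GPU:

```
import torch

saved = torch.tensor(1.5)                # stand-in for a tensor read from a CPU-saved state_dict
buf = torch.tensor(0.0, device="cuda")   # stand-in for a buffer of a model moved to GPU

# Roughly what the old custom load code did: rebind the attribute to the saved
# tensor, so the "buffer" silently ends up on the device it was saved from.
rebound = saved
print(rebound.device)                    # cpu -> later cross-device ops can assert

# What the default nn.Module load path does: copy values into the existing
# buffer, so it keeps the destination module's device.
buf.copy_(saved)
print(buf.device)                        # cuda:0
```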
…dict" Summary: Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved. At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug: * create model A, move it to a device (cpu/cuda) and save its state_dict * create model B, load its state dict. * `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device * the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices) In practice, the case people would sometimes hit is: * model A is on CPU, state dict is saved * model B is created and moved to GPU, state_dict from model A is loaded * assertions throw when operations are attempted across different devices This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special case `PerChannelMinMaxObserver` and its children to allow for loading buffers or different size, which is normal. There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity. Test Plan: ``` python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D23644493](https://our.internmc.facebook.com/intern/diff/D23644493) [ghstack-poisoned]
vkuzo added a commit that referenced this pull request on Sep 11, 2020
ghstack-source-id: 306ab0a
Pull Request resolved: #44537
This pull request has been merged in 70dfeb4.
xuzhao9 pushed a commit that referenced this pull request on Sep 18, 2020
Pull Request resolved: #44537

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
vkuzo added a commit that referenced this pull request on Jan 21, 2021
Summary:

Ensures that `FakeQuantize` respects device affinity when loading from state_dict, and knows how to resize scale and zero_point values (which is necessary for FQ classes wrapping per channel observers).

This is the same as #44537, but for `FakeQuantize`.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
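A minimal sketch of the resize-then-copy idea this follow-up needs, with plain tensors standing in for a per-channel `scale` buffer and its saved counterpart (an illustration, not the actual `FakeQuantize` implementation):

```
import torch

scale_buf = torch.ones(1)       # freshly constructed module: channel count not yet known
saved_scale = torch.rand(8)     # saved from a module that observed 8 channels

# A plain element-wise copy would fail on the shape mismatch, so the destination
# buffer is resized to match the incoming tensor before values are copied in.
if scale_buf.shape != saved_scale.shape:
    scale_buf.resize_(saved_scale.shape)
scale_buf.copy_(saved_scale)
print(scale_buf.shape)          # torch.Size([8])
```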
vkuzo added a commit that referenced this pull request on Jan 21, 2021
ghstack-source-id: 737eefd
Pull Request resolved: #50868
vkuzo added a commit that referenced this pull request on Jan 25, 2021
…ate_dict" Summary: Ensures that `FakeQuantize` respects device affinity when loading from state_dict, and knows how to resize scale and zero_point values (which is necessary for FQ classes wrapping per channel observers). This is same as #44537, but for `FakeQuantize`. Test Plan: ``` python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25991570](https://our.internmc.facebook.com/intern/diff/D25991570) [ghstack-poisoned]
vkuzo added a commit that referenced this pull request on Jan 25, 2021
ghstack-source-id: 317bd8b
Pull Request resolved: #50868
facebook-github-bot pushed a commit that referenced this pull request on Jan 25, 2021
…50868)

Pull Request resolved: #50868

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25991570

fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026
…ch#44537)

Pull Request resolved: pytorch#44537

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026
…ytorch#50868)

Pull Request resolved: pytorch#50868

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25991570

fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
Stack from ghstack:

Summary:

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug:

* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load model A's state_dict
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load across different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:

* model A is on CPU, its state_dict is saved
* model B is created and moved to GPU, the state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special case `PerChannelMinMaxObserver` and its children to allow for loading buffers of a different size, which is normal.
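The `PerChannelMinMaxObserver` special case exists because its per-channel buffers only take their final size once data has been observed, so a freshly constructed module and a saved one can legitimately disagree on buffer sizes. A minimal sketch (assuming the `torch.quantization` namespace of this era):

```
import torch
from torch.quantization import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver(ch_axis=0)
for name, buf in obs.named_buffers():
    print(name, tuple(buf.shape))    # placeholder sizes before any data is seen

obs(torch.randn(8, 16))              # observe a tensor with 8 channels along dim 0
for name, buf in obs.named_buffers():
    print(name, tuple(buf.shape))    # min/max buffers now hold one entry per channel
```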
There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.
Test Plan:

`python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity`
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D23644493