
Adds dynamic versioning pattern #40279

Closed
mruberry wants to merge 12 commits into master from dynamic_file_format

Conversation

@mruberry
Collaborator

@mruberry mruberry commented Jun 19, 2020

BC NOTE:

This change makes it so modules saved with torch.jit.save in PyTorch 1.6 can be loaded by previous versions of PyTorch unless they use torch.div or (soon) torch.full. It also lets tensors saved using torch.save be loaded by previous versions. So this is the opposite of BC-breaking, but I'm using that label to highlight this issue since we don't have a "BC-improving" label.

PR NOTE:
When an operator's semantics change in PyTorch we want to do two things:

  1. Preserve the semantics of older serialized TorchScript programs that use the operator
  2. Ensure the new semantics are respected

Historically, this meant writing a Versioned Symbol that would remap older versions of the operator into current PyTorch code (1), and bumping the produced file format version (2). Unfortunately, bumping the produced file format version is a nuclear option for ensuring semantics are respected, since it also prevents older versions of PyTorch from loading anything (even tensors!) from newer versions.

Dynamic versioning addresses the nuclear consequences of bumping the produced file format version by only bumping it when necessary, that is, when an operator with changed semantics is detected in the serialized TorchScript. This will prevent TorchScript programs that use the changed operator from loading on earlier versions of PyTorch, as desired, but will have no impact on programs that don't use the changed operator.
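A minimal sketch of that decision, with illustrative names and a hypothetical operator-to-version table (the real logic lives in PyTorch's serializer; only `aten::div` requiring version 4 is confirmed by this PR):

```python
BASE_VERSION = 3  # baseline produced file format version

# Operators whose semantics changed, mapped to the minimum file format
# version a reader must support to interpret them correctly.
CHANGED_OPS = {
    "aten::div": 4,
    "aten::full": 5,  # hypothetical: "(soon) torch.full"
}

def produced_file_format_version(ops_in_program):
    """Return the version to stamp into the archive: the baseline,
    bumped only as high as the operators actually used require."""
    version = BASE_VERSION
    for op in ops_in_program:
        version = max(version, CHANGED_OPS.get(op, BASE_VERSION))
    return version
```

A program that never calls a changed operator keeps the baseline version and stays loadable by older readers; one that uses `aten::div` is stamped with version 4 and is rejected by readers that only support 3.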

Note that this change is only applicable when using torch.jit.save and torch.jit.load. torch.save pickles the given object using pickle (by default), which serializes a function's Python code directly.

No new tests for this behavior are added since the existing tests for versioned division in test_save_load already validate that models with div are loaded correctly at version 4.

@mruberry mruberry requested a review from apaszke as a code owner June 19, 2020 10:58
@mruberry mruberry removed the request for review from apaszke June 19, 2020 10:59
@facebook-github-bot facebook-github-bot added the `oncall: jit` (Add this issue/PR to JIT oncall triage queue) label Jun 19, 2020
@mruberry mruberry requested a review from zdevito June 19, 2020 11:12
@dr-ci

dr-ci Bot commented Jun 19, 2020

💊 CI failures summary and remediations

As of commit a1b15c6 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: False is not true
  test_mem_leak (__main__.TestProfiler_cuda) 
Checks that there's no memory leak when using profiler with CUDA ... FAIL (6.831s) 
 
====================================================================== 
FAIL [6.831s]: test_mem_leak (__main__.TestProfiler_cuda) 
Checks that there's no memory leak when using profiler with CUDA 
---------------------------------------------------------------------- 
Traceback (most recent call last): 
  File "test_profiler.py", line 42, in test_mem_leak 
    self.assertTrue(max_diff < 100 * 1024) 
AssertionError: False is not true 
 
---------------------------------------------------------------------- 
Ran 1 test in 6.831s 
 
FAILED (failures=1) 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestProfiler_cuda-20200624092838.xml 
Traceback (most recent call last): 
  File "run_test.py", line 727, in <module> 


@lutzroeder
Contributor

lutzroeder commented Jun 20, 2020

See #31877.

Will this add more complexity to this discussion? How is a user or tool going to be able to tell from an automatically allocated version number which version of PyTorch generated the file or needs to be installed to reproduce the behavior?

@mruberry
Collaborator Author

> See #31877.
>
> Will this add more complexity to this discussion? How is a user or tool going to be able to tell from an automatically allocated version number which version of PyTorch generated the file or needs to be installed to reproduce the behavior?

You can't tell which version of PyTorch generated a particular version number, since it's not a unique mapping (even today) from PyTorch version -> produced format number.

You can tell which versions support it by looking at the supported format number, just like today.
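As an illustration, a tool could read that number directly from the archive. This is a hedged sketch: the `<name>/version` entry name and layout are assumptions based on the zip archives torch.jit.save produces, not a documented API.

```python
import zipfile

def read_format_version(path_or_file):
    """Best-effort read of the file format version stored in a
    TorchScript-style zip archive. Assumes the archive contains a
    '<name>/version' entry; returns None if no such entry exists."""
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if name == "version" or name.endswith("/version"):
                return int(zf.read(name).decode().strip())
    return None
```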

@lutzroeder
Contributor

lutzroeder commented Jun 21, 2020

> You can tell which versions support it by looking at the supported format number, just like today.

Scenario: TorchScript .pt file in Zip format.

How can a skilled user or a tool tell which min version of PyTorch is needed to load this file?

@mruberry
Collaborator Author

> You can tell which versions support it by looking at the supported format number, just like today.
>
> Scenario: TorchScript .pt file in Zip format.
>
> How can a skilled user or a tool tell which min version of PyTorch is needed to load this file?

I don't believe this is a supported scenario today. But I also don't think that's a very interesting scenario. Why would someone upgrade to the minimum version of PyTorch that supports the .pt file in question as opposed to just upgrading to the latest PyTorch?

@lutzroeder
Contributor

lutzroeder commented Jun 21, 2020

The general issue is how to consistently represent the format information to the user. There are tools (trying) to support PyTorch. Users report bugs and the complexities of the different PyTorch formats are hard to follow. For example, a TorchScript file saved with v4 could be called "TorchScript v4". What about a Zip container that was saved using torch.save? Is it called "PyTorch v4"? PyTorch v4 has not shipped yet. The discussion in #31877 is less about any specific scenario and more about having consistent naming and versioning rules across all PyTorch formats.

@mruberry
Collaborator Author

> The general issue is how to consistently represent the format information to the user. There are tools (trying) to support PyTorch. Users report bugs and the complexities of the different PyTorch formats are hard to follow. For example, a TorchScript file saved with v4 could be called "TorchScript v4". What about a Zip container that was saved using torch.save? Is it called "PyTorch v4"? PyTorch v4 has not shipped yet. The discussion in #31877 is less about any specific scenario and more about having consistent naming and versioning rules across all PyTorch formats.

That's not an issue this PR is trying to address. Although it does add some complications to the discussion. You could say, "this is a file that requires the consumer to support version X," which is effectively what our error messages say if you try to read these files on versions of PyTorch that are too old to understand them.

Contributor

@zdevito zdevito left a comment


Looks good. One small nit about where the version logic is handled.

Comment thread: caffe2/serialize/inline_container.h (Outdated)

    explicit PyTorchStreamWriter(
        const std::function<size_t(const void*, size_t)>& writer_func,
        std::string archive_name,
        const bool _write_version_at_setup=true);
Contributor


Not a huge fan of a flag here. Makes the behavior complicated. I feel like the version should just get written on finalization (seems like we already have a finalized_ flag). And inline container should handle the maxing.

Collaborator Author


Good idea. I wasn't a fan of it either but was leery of changing inline_container too much. Moving this to finalization should be nicer.
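The finalization approach under discussion could be sketched like this (illustrative Python, not the real C++ PyTorchStreamWriter API — class and method names are assumptions):

```python
class StreamWriterSketch:
    """Sketch of a writer that tracks the highest version requested
    during serialization and emits a single version record only when
    the archive is finalized, instead of taking a constructor flag."""

    BASE_VERSION = 3

    def __init__(self):
        self._version = self.BASE_VERSION
        self._finalized = False
        self.records = {}

    def set_min_version(self, version):
        # Callers bump the version as they discover operators that
        # need it; the writer keeps the max ("handles the maxing").
        self._version = max(self._version, version)

    def write_record(self, name, data):
        assert not self._finalized, "cannot write after finalization"
        self.records[name] = data

    def finalize(self):
        # The version record is written exactly once, at the end.
        self.records["version"] = str(self._version).encode()
        self._finalized = True
```

This keeps the writer's behavior unconditional: no flag controls whether the version is written at setup, because it is always written at finalization.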

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry mruberry added the `module: bc-breaking` (Related to a BC-breaking change) and `module: serialization` (Issues related to serialization, e.g., via pickle, or otherwise, of PyTorch objects) labels Jun 22, 2020

Mike Ruberry added 2 commits June 23, 2020 00:46
@mruberry mruberry force-pushed the dynamic_file_format branch from 9da9ce6 to 6c0ad6b Compare June 23, 2020 08:00



Mike Ruberry added 2 commits June 24, 2020 00:46

@facebook-github-bot
Contributor

@mruberry merged this pull request in e664458.

@facebook-github-bot facebook-github-bot deleted the dynamic_file_format branch July 13, 2020 17:55
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Pull Request resolved: pytorch#40279

Reviewed By: dzhulgakov

Differential Revision: D22168291

Pulled By: mruberry

fbshipit-source-id: e71d6380e727e25123c7eedf6d80e5d7f1fe9f95