
Adds dynamic versioning pattern #40279

Closed
mruberry wants to merge 12 commits into master from dynamic_file_format

Conversation

@mruberry
Collaborator

@mruberry mruberry commented Jun 19, 2020

BC NOTE:

This change makes it so modules saved with torch.jit.save in PyTorch 1.6 can be loaded by previous versions of PyTorch unless they use torch.div or (soon) torch.full. It also lets tensors saved using torch.save be loaded by previous versions. So this is the opposite of BC-breaking, but I'm using that label to highlight this issue since we don't have a "BC-improving" label.

PR NOTE:
When an operator's semantics change in PyTorch we want to do two things:

  1. Preserve the semantics of older serialized TorchScript programs that use the operator
  2. Ensure the new semantics are respected

Historically, this meant writing a Versioned Symbol that would remap older versions of the operator into current PyTorch code (1), and bumping the produced file format version (2). Unfortunately, bumping the produced file format version is a nuclear option for ensuring semantics are respected, since it also prevents older versions of PyTorch from loading anything (even tensors!) from newer versions.

Dynamic versioning addresses the nuclear consequences of bumping the produced file format version by only bumping it when necessary, that is, when an operator with changed semantics is detected in the serialized TorchScript. This will prevent TorchScript programs that use the changed operator from loading on earlier versions of PyTorch, as desired, but will have no impact on programs that don't use the changed operator.
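A minimal sketch of that decision, with illustrative names and a hypothetical operator-to-version table (the real logic lives in PyTorch's serializer; only `aten::div` requiring version 4 is confirmed by this PR):

```python
BASE_VERSION = 3  # baseline produced file format version

# Operators whose semantics changed, mapped to the minimum file format
# version a reader must support to interpret them correctly.
CHANGED_OPS = {
    "aten::div": 4,
    "aten::full": 5,  # hypothetical: "(soon) torch.full"
}

def produced_file_format_version(ops_in_program):
    """Return the version to stamp into the archive: the baseline,
    bumped only as high as the operators actually used require."""
    version = BASE_VERSION
    for op in ops_in_program:
        version = max(version, CHANGED_OPS.get(op, BASE_VERSION))
    return version
```

A program that never calls a changed operator keeps the baseline version and stays loadable by older readers; one that uses `aten::div` is stamped with version 4 and is rejected by readers that only support 3.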

Note that this change is only applicable when using torch.jit.save and torch.jit.load. torch.save pickles the given object using pickle (by default), which serializes a function's Python code directly.

No new tests for this behavior are added since the existing tests for versioned division in test_save_load already validate that models with div are loaded correctly at version 4.

@mruberry mruberry requested a review from apaszke as a code owner June 19, 2020 10:58
@mruberry mruberry removed the request for review from apaszke June 19, 2020 10:59
@facebook-github-bot facebook-github-bot added the `oncall: jit` (Add this issue/PR to JIT oncall triage queue) label Jun 19, 2020
@mruberry mruberry requested a review from zdevito June 19, 2020 11:12
@dr-ci

dr-ci Bot commented Jun 19, 2020

💊 CI failures summary and remediations

As of commit a1b15c6 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: False is not true
  test_mem_leak (__main__.TestProfiler_cuda) 
Checks that there's no memory leak when using profiler with CUDA ... FAIL (6.831s) 
 
====================================================================== 
FAIL [6.831s]: test_mem_leak (__main__.TestProfiler_cuda) 
Checks that there's no memory leak when using profiler with CUDA 
---------------------------------------------------------------------- 
Traceback (most recent call last): 
  File "test_profiler.py", line 42, in test_mem_leak 
    self.assertTrue(max_diff < 100 * 1024) 
AssertionError: False is not true 
 
---------------------------------------------------------------------- 
Ran 1 test in 6.831s 
 
FAILED (failures=1) 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestProfiler_cuda-20200624092838.xml 
Traceback (most recent call last): 
  File "run_test.py", line 727, in <module> 


@lutzroeder
Contributor

lutzroeder commented Jun 20, 2020

See #31877.

Will this add more complexity to this discussion? How is a user or tool going to be able to tell from an automatically allocated version number which version of PyTorch generated the file or needs to be installed to reproduce the behavior?

@mruberry
Collaborator Author

> See #31877.
>
> Will this add more complexity to this discussion? How is a user or tool going to be able to tell from an automatically allocated version number which version of PyTorch generated the file or needs to be installed to reproduce the behavior?

You can't tell which version of PyTorch generated a particular version number, since it's not a unique mapping (even today) from PyTorch version -> produced format number.

You can tell which versions support it by looking at the supported format number, just like today.
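As an illustration, a tool could read that number directly from the archive. This is a hedged sketch: the `<name>/version` entry name and layout are assumptions based on the zip archives torch.jit.save produces, not a documented API.

```python
import zipfile

def read_format_version(path_or_file):
    """Best-effort read of the file format version stored in a
    TorchScript-style zip archive. Assumes the archive contains a
    '<name>/version' entry; returns None if no such entry exists."""
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if name == "version" or name.endswith("/version"):
                return int(zf.read(name).decode().strip())
    return None
```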

@lutzroeder
Contributor

lutzroeder commented Jun 21, 2020

> You can tell which versions support it by looking at the supported format number, just like today.

Scenario: TorchScript .pt file in Zip format.

How can a skilled user or a tool tell which min version of PyTorch is needed to load this file?

@mruberry
Collaborator Author

> You can tell which versions support it by looking at the supported format number, just like today.
>
> Scenario: TorchScript .pt file in Zip format.
>
> How can a skilled user or a tool tell which min version of PyTorch is needed to load this file?

I don't believe this is a supported scenario today. But I also don't think that's a very interesting scenario. Why would someone upgrade to the minimum version of PyTorch that supports the .pt file in question as opposed to just upgrading to the latest PyTorch?

@lutzroeder
Contributor

lutzroeder commented Jun 21, 2020

The general issue is how to consistently represent the format information to the user. There are tools (trying) to support PyTorch. Users report bugs and the complexities of the different PyTorch formats are hard to follow. For example, a TorchScript file saved with v4 could be called "TorchScript v4". What about a Zip container that was saved using torch.save? Is it called "PyTorch v4"? PyTorch v4 has not shipped yet. The discussion in #31877 is less about any specific scenario and more about having consistent naming and versioning rules across all PyTorch formats.

@mruberry
Collaborator Author

> The general issue is how to consistently represent the format information to the user. There are tools (trying) to support PyTorch. Users report bugs and the complexities of the different PyTorch formats are hard to follow. For example, a TorchScript file saved with v4 could be called "TorchScript v4". What about a Zip container that was saved using torch.save? Is it called "PyTorch v4"? PyTorch v4 has not shipped yet. The discussion in #31877 is less about any specific scenario and more about having consistent naming and versioning rules across all PyTorch formats.

That's not an issue this PR is trying to address. Although it does add some complications to the discussion. You could say, "this is a file that requires the consumer to support version X," which is effectively what our error messages say if you try to read these files on versions of PyTorch that are too old to understand them.

Contributor

@zdevito zdevito left a comment


Looks good. One small nit about where the version logic is handled.

Comment thread: caffe2/serialize/inline_container.h (Outdated)

    explicit PyTorchStreamWriter(
        const std::function<size_t(const void*, size_t)>& writer_func,
        std::string archive_name,
        const bool _write_version_at_setup=true);
Contributor


Not a huge fan of a flag here. Makes the behavior complicated. I feel like the version should just get written on finalization (seems like we already have a finalized_ flag). And inline container should handle the maxing.

Collaborator Author


Good idea. I wasn't a fan of it either but was leery of changing inline_container too much. Moving this to finalization should be nicer.
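The finalization approach under discussion could be sketched like this (illustrative Python, not the real C++ PyTorchStreamWriter API — class and method names are assumptions):

```python
class StreamWriterSketch:
    """Sketch of a writer that tracks the highest version requested
    during serialization and emits a single version record only when
    the archive is finalized, instead of taking a constructor flag."""

    BASE_VERSION = 3

    def __init__(self):
        self._version = self.BASE_VERSION
        self._finalized = False
        self.records = {}

    def set_min_version(self, version):
        # Callers bump the version as they discover operators that
        # need it; the writer keeps the max ("handles the maxing").
        self._version = max(self._version, version)

    def write_record(self, name, data):
        assert not self._finalized, "cannot write after finalization"
        self.records[name] = data

    def finalize(self):
        # The version record is written exactly once, at the end.
        self.records["version"] = str(self._version).encode()
        self._finalized = True
```

This keeps the writer's behavior unconditional: no flag controls whether the version is written at setup, because it is always written at finalization.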

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry mruberry added the `module: bc-breaking` (Related to a BC-breaking change) and `module: serialization` (Issues related to serialization, e.g., via pickle, or otherwise, of PyTorch objects) labels Jun 22, 2020

Mike Ruberry added 2 commits June 23, 2020 00:46
@mruberry mruberry force-pushed the dynamic_file_format branch from 9da9ce6 to 6c0ad6b Compare June 23, 2020 08:00



Mike Ruberry added 2 commits June 24, 2020 00:46

@facebook-github-bot
Contributor

@mruberry merged this pull request in e664458.

@facebook-github-bot facebook-github-bot deleted the dynamic_file_format branch July 13, 2020 17:55
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Pull Request resolved: pytorch#40279

Reviewed By: dzhulgakov

Differential Revision: D22168291

Pulled By: mruberry

fbshipit-source-id: e71d6380e727e25123c7eedf6d80e5d7f1fe9f95