Skip to content

fix jit List[Optional[Tensor]] type singleton bug#77846

Closed
bdhirsh wants to merge 2 commits intogh/bdhirsh/239/basefrom
gh/bdhirsh/239/head
Closed

fix jit List[Optional[Tensor]] type singleton bug#77846
bdhirsh wants to merge 2 commits intogh/bdhirsh/239/basefrom
gh/bdhirsh/239/head

Conversation

@bdhirsh
Copy link
Collaborator

@bdhirsh bdhirsh commented May 19, 2022

This should fix the remaining dynamo <> functionalization integration issue.

I had a fix earlier for JIT containers not correctly having singleton instances here, which fixed a bug in this code for functionalization:

def f(a, b):
    return a[b]

functionalize(foo)(torch.arange(3), torch.ones(2, dtype=torch.long))

But apparently the following code is still broken:

def f(a, b):
    return torch.ops.aten.index(a, b)

functionalize(foo)(torch.arange(3), torch.ones(2, dtype=torch.long))

Why? we have separate schema parsing logic for the ops in torch.ops.aten, and that logic circumvented my fix, creating its own singleton instance for Optional[Tensor].

We can't test this in core for the same reason as the last fix, so companion functorch tests here: pytorch/functorch#820

Stack from ghstack:

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 19, 2022

🔗 Helpful links

❌ 2 New Failures, 1 Pending

As of commit 8716053 (more details on the Dr. CI page):

Expand to see more
  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu) (1/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-19T17:57:12.2058359Z RuntimeError: test_sparse_csr failed!
2022-05-19T17:57:08.6017111Z 
2022-05-19T17:57:08.6017420Z Generating XML reports...
2022-05-19T17:57:08.8846606Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20220519175634.xml
2022-05-19T17:57:08.8848853Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20220519175634.xml
2022-05-19T17:57:08.9428663Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20220519175634.xml
2022-05-19T17:57:12.2043391Z Traceback (most recent call last):
2022-05-19T17:57:12.2044481Z   File "test/run_test.py", line 1074, in <module>
2022-05-19T17:57:12.2051603Z     main()
2022-05-19T17:57:12.2052567Z   File "test/run_test.py", line 1052, in main
2022-05-19T17:57:12.2057315Z     raise RuntimeError(err_message)
2022-05-19T17:57:12.2058359Z RuntimeError: test_sparse_csr failed!
2022-05-19T17:57:14.1610061Z 
2022-05-19T17:57:14.1611117Z real	96m27.654s
2022-05-19T17:57:14.1612133Z user	144m28.857s
2022-05-19T17:57:14.1612968Z sys	58m15.046s
2022-05-19T17:57:14.1613760Z + cleanup
2022-05-19T17:57:14.1614397Z + retcode=1
2022-05-19T17:57:14.1615028Z + set +x
2022-05-19T17:57:14.1746328Z ##[error]Process completed with exit code 1.
2022-05-19T17:57:14.1844220Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2022-05-19T17:57:14.1845664Z �[36;1m# copy test results back to the mounted workspace, needed sudo, resulting permissions were correct�[0m

See GitHub Actions build pull / win-vs2019-cpu-py3 / build (2/2)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

2022-05-19T14:53:24.2716820Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2022-05-19T14:53:21.1487995Z ++ cygpath -w /c/actions-runner/_work/pytorch/pytorch/build/win_tmp
2022-05-19T14:53:23.4832411Z + TMP_DIR_WIN='C:\actions-runner\_work\pytorch\pytorch\build\win_tmp'
2022-05-19T14:53:23.4832781Z + export TMP_DIR_WIN
2022-05-19T14:53:23.4833037Z + export PYTORCH_FINAL_PACKAGE_DIR=/c/2352712133/build-results/
2022-05-19T14:53:23.4833342Z + PYTORCH_FINAL_PACKAGE_DIR=/c/2352712133/build-results/
2022-05-19T14:53:23.4833605Z + [[ -n /c/2352712133/build-results/ ]]
2022-05-19T14:53:23.4833827Z + mkdir -p /c/2352712133/build-results/
2022-05-19T14:53:23.5121746Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-05-19T14:53:23.5122194Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-05-19T14:53:23.5345407Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*'
2022-05-19T14:53:24.2716820Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*': No such file or directory
2022-05-19T14:53:24.2721353Z + '[' -n '' ']'
2022-05-19T14:53:24.2722066Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-05-19T14:53:24.2722508Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-05-19T14:53:24.2722798Z + set +ex
2022-05-19T14:53:28.9783125Z + /c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers/build_pytorch.bat
2022-05-19T14:53:29.0116937Z 
2022-05-19T14:53:29.0117667Z C:\actions-runner\_work\pytorch\pytorch>if "0" == "1" (set BUILD_TYPE=debug )  ELSE (set BUILD_TYPE=release ) 
2022-05-19T14:53:29.0121251Z 
2022-05-19T14:53:29.0122623Z C:\actions-runner\_work\pytorch\pytorch>set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;C:\Users\runneruser\AppData\Roaming\Python\Python310\Scripts;C:\actions-runner\_work\_tool\Python\3.10.4\x64\Scripts;C:\actions-runner\_work\_tool\Python\3.10.4\x64;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Amazon\cfn-bootstrap;C:\ProgramData\chocolatey\bin;C:\Program Files\Amazon\AWSCLIV2;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit;C:\Users\runneruser\AppData\Local\Microsoft\WindowsApps 
2022-05-19T14:53:29.0125584Z 

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

bdhirsh added a commit that referenced this pull request May 19, 2022
ghstack-source-id: d07446c
Pull Request resolved: #77846
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label May 19, 2022
@bdhirsh bdhirsh requested a review from ezyang May 19, 2022 14:31
This should fix the remaining dynamo <> functionalization integration issue.


I had a fix earlier for JIT containers not correctly having singleton instances [here](#76085), which fixed a bug in this code for functionalization:
```
def f(a, b):
    return a[b]

functionalize(foo)(torch.arange(3), torch.ones(2, dtype=torch.long))
```

But apparently the following code is still broken:
```
def f(a, b):
    return torch.ops.aten.index(a, b)

functionalize(foo)(torch.arange(3), torch.ones(2, dtype=torch.long))
```

Why? we have separate schema parsing logic for the ops in `torch.ops.aten`, and that logic circumvented my fix, creating its own singleton instance for Optional[Tensor].


We can't test this in core for the same reason as the last fix, so companion functorch tests here: pytorch/functorch#820




[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request May 19, 2022
ghstack-source-id: 9718de4
Pull Request resolved: #77846
fake_value = c10::TypeFactory::create<c10::OptionalType>(fake_value);
real_value = c10::TypeFactory::create<c10::OptionalType>(real_value);
fake_value = c10::OptionalType::get(fake_value);
real_value = c10::OptionalType::get(real_value);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is probably a more general fix to this problem, but I just wanted to do something simple that would fix dynamo

For example, code that calls TypeFactory::create<ListType> would probably end up creating multiple "singleton" instances of a given ListType, but that doesn't matter in practice because the logic for checking if two list types are the same in IValue.h basically just checks if both types are list-like, and then pointer-compares their inner types.

@ezyang ezyang requested a review from eellison May 19, 2022 21:02
@bdhirsh
Copy link
Collaborator Author

bdhirsh commented May 19, 2022

@pytorchbot merge this please

@github-actions
Copy link
Contributor

Hey @bdhirsh.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 20, 2022
Summary:
Pull Request resolved: #77846

Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/de646c06d457277f24d790eb41545e72f2d76f66

Reviewed By: seemethere

Differential Revision: D36537747

Pulled By: bdhirsh

fbshipit-source-id: e5f8a544f2275301f6229ea57fbfb8266d959819
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/239/head branch May 23, 2022 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants