
[PyTorch] Use DimVector in at::matmul #72230

Closed
swolchok wants to merge 5 commits into gh/swolchok/444/base from gh/swolchok/444/head

Conversation

@swolchok
Contributor

swolchok commented Feb 3, 2022

Stack from ghstack:

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Feb 3, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit bb30773 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (1/3)

Step: "Setup Python3" (full log | diagnosis details | 🔁 rerun)

2022-02-14T19:23:10.5029367Z   SHARD_NUMBER: 2
2022-02-14T19:23:10.5029682Z   NUM_TEST_SHARDS: 2
2022-02-14T19:23:10.5029990Z   TEST_CONFIG: default
2022-02-14T19:23:10.5031097Z   http_proxy: http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128
2022-02-14T19:23:10.5032927Z   https_proxy: http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128
2022-02-14T19:23:10.5034056Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2022-02-14T19:23:10.5034398Z   BRANCH: 
2022-02-14T19:23:10.5034638Z   TAG: 
2022-02-14T19:23:10.5034916Z   WORKFLOW_ID: 1842658102
2022-02-14T19:23:10.5035208Z ##[endgroup]
2022-02-14T19:23:10.5579722Z C:\actions-runner\_work\_temp\c5af4a52-2237-4dfb-95cf-b9d18c23dd12.sh: line 1: python3: command not found
2022-02-14T19:23:10.5603499Z ##[error]Process completed with exit code 127.
2022-02-14T19:23:10.5707295Z ##[group]Run rm -rf ./*
2022-02-14T19:23:10.5707673Z rm -rf ./*
2022-02-14T19:23:10.5728429Z shell: C:\Program Files\Git\usr\bin\bash.EXE --noprofile --norc -e -o pipefail {0}
2022-02-14T19:23:10.5728905Z env:
2022-02-14T19:23:10.5729273Z   BUILD_ENVIRONMENT: win-vs2019-cpu-py3
2022-02-14T19:23:10.5729664Z   BUILD_WHEEL: 1
2022-02-14T19:23:10.5729933Z   MAX_JOBS: 8
2022-02-14T19:23:10.5730224Z   CUDA_VERSION: cpu
2022-02-14T19:23:10.5730489Z   IN_CI: 1

See GitHub Actions build linux-bionic-py3.7-clang9 / test (xla, 1, 1, linux.2xlarge) (2/3)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-02-14T18:21:35.2287750Z  at::Tensor XLANativeFunctions::gelu(const at::Tensor& self) {
2022-02-14T18:21:35.2287870Z             ^~~~~~~~~~~~~~~~~~
2022-02-14T18:21:35.2288141Z In file included from /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.cpp:13:0:
2022-02-14T18:21:35.2288602Z /var/lib/jenkins/workspace/xla/torch_xla/csrc/XLANativeFunctions.h:182:19: error: candidate is: static at::Tensor torch_xla::XLANativeFunctions::gelu(const at::Tensor&, c10::string_view)
2022-02-14T18:21:35.2288848Z  static at::Tensor gelu(const at::Tensor & self, c10::string_view approximate);
2022-02-14T18:21:35.2288965Z                    ^~~~
2022-02-14T18:21:35.2289941Z /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.cpp:1514:12: error: prototype for ‘at::Tensor torch_xla::XLANativeFunctions::gelu_backward(const at::Tensor&, const at::Tensor&)’ does not match any in class ‘torch_xla::XLANativeFunctions’
2022-02-14T18:21:35.2290186Z  at::Tensor XLANativeFunctions::gelu_backward(const at::Tensor& grad,
2022-02-14T18:21:35.2290289Z             ^~~~~~~~~~~~~~~~~~
2022-02-14T18:21:35.2290813Z In file included from /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.cpp:13:0:
2022-02-14T18:21:35.2291349Z /var/lib/jenkins/workspace/xla/torch_xla/csrc/XLANativeFunctions.h:183:19: error: candidate is: static at::Tensor torch_xla::XLANativeFunctions::gelu_backward(const at::Tensor&, const at::Tensor&, c10::string_view)
2022-02-14T18:21:35.2291662Z  static at::Tensor gelu_backward(const at::Tensor & grad_output, const at::Tensor & self, c10::string_view approximate);
2022-02-14T18:21:35.2291787Z                    ^~~~~~~~~~~~~
2022-02-14T18:21:35.2336087Z [99/179] c++ -MMD -MF /var/lib/jenkins/workspace/xla/build/temp.linux-x86_64-3.7/torch_xla/csrc/ops/index_select.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/var/lib/jenkins/workspace/xla -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-bin -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/protobuf_archive/src -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/com_google_protobuf/src -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/eigen_archive -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/com_google_absl -I/var/lib/jenkins/workspace -I/var/lib/jenkins/workspace/torch/csrc -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/opt/conda/include/python3.7m -c -c /var/lib/jenkins/workspace/xla/torch_xla/csrc/ops/index_select.cpp -o /var/lib/jenkins/workspace/xla/build/temp.linux-x86_64-3.7/torch_xla/csrc/ops/index_select.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_clang"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1002"' -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1
2022-02-14T18:21:35.2336594Z cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
2022-02-14T18:21:35.2336842Z In file included from /var/lib/jenkins/workspace/c10/util/Logging.h:28:0,
2022-02-14T18:21:35.2337205Z                  from /var/lib/jenkins/workspace/c10/core/TensorImpl.h:14,
2022-02-14T18:21:35.2337643Z                  from /opt/conda/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:21,
2022-02-14T18:21:35.2338044Z                  from /opt/conda/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
2022-02-14T18:21:35.2338265Z                  from /var/lib/jenkins/workspace/torch/csrc/lazy/core/hash.h:12,
2022-02-14T18:21:35.2338473Z                  from /var/lib/jenkins/workspace/xla/torch_xla/csrc/ir.h:19,

See GitHub Actions build linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (3/3)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-02-14T18:18:28.2357836Z ++ stat --format %U /opt/conda/bin/pip
2022-02-14T18:18:28.2367902Z + PIP_USER=jenkins
2022-02-14T18:18:28.2370095Z ++ id -u -n
2022-02-14T18:18:28.2379788Z + CURRENT_USER=jenkins
2022-02-14T18:18:28.2379965Z + [[ jenkins = root ]]
2022-02-14T18:18:28.2380285Z + pip -q uninstall -y hypothesis
2022-02-14T18:18:28.6033753Z + pip -q uninstall -y coverage
2022-02-14T18:18:28.8920970Z WARNING: Skipping coverage as it is not installed.
2022-02-14T18:18:28.9340242Z + pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
2022-02-14T18:18:29.2675860Z WARNING: Skipping page https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl because the HEAD request got Content-Type: binary/octet-stream.The only supported Content-Type is text/html
2022-02-14T18:18:29.7688000Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
2022-02-14T18:18:29.7688519Z pytest 7.0.1 requires attrs>=19.2.0, but you have attrs 18.1.0 which is incompatible.
2022-02-14T18:18:29.8283367Z + pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
2022-02-14T18:18:30.1594272Z WARNING: Skipping page https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl because the HEAD request got Content-Type: binary/octet-stream.The only supported Content-Type is text/html
2022-02-14T18:18:31.1436640Z + pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
2022-02-14T18:18:31.4614504Z WARNING: Skipping page https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl because the HEAD request got Content-Type: binary/octet-stream.The only supported Content-Type is text/html
2022-02-14T18:18:46.4808350Z WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', timeout('timed out'))': /simple/hypothesis/
2022-02-14T18:18:47.6895063Z + EXTRA_TESTS=()
2022-02-14T18:18:47.6895568Z + [[ linux-xenial-py3.7-clang7-onnx == *-cuda* ]]
2022-02-14T18:18:47.6896060Z + [[ linux-xenial-py3.7-clang7-onnx == *-rocm* ]]
2022-02-14T18:18:47.6896283Z + rocm_ignore_test=()

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@pytorch-bot

pytorch-bot bot commented Feb 3, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/66626bad62f9a15045b0de6cd7938db33db9ae47/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

swolchok added a commit that referenced this pull request Feb 3, 2022
Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)

ghstack-source-id: 148261442
Pull Request resolved: #72230
@lezcano
Collaborator

lezcano commented Feb 3, 2022

I should spend some time splitting that PR into more manageable chunks to be able to isolate that mobile failure...

Collaborator

@IvanYashchuk left a comment

Maybe tensor.sizes().vec() should be disallowed? :)
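The quip refers to the pattern this PR removes: `Tensor::sizes()` returns a non-owning view (`c10::IntArrayRef`), and calling `.vec()` on it copies the dims into a freshly heap-allocated `std::vector<int64_t>`. A minimal sketch with a toy view type (`IntArrayRefToy` and `numel_from_sizes` are invented for illustration; they are not the real c10 classes):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for c10::ArrayRef<int64_t>: a non-owning (pointer, length)
// view over storage the tensor already owns. Creating one is free.
struct IntArrayRefToy {
  const int64_t* data;
  std::size_t len;

  // .vec() materialises the view as an owning std::vector, which is a
  // heap allocation plus a copy -- the hidden cost behind sizes().vec().
  std::vector<int64_t> vec() const {
    return std::vector<int64_t>(data, data + len);
  }
};

// Most consumers only need to read the sizes, which the view supports
// without any copy or allocation.
int64_t numel_from_sizes(IntArrayRefToy sizes) {
  int64_t n = 1;
  for (std::size_t i = 0; i < sizes.len; ++i) n *= sizes.data[i];
  return n;
}
```

Replacing `sizes().vec()` with the view itself, or with a `DimVector` when an owning copy is needed, keeps hot paths allocation-free for typical ranks.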

swolchok added a commit that referenced this pull request Feb 3, 2022
Pull Request resolved: #72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 148314580

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 4, 2022
Pull Request resolved: #72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 148446641

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 7, 2022
Pull Request resolved: #72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 148556529

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 14, 2022
Pull Request resolved: #72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 149069527

Differential Revision: [D33962610](https://our.internmc.facebook.com/intern/diff/D33962610/)
facebook-github-bot pushed a commit that referenced this pull request Feb 15, 2022
Summary:
Pull Request resolved: #72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 149069527

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D33962610

fbshipit-source-id: 51e200f5237bdf225bfb2445e1e36bacd0e65e92
@github-actions
Contributor

Hey @swolchok.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 15, 2022
Summary:
Pull Request resolved: pytorch/pytorch#72230

Here's a small PR that only fixes the extra heap allocations for shapes. Hopefully won't get stuck like #64387.
ghstack-source-id: 149069527

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D33962610

fbshipit-source-id: 51e200f5237bdf225bfb2445e1e36bacd0e65e92
(cherry picked from commit 027537f32965d23fc78a36fec71be41cd5cbce3d)
lezcano added a commit that referenced this pull request Apr 4, 2022
This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

These optimisations include:
- Make the code `const` correct.
- Create `DimVector`s more efficiently (e.g. prefer `append` over `insert`).
- Access the sizes of the tensors via `sizes().front()` / `sizes().back()` / `sizes().end()[-2]`.
- Do not create intermediary tensors / vectors when they can be avoided.
- Call `reshape` rather than `expect_contiguous` + `view`.

On top of these, it fixes a correctness issue in `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it accounts for most of the complexity this
PR adds.

[ghstack-poisoned]
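The shape-building idiom the list describes can be sketched as follows. `matmul_out_shape` and `DimVec` are illustrative stand-ins, not the actual ATen code; broadcasting is ignored and both inputs are assumed to carry identical batch dims.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using DimVec = std::vector<int64_t>;  // stand-in for at::DimVector

// Output shape for (...batch, n, m) x (...batch, m, p): copy the batch
// dims once in order and append the two matrix dims, rather than building
// {n, p} first and insert()-ing the batch dims at the front (which shifts
// every element). end()[-2] reads the second-to-last size directly.
DimVec matmul_out_shape(const DimVec& a, const DimVec& b) {
  assert(a.size() >= 2 && b.size() >= 2);
  assert(a.back() == b.end()[-2]);      // inner dims must agree: m == m
  DimVec out(a.begin(), a.end() - 2);   // batch dims, appended in order
  out.push_back(a.end()[-2]);           // n: second-to-last dim of a
  out.push_back(b.back());              // p: last dim of b
  return out;
}
```

With a `DimVector` in place of `std::vector`, this construction also stays off the heap for typical ranks, which is the point of the earlier PR in the stack.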
lezcano added a commit that referenced this pull request Apr 5, 2022
…ic boogaloo"


This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 5, 2022
This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 5, 2022
This PR implements the bulk of
#64387

Part of the optimisations were already merged in
#72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

ghstack-source-id: 0808c8f
Pull Request resolved: #75197
lezcano added a commit that referenced this pull request Apr 5, 2022
…ic boogaloo"


This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 5, 2022
This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 6, 2022
…ic boogaloo"


This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 6, 2022
This PR implements the bulk of
#64387

Part of the optimisations were already merged in
#72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

ghstack-source-id: 4f44593
Pull Request resolved: #75197
lezcano added a commit that referenced this pull request Apr 6, 2022
This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 6, 2022
…ic boogaloo"


This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

A number of these optimisations include:
- Make the code `const` correct.
- Create `DimVector`'s more efficiently (e.g. prefer `append` over
`insert`).
- Access sizes of the tensors via `sizes().front()` / `sizes().back()`
  / `sizes().end()[-2]`
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous`  + `view`

On top of these, it fixes a correctness issue of `matmul_out`, where the
out parameter was not resized correctly when passed to the backends.
This involves removing the use of `set_` from the calling code, as
requested by ezyang, and it incurs on most of the complexity of the
code that this PR adds.

[ghstack-poisoned]
lezcano added a commit that referenced this pull request Apr 6, 2022
ghstack-source-id: 429f9e4
Pull Request resolved: #75197