
[wip] Replace optimizers in torch.optim with the ones from torch.optim._multi_tensor #49039

Closed
izdeby wants to merge 52 commits into gh/izdeby/69/base from gh/izdeby/69/head

Conversation

izdeby (Contributor) commented Dec 8, 2020

Stack from ghstack:

Differential Revision: D25406490


### Benchmark results

| Optimizer (settings) | Current | Foreach |
| --- | --- | --- |
| SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True) | 201.63 ms | 56.99 ms |
| Adam (weight_decay=1., amsgrad=True) | 233.27 ms | 46.89 ms |
| AdamW (weight_decay=1., amsgrad=True) | 371.18 ms | 121.04 ms |
| RMSprop (weight_decay=1, momentum=1, centered=True) | 364.88 ms | 47.52 ms |
| Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50)) | 1.43 s | 1.26 s |
| ASGD (weight_decay=1) | 165.39 ms | 40.61 ms |
| Adamax (weight_decay=1) | 374.42 ms | 291.06 ms |
| Adadelta (weight_decay=1) | 252.64 ms | 29.62 ms |
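For context on where these gains come from: the `_multi_tensor` variants batch the per-parameter elementwise updates into `torch._foreach_*` calls, launching one horizontally fused kernel per operation across the whole parameter list instead of one small kernel per parameter. A minimal sketch of a plain SGD update (no momentum) written both ways; the shapes and `lr` below are illustrative, not taken from the PR:

```python
import torch

params = [torch.randn(64, 64, device="cuda") for _ in range(100)]
grads = [torch.randn_like(p) for p in params]
lr = 1e-3

# Single-tensor style (torch.optim today): one kernel launch per tensor.
for p, g in zip(params, grads):
    p.add_(g, alpha=-lr)

# Multi-tensor style (torch.optim._multi_tensor): one fused launch
# covering every tensor in the list.
torch._foreach_add_(params, grads, alpha=-lr)
```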

### Benchmark script

```python
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
criterion = nn.CrossEntropyLoss()

# Optimizers under comparison: the current single-tensor implementation
# vs. the multi-tensor (foreach) implementation of the same algorithm.
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params)
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

# One forward/backward pass so every parameter has a populated .grad
# before the optimizer steps are timed.
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)
inputs = torch.rand(128, 3, 100, 100, device="cuda", requires_grad=True)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label=str(optimizer),
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label=str(optimizer_mta),
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```
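A note on the timing setup, for anyone rerunning this: the `torch.cuda.synchronize()` inside `stmt` is charged to every measured iteration, so the absolute times include synchronization overhead. Both timers pay the same cost, though, so the Current-vs-Foreach comparison should be unaffected. `blocked_autorange()` repeats the statement in blocks sized to keep timer overhead small and reports the median across those blocks.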

[ghstack-poisoned]
izdeby pushed a commit that referenced this pull request Dec 8, 2020
ghstack-source-id: 5c74ab5
Pull Request resolved: #49039
izdeby changed the title from "Swap optimizers" to "Replace optimizers from torch.optim with torch.optim._multi_tensor" on Dec 8, 2020
izdeby changed the title from "Replace optimizers from torch.optim with torch.optim._multi_tensor" to "Replace optimizers in torch.optim with torch.optim._multi_tensor" on Dec 8, 2020
izdeby changed the title from "Replace optimizers in torch.optim with torch.optim._multi_tensor" to "Replace optimizers in torch.optim with the ones from torch.optim._multi_tensor" on Dec 8, 2020
dr-ci bot commented Dec 8, 2020

💊 CI failures summary and remediations

As of commit f456748 (more details on the Dr. CI page):


  • 19/19 failures possibly* introduced in this PR
    • 2/19 non-scanned failure(s)

🕵️ 16 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build mypy (1/16)

Step: "Run mypy" (full log | diagnosis details | 🔁 rerun)

2021-04-01T17:28:07.8986823Z torch/optim/_multi_tensor/adam.py:146: error: Need type annotation for 'per_device_and_dtype_grads' [var-annotated]
2021-04-01T17:28:07.8970974Z torch/optim/_multi_tensor/asgd.py:97: error: Argument 1 of "zero_grad" is incompatible with supertype "Optimizer"; supertype defines the argument type as "Optional[bool]"  [override]
2021-04-01T17:28:07.8972485Z torch/optim/_multi_tensor/asgd.py:98: error: Need type annotation for 'per_device_and_dtype_grads'  [var-annotated]
2021-04-01T17:28:07.8974412Z torch/optim/_multi_tensor/adamw.py:127: error: Incompatible types in assignment (expression has type "Union[Tuple[Tensor, ...], List[Tensor]]", variable has type "List[Any]")  [assignment]
2021-04-01T17:28:07.8976085Z torch/optim/_multi_tensor/adamw.py:146: error: Argument 1 of "zero_grad" is incompatible with supertype "Optimizer"; supertype defines the argument type as "Optional[bool]"  [override]
2021-04-01T17:28:07.8977868Z torch/optim/_multi_tensor/adamw.py:147: error: Need type annotation for 'per_device_and_dtype_grads'  [var-annotated]
2021-04-01T17:28:07.8979103Z torch/optim/_multi_tensor/adamax.py:108: error: Argument 1 of "zero_grad" is incompatible with supertype "Optimizer"; supertype defines the argument type as "Optional[bool]"  [override]
2021-04-01T17:28:07.8980527Z torch/optim/_multi_tensor/adamax.py:109: error: Need type annotation for 'per_device_and_dtype_grads'  [var-annotated]
2021-04-01T17:28:07.8982093Z torch/optim/_multi_tensor/adam.py:113: error: Incompatible types in assignment (expression has type "Union[Tuple[Tensor, ...], List[Tensor]]", variable has type "List[Any]")  [assignment]
2021-04-01T17:28:07.8983657Z torch/optim/_multi_tensor/adam.py:126: error: Incompatible types in assignment (expression has type "Union[Tuple[Tensor, ...], List[Tensor]]", variable has type "List[Any]")  [assignment]
2021-04-01T17:28:07.8985290Z torch/optim/_multi_tensor/adam.py:145: error: Argument 1 of "zero_grad" is incompatible with supertype "Optimizer"; supertype defines the argument type as "Optional[bool]"  [override]
2021-04-01T17:28:07.8986823Z torch/optim/_multi_tensor/adam.py:146: error: Need type annotation for 'per_device_and_dtype_grads'  [var-annotated]
2021-04-01T17:28:54.2408897Z Found 19 errors in 7 files (checked 1278 source files)
2021-04-01T17:28:56.5637144Z ##[error]Process completed with exit code 1.
2021-04-01T17:28:56.5776051Z Post job cleanup.
2021-04-01T17:28:56.7381839Z [command]/usr/bin/git version
2021-04-01T17:28:56.7446160Z git version 2.31.1
2021-04-01T17:28:56.7531822Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2021-04-01T17:28:56.7584232Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2021-04-01T17:28:56.7920680Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2021-04-01T17:28:56.8014670Z http.https://github.com/.extraheader
2021-04-01T17:28:56.8017388Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
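For reference, the two recurring mypy complaints in this log have standard mechanical fixes. A minimal sketch follows; the nested device-then-dtype structure of `per_device_and_dtype_grads` is an assumption read off the variable name, not taken from the actual diff:

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional

import torch

# [var-annotated] fix: annotate the defaultdict explicitly so mypy does not
# have to infer its nested value type (the exact value type is an assumption).
per_device_and_dtype_grads: DefaultDict[
    torch.device, DefaultDict[torch.dtype, List[torch.Tensor]]
] = defaultdict(lambda: defaultdict(list))

# [override] fix: match the Optional[bool] parameter that the supertype
# torch.optim.Optimizer.zero_grad declares.
class MultiTensorOptimizerStub(torch.optim.Optimizer):
    def zero_grad(self, set_to_none: Optional[bool] = False) -> None:
        ...
```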

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_build (2/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:05:16 SyntaxError: invalid syntax
Apr 01 18:05:16 Traceback (most recent call last):
Apr 01 18:05:16   File "test/run_test.py", line 15, in <module>
Apr 01 18:05:16     import torch
Apr 01 18:05:16   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:05:16     from torch import optim as optim
Apr 01 18:05:16   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:05:16     from .adam import Adam
Apr 01 18:05:16   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:05:16     <<<<<<< HEAD
Apr 01 18:05:16      ^
Apr 01 18:05:16 SyntaxError: invalid syntax
Apr 01 18:05:16 =================== sccache compilation log ===================
Apr 01 18:05:16 + cleanup
Apr 01 18:05:16 + retcode=1
Apr 01 18:05:16 + set +x
Apr 01 18:05:16 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:05:16 Compile requests                   4953
Apr 01 18:05:16 Compile requests executed          4600
Apr 01 18:05:16 Cache hits                         4581
Apr 01 18:05:16 Cache hits (C/C++)                 4581
Apr 01 18:05:16 Cache misses                          1
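Almost all of the import-time failures in this run trace back to the single root cause visible here: an unresolved git merge-conflict marker (`<<<<<<< HEAD`) left at `torch/optim/adam.py:73`, which becomes a `SyntaxError` the moment `import torch` reaches `torch.optim`. Running `git diff --check` locally (it flags leftover conflict markers) would catch this before a push.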

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (3/16)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:34:23 SyntaxError: invalid syntax
Apr 01 18:34:21 + python -c 'import torch; print(torch.__version__, torch.version.git_version)'
Apr 01 18:34:23 Traceback (most recent call last):
Apr 01 18:34:23   File "<string>", line 1, in <module>
Apr 01 18:34:23   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:34:23     from torch import optim as optim
Apr 01 18:34:23   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:34:23     from .adam import Adam
Apr 01 18:34:23   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:34:23     <<<<<<< HEAD
Apr 01 18:34:23      ^
Apr 01 18:34:23 SyntaxError: invalid syntax
Apr 01 18:34:23 + cleanup
Apr 01 18:34:23 + retcode=1
Apr 01 18:34:23 + set +x
Apr 01 18:34:23 =================== sccache compilation log ===================
Apr 01 18:34:23 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:34:23 Compile requests                      0
Apr 01 18:34:23 Compile requests executed             0
Apr 01 18:34:23 Cache hits                            0
Apr 01 18:34:23 Cache misses                          0
Apr 01 18:34:23 Cache timeouts                        0

See CircleCI build pytorch_libtorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (4/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:16:44 Error generating file
Apr 01 18:16:44 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu(31): error: identifier "alpha" is undefined
Apr 01 18:16:44 
Apr 01 18:16:44 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu(31): error: type name is not allowed
Apr 01 18:16:44 
Apr 01 18:16:44 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu(31): error: expected an expression
Apr 01 18:16:44 
Apr 01 18:16:44 36 errors detected in the compilation of "/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu".
Apr 01 18:16:44 -- Removing /var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.o
Apr 01 18:16:44 /usr/bin/cmake -E remove /var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.o
Apr 01 18:16:44 CMake Error at torch_cuda_cu_generated_ForeachBinaryOpList.cu.o.Debug.cmake:281 (message):
Apr 01 18:16:44   Error generating file
Apr 01 18:16:44   /var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.o
Apr 01 18:16:44 
Apr 01 18:16:44 
Apr 01 18:16:44 make[2]: *** [caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_ForeachBinaryOpList.cu.o] Error 1
Apr 01 18:16:44 make[2]: *** Waiting for unfinished jobs....
Apr 01 18:16:44 caffe2/CMakeFiles/torch_cuda_cu.dir/build.make:553: recipe for target 'caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_ForeachBinaryOpList.cu.o' failed
Apr 01 18:16:44 Generated /var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_Im2Col.cu.o successfully.
Apr 01 18:16:45 -- Generating temporary cmake readable file: /var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_AdaptiveAveragePooling.cu.o.depend.tmp
Apr 01 18:16:45 /usr/bin/cmake -D input_file:FILEPATH=/var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_AdaptiveAveragePooling.cu.o.NVCC-depend -D output_file:FILEPATH=/var/lib/jenkins/cpp-build/caffe2/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_AdaptiveAveragePooling.cu.o.depend.tmp -D verbose=1 -P /var/lib/jenkins/workspace/cmake/Modules_CUDA_fix/upstream/FindCUDA/make2cmake.cmake
Apr 01 18:16:45 CMake Warning at /var/lib/jenkins/workspace/cmake/Modules_CUDA_fix/upstream/FindCUDA/make2cmake.cmake:76 (message):

See CircleCI build pytorch_vulkan_linux_bionic_py3_6_clang9_build (5/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:06:54 SyntaxError: invalid syntax
Apr 01 18:06:54 Traceback (most recent call last):
Apr 01 18:06:54   File "test/run_test.py", line 15, in <module>
Apr 01 18:06:54     import torch
Apr 01 18:06:54   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:06:54     from torch import optim as optim
Apr 01 18:06:54   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:06:54     from .adam import Adam
Apr 01 18:06:54   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:06:54     <<<<<<< HEAD
Apr 01 18:06:54      ^
Apr 01 18:06:54 SyntaxError: invalid syntax
Apr 01 18:06:54 + cleanup
Apr 01 18:06:54 + retcode=1
Apr 01 18:06:54 + set +x
Apr 01 18:06:54 =================== sccache compilation log ===================
Apr 01 18:06:54 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:06:54 Compile requests                   5002
Apr 01 18:06:54 Compile requests executed          4641
Apr 01 18:06:54 Cache hits                         4621
Apr 01 18:06:54 Cache hits (C/C++)                 4621
Apr 01 18:06:54 Cache misses                          1

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (6/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Error generating file
          detected during instantiation of "std::vector<at::Tensor, std::allocator<at::Tensor>> at::native::foreach_tensor_list_op<Op>(at::TensorList, at::TensorList, const c10::Scalar &, __nv_bool) [with Op=std::multiplies]" 
(103): here

Error limit reached.
100 errors detected in the compilation of "C:/Users/circleci/project/build/win_tmp/bin/.tmpQ1SMUw/tmpxft_00000fd8_00000000-7_ForeachBinaryOpList.cpp1.ii".
Compilation terminated.
ForeachBinaryOpList.cu
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_ForeachBinaryOpList.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_ForeachBinaryOpList.cu.obj
CMake Error at torch_cuda_generated_ForeachBinaryOpList.cu.obj.Release.cmake:281 (message):
  Error generating file
  C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_ForeachBinaryOpList.cu.obj


[4693/5565] cmd.exe /C "cd /D C:\Users\circleci\project\build\caffe2\CMakeFiles\torch_cuda.dir\utils\math && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E make_directory C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/. && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -D verbose:BOOL=ON -D build_configuration:STRING=Release -D generated_file:STRING=C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/./torch_cuda_generated_elementwise.cu.obj -D generated_cubin_file:STRING=C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/./torch_cuda_generated_elementwise.cu.obj.cubin.txt -P C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/torch_cuda_generated_elementwise.cu.obj.Release.cmake"
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/./torch_cuda_generated_elementwise.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/./torch_cuda_generated_elementwise.cu.obj
-- Generating dependency file: C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/torch_cuda_generated_elementwise.cu.obj.NVCC-depend
C:/Users/circleci/project/build/win_tmp/bin/randomtemp.exe -M -D__CUDACC__ C:/Users/circleci/project/caffe2/utils/math/elementwise.cu -o C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/torch_cuda_generated_elementwise.cu.obj.NVCC-depend -ccbin cl.exe -m64 -Dtorch_cuda_EXPORTS -DUSE_CUDA -DTORCH_CUDA_BUILD_MAIN_LIB -DWIN32_LEAN_AND_MEAN -DTH_BLAS_MKL -D_OPENMP_NOFORCE_MANIFEST -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -D_CRT_SECURE_NO_DEPRECATE=1 -DMAGMA_V2 -DIDEEP_USE_MKL -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -Xcompiler ,\"/DWIN32\",\"/D_WINDOWS\",\"/GR\",\"/EHsc\",\"/w\",\"/bigobj\",\"-DUSE_PTHREADPOOL\",\"-openmp:experimental\",\"-IC:/Users/circleci/project/build/win_tmp/mkl/include\",\"-DNDEBUG\",\"-DUSE_FBGEMM\",\"-DUSE_XNNPACK\",\"-DHAVE_AVX_CPU_DEFINITION\",\"-DHAVE_AVX2_CPU_DEFINITION\",\"/MD\",\"/O2\",\"/Ob2\",\"/DNDEBUG\",\"/w\",\"/bigobj\",\"-DNDEBUG\" -Xcompiler /w -w -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch --use-local-env -gencode arch=compute_75,code=sm_75 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --Werror cross-execution-space-call --no-host-device-move-forward -Xcompiler -MD --expt-relaxed-constexpr --expt-extended-lambda -Xcompiler=/wd4819,/wd4503,/wd4190,/wd4244,/wd4251,/wd4275,/wd4522 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DNVCC "-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include" -IC:/Users/circleci/project/build/aten/src -IC:/Users/circleci/project/aten/src -IC:/Users/circleci/project/build -IC:/Users/circleci/project -IC:/Users/circleci/project/build/third_party/gloo -IC:/Users/circleci/project/cmake/../third_party/gloo -IC:/Users/circleci/project/cmake/../third_party/googletest/googlemock/include -IC:/Users/circleci/project/cmake/../third_party/googletest/googletest/include -IC:/Users/circleci/project/third_party/protobuf/src -IC:/Users/circleci/project/build/win_tmp/mkl/include -IC:/Users/circleci/project/third_party/XNNPACK/include -IC:/Users/circleci/project/cmake/../third_party/benchmark/include -IC:/Users/circleci/project/third_party -IC:/Users/circleci/project/cmake/../third_party/eigen -IC:/Jenkins/Miniconda3/include -IC:/Jenkins/Miniconda3/lib/site-packages/numpy/core/include -IC:/Users/circleci/project/cmake/../third_party/pybind11/include -IC:/Users/circleci/project/cmake/../third_party/cub -IC:/Users/circleci/project/build/caffe2/contrib/aten -IC:/Users/circleci/project/third_party/onnx -IC:/Users/circleci/project/build/third_party/onnx -IC:/Users/circleci/project/third_party/foxi -IC:/Users/circleci/project/build/third_party/foxi -IC:/Users/circleci/project/build/win_tmp/magma/include -IC:/Users/circleci/project/third_party/ideep/mkl-dnn/include -IC:/Users/circleci/project/third_party/ideep/include -IC:/Users/circleci/project/build/include -IC:/Users/circleci/project/build/caffe2/aten/src/TH 
-IC:/Users/circleci/project/aten/src/TH -IC:/Users/circleci/project/build/caffe2/aten/src/THC -IC:/Users/circleci/project/aten/src/THC -IC:/Users/circleci/project/aten/src/THCUNN -IC:/Users/circleci/project/aten/src/ATen/cuda -IC:/Users/circleci/project/build/caffe2/aten/src -IC:/Users/circleci/project/aten/../third_party/catch/single_include -IC:/Users/circleci/project/aten/src/ATen/.. -IC:/Users/circleci/project/build/caffe2/aten/src/ATen -IC:/Users/circleci/project/c10/cuda/../.. -IC:/Users/circleci/project/c10/../ "-IC:/Program Files/NVIDIA Corporation/NvToolsExt/include" -IC:/Users/circleci/project/torch/csrc/api -IC:/Users/circleci/project/torch/csrc/api/include -IC:/Users/circleci/project/build/third_party/ideep/mkl-dnn/include -IC:/Users/circleci/project/third_party/ideep/mkl-dnn/src/../include
elementwise.cu
-- Generating temporary cmake readable file: C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/utils/math/torch_cuda_generated_elementwise.cu.obj.depend.tmp

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (7/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:22:35 SyntaxError: invalid syntax
Apr 01 18:22:35 Traceback (most recent call last):
Apr 01 18:22:35   File "setup.py", line 38, in <module>
Apr 01 18:22:35     from torch.utils.cpp_extension import BuildExtension, CppExtension
Apr 01 18:22:35   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:22:35     from torch import optim as optim
Apr 01 18:22:35   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:22:35     from .adam import Adam
Apr 01 18:22:35   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:22:35     <<<<<<< HEAD
Apr 01 18:22:35      ^
Apr 01 18:22:35 SyntaxError: invalid syntax
Apr 01 18:22:35 + cleanup
Apr 01 18:22:35 + retcode=1
Apr 01 18:22:35 + set +x
Apr 01 18:22:35 =================== sccache compilation log ===================
Apr 01 18:22:35 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:22:35 Compile requests                    4895
Apr 01 18:22:35 Compile requests executed           4571
Apr 01 18:22:35 Cache hits                          4403
Apr 01 18:22:35 Cache hits (C/C++)                  4403
Apr 01 18:22:35 Cache misses                         149

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build (8/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 17:59:57 SyntaxError: invalid syntax
Apr 01 17:59:57     self.compile_path(child, top_package_path)
Apr 01 17:59:57   File "freeze.py", line 170, in compile_path
Apr 01 17:59:57     self.compile_file(path, top_package_path)
Apr 01 17:59:57   File "freeze.py", line 83, in wrapper
Apr 01 17:59:57     ret = fn(*args, **kwargs)
Apr 01 17:59:57   File "freeze.py", line 232, in compile_file
Apr 01 17:59:57     co = compile(src_file.read(), path, "exec")
Apr 01 17:59:57   File "/var/lib/jenkins/workspace/torch/csrc/deploy/interpreter/../../../../torch/optim/adam.py", line 73
Apr 01 17:59:57     <<<<<<< HEAD
Apr 01 17:59:57     ^
Apr 01 17:59:57 SyntaxError: invalid syntax
Apr 01 17:59:57 torch/csrc/deploy/interpreter/CMakeFiles/torch_deployinterpreter.dir/build.make:668: recipe for target '../torch/csrc/deploy/interpreter/frozen/main.c' failed
Apr 01 17:59:57 make[2]: *** [../torch/csrc/deploy/interpreter/frozen/main.c] Error 1
Apr 01 17:59:57 CMakeFiles/Makefile2:18581: recipe for target 'torch/csrc/deploy/interpreter/CMakeFiles/torch_deployinterpreter.dir/all' failed
Apr 01 17:59:57 make[1]: *** [torch/csrc/deploy/interpreter/CMakeFiles/torch_deployinterpreter.dir/all] Error 2
Apr 01 17:59:57 make[1]: *** Waiting for unfinished jobs....
Apr 01 17:59:57 [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/f32-vscale/avx-x32.c.o
Apr 01 17:59:57 In file included from /var/lib/jenkins/workspace/third_party/XNNPACK/include/xnnpack.h:15:0,
Apr 01 17:59:57                  from /var/lib/jenkins/workspace/third_party/XNNPACK/src/xnnpack/params.h:15,
Apr 01 17:59:57                  from /var/lib/jenkins/workspace/third_party/XNNPACK/src/xnnpack/vunary.h:11,
Apr 01 17:59:57                  from /var/lib/jenkins/workspace/third_party/XNNPACK/src/f32-vrnd/gen/vrndz-avx-x8.c:16:

See CircleCI build pytorch_linux_bionic_rocm3_9_py3_6_build (9/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:03:05 Error generating file
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachBinaryOpList.hip:40:31: error: use of undeclared identifier 'alpha'
Apr 01 18:03:05 2 warnings and 12 errors generated when compiling for gfx900.
Apr 01 18:03:05 CMake Error at torch_hip_generated_ForeachBinaryOpList.hip.o.cmake:192 (message):
Apr 01 18:03:05   Error generating file
Apr 01 18:03:05   /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ForeachBinaryOpList.hip.o
Apr 01 18:03:05 
Apr 01 18:03:05 
Apr 01 18:03:05 make[2]: *** [caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpList.hip.o] Error 1
Apr 01 18:03:05 caffe2/CMakeFiles/torch_hip.dir/build.make:880: recipe for target 'caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpList.hip.o' failed
Apr 01 18:03:11 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/AmpKernels.hip:10:
Apr 01 18:03:11 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachFunctors.cuh:3:
Apr 01 18:03:11 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/MultiTensorApply.cuh:6:
Apr 01 18:03:11 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/Loops.cuh:18:
Apr 01 18:03:11 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/MemoryAccess.cuh:38:26: warning: template template parameter using 'typename' is a C++17 extension [-Wc++17-extensions]

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (10/16)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:35:21 SyntaxError: invalid syntax
Apr 01 18:35:20 + python -c 'import torch; print(torch.__version__, torch.version.git_version)'
Apr 01 18:35:21 Traceback (most recent call last):
Apr 01 18:35:21   File "<string>", line 1, in <module>
Apr 01 18:35:21   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:35:21     from torch import optim as optim
Apr 01 18:35:21   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:35:21     from .adam import Adam
Apr 01 18:35:21   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:35:21     <<<<<<< HEAD
Apr 01 18:35:21      ^
Apr 01 18:35:21 SyntaxError: invalid syntax
Apr 01 18:35:22 + cleanup
Apr 01 18:35:22 + retcode=1
Apr 01 18:35:22 + set +x
Apr 01 18:35:22 =================== sccache compilation log ===================
Apr 01 18:35:22 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:35:22 Compile requests                      0
Apr 01 18:35:22 Compile requests executed             0
Apr 01 18:35:22 Cache hits                            0
Apr 01 18:35:22 Cache misses                          0
Apr 01 18:35:22 Cache timeouts                        0

See CircleCI build pytorch_windows_vs2019_py36_cuda11.1_build (11/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Error generating file
          detected during instantiation of "std::vector<at::Tensor, std::allocator<at::Tensor>> at::native::foreach_tensor_list_op<Op>(at::TensorList, at::TensorList, const c10::Scalar &, __nv_bool) [with Op=std::multiplies]" 
(103): here

Error limit reached.
100 errors detected in the compilation of "C:/Users/circleci/project/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu".
Compilation terminated.
ForeachBinaryOpList.cu
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.obj
CMake Error at torch_cuda_cu_generated_ForeachBinaryOpList.cu.obj.Release.cmake:281 (message):
  Error generating file
  C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_ForeachBinaryOpList.cu.obj


[4897/5568] cmd.exe /C "cd /D C:\Users\circleci\project\build\caffe2\CMakeFiles\torch_cuda_cu.dir\__\aten\src\ATen\native\cuda && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E make_directory C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/. && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -D verbose:BOOL=ON -D build_configuration:STRING=Release -D generated_file:STRING=C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_SoftMax.cu.obj -D generated_cubin_file:STRING=C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_SoftMax.cu.obj.cubin.txt -P C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_SoftMax.cu.obj.Release.cmake"
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_SoftMax.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/./torch_cuda_cu_generated_SoftMax.cu.obj
-- Generating dependency file: C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_SoftMax.cu.obj.NVCC-depend
C:/Users/circleci/project/build/win_tmp/bin/randomtemp.exe -M -D__CUDACC__ C:/Users/circleci/project/aten/src/ATen/native/cuda/SoftMax.cu -o C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_SoftMax.cu.obj.NVCC-depend -ccbin cl.exe -m64 -Dtorch_cuda_cu_EXPORTS -DBUILD_SPLIT_CUDA -DUSE_CUDA -DTORCH_CUDA_CU_BUILD_MAIN_LIB -DWIN32_LEAN_AND_MEAN -DTH_BLAS_MKL -D_OPENMP_NOFORCE_MANIFEST -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -D_CRT_SECURE_NO_DEPRECATE=1 -DMAGMA_V2 -DIDEEP_USE_MKL -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -Xcompiler ,\"/DWIN32\",\"/D_WINDOWS\",\"/GR\",\"/EHsc\",\"/w\",\"/bigobj\",\"-DUSE_PTHREADPOOL\",\"-openmp:experimental\",\"-IC:/Users/circleci/project/build/win_tmp/mkl/include\",\"-DNDEBUG\",\"-DUSE_FBGEMM\",\"-DUSE_XNNPACK\",\"-DHAVE_AVX_CPU_DEFINITION\",\"-DHAVE_AVX2_CPU_DEFINITION\",\"/MD\",\"/O2\",\"/Ob2\",\"/DNDEBUG\",\"/w\",\"/bigobj\",\"-DNDEBUG\" -Xcompiler /w -w -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch --use-local-env -gencode arch=compute_75,code=sm_75 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --Werror cross-execution-space-call --no-host-device-move-forward -Xcompiler -MD --expt-relaxed-constexpr --expt-extended-lambda -Xcompiler=/wd4819,/wd4503,/wd4190,/wd4244,/wd4251,/wd4275,/wd4522 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DNVCC "-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include" -IC:/Users/circleci/project/build/aten/src -IC:/Users/circleci/project/aten/src -IC:/Users/circleci/project/build -IC:/Users/circleci/project -IC:/Users/circleci/project/build/third_party/gloo -IC:/Users/circleci/project/cmake/../third_party/gloo -IC:/Users/circleci/project/cmake/../third_party/googletest/googlemock/include -IC:/Users/circleci/project/cmake/../third_party/googletest/googletest/include -IC:/Users/circleci/project/third_party/protobuf/src -IC:/Users/circleci/project/build/win_tmp/mkl/include -IC:/Users/circleci/project/third_party/XNNPACK/include -IC:/Users/circleci/project/cmake/../third_party/benchmark/include -IC:/Users/circleci/project/third_party -IC:/Users/circleci/project/cmake/../third_party/eigen -IC:/Jenkins/Miniconda3/include -IC:/Jenkins/Miniconda3/lib/site-packages/numpy/core/include -IC:/Users/circleci/project/cmake/../third_party/pybind11/include -IC:/Users/circleci/project/build/caffe2/contrib/aten -IC:/Users/circleci/project/third_party/onnx -IC:/Users/circleci/project/build/third_party/onnx -IC:/Users/circleci/project/third_party/foxi -IC:/Users/circleci/project/build/third_party/foxi -IC:/Users/circleci/project/build/win_tmp/magma/include -IC:/Users/circleci/project/third_party/ideep/mkl-dnn/include -IC:/Users/circleci/project/third_party/ideep/include -IC:/Users/circleci/project/build/include -IC:/Users/circleci/project/build/caffe2/aten/src/TH 
-IC:/Users/circleci/project/aten/src/TH -IC:/Users/circleci/project/build/caffe2/aten/src/THC -IC:/Users/circleci/project/aten/src/THC -IC:/Users/circleci/project/aten/src/THCUNN -IC:/Users/circleci/project/aten/src/ATen/cuda -IC:/Users/circleci/project/build/caffe2/aten/src -IC:/Users/circleci/project/aten/../third_party/catch/single_include -IC:/Users/circleci/project/aten/src/ATen/.. -IC:/Users/circleci/project/build/caffe2/aten/src/ATen -IC:/Users/circleci/project/c10/cuda/../.. -IC:/Users/circleci/project/c10/../ "-IC:/Program Files/NVIDIA Corporation/NvToolsExt/include" -IC:/Users/circleci/project/torch/csrc/api -IC:/Users/circleci/project/torch/csrc/api/include -IC:/Users/circleci/project/build/third_party/ideep/mkl-dnn/include -IC:/Users/circleci/project/third_party/ideep/mkl-dnn/src/../include
SoftMax.cu
-- Generating temporary cmake readable file: C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda_cu.dir/__/aten/src/ATen/native/cuda/torch_cuda_cu_generated_SoftMax.cu.obj.depend.tmp

See CircleCI build pytorch_macos_10_13_py3_test (12/16)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:56:41 SyntaxError: invalid syntax
Apr 01 18:56:40 ++ python -c 'import torch; print(int(torch.backends.openmp.is_available()))'
Apr 01 18:56:41 Traceback (most recent call last):
Apr 01 18:56:41   File "<string>", line 1, in <module>
Apr 01 18:56:41   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:56:41     from torch import optim as optim
Apr 01 18:56:41   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:56:41     from .adam import Adam
Apr 01 18:56:41   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/optim/adam.py", line 73
Apr 01 18:56:41     <<<<<<< HEAD
Apr 01 18:56:41      ^
Apr 01 18:56:41 SyntaxError: invalid syntax
Apr 01 18:56:41 + [[ ! '' == \1 ]]
Apr 01 18:56:41 + echo 'Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False'
Apr 01 18:56:41 Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False
Apr 01 18:56:41 + exit 1
Apr 01 18:56:41 + cleanup
Apr 01 18:56:41 + retcode=1
Apr 01 18:56:41 + set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_build (13/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:06:10 SyntaxError: invalid syntax
Apr 01 18:06:10 Traceback (most recent call last):
Apr 01 18:06:10   File "test/run_test.py", line 15, in <module>
Apr 01 18:06:10     import torch
Apr 01 18:06:10   File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:06:10     from torch import optim as optim
Apr 01 18:06:10   File "/opt/conda/lib/python3.8/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:06:10     from .adam import Adam
Apr 01 18:06:10   File "/opt/conda/lib/python3.8/site-packages/torch/optim/adam.py", line 73
Apr 01 18:06:10     <<<<<<< HEAD
Apr 01 18:06:10     ^
Apr 01 18:06:10 SyntaxError: invalid syntax
Apr 01 18:06:10 =================== sccache compilation log ===================
Apr 01 18:06:10 + cleanup
Apr 01 18:06:10 + retcode=1
Apr 01 18:06:10 + set +x
Apr 01 18:06:10 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:06:10 Compile requests                   4903
Apr 01 18:06:10 Compile requests executed          4569
Apr 01 18:06:10 Cache hits                         4551
Apr 01 18:06:10 Cache hits (C/C++)                 4551
Apr 01 18:06:10 Cache misses                          1

See CircleCI build pytorch_windows_vs2019_py36_cpu_build (14/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

SyntaxError: invalid syntax
Traceback (most recent call last):
  File "test/run_test.py", line 15, in <module>
    import torch
  File "C:\Jenkins\Miniconda3\lib\site-packages\torch\__init__.py", line 649, in <module>
    from torch import optim as optim
  File "C:\Jenkins\Miniconda3\lib\site-packages\torch\optim\__init__.py", line 10, in <module>
    from .adam import Adam
  File "C:\Jenkins\Miniconda3\lib\site-packages\torch\optim\adam.py", line 73
    <<<<<<< HEAD
     ^
SyntaxError: invalid syntax
+ cleanup
+ retcode=1
+ set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_6_clang9_noarch_build (15/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:06:41 SyntaxError: invalid syntax
Apr 01 18:06:41 Traceback (most recent call last):
Apr 01 18:06:41   File "test/run_test.py", line 15, in <module>
Apr 01 18:06:41     import torch
Apr 01 18:06:41   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:06:41     from torch import optim as optim
Apr 01 18:06:41   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:06:41     from .adam import Adam
Apr 01 18:06:41   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:06:41     <<<<<<< HEAD
Apr 01 18:06:41      ^
Apr 01 18:06:41 SyntaxError: invalid syntax
Apr 01 18:06:41 =================== sccache compilation log ===================
Apr 01 18:06:41 + cleanup
Apr 01 18:06:41 + retcode=1
Apr 01 18:06:41 + set +x
Apr 01 18:06:41 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:06:41 Compile requests                   4894
Apr 01 18:06:41 Compile requests executed          4570
Apr 01 18:06:41 Cache hits                         4550
Apr 01 18:06:41 Cache hits (C/C++)                 4550
Apr 01 18:06:41 Cache misses                          1

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (16/16)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:05:32 SyntaxError: invalid syntax
Apr 01 18:05:32 Traceback (most recent call last):
Apr 01 18:05:32   File "test/run_test.py", line 15, in <module>
Apr 01 18:05:32     import torch
Apr 01 18:05:32   File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 649, in <module>
Apr 01 18:05:32     from torch import optim as optim
Apr 01 18:05:32   File "/opt/conda/lib/python3.6/site-packages/torch/optim/__init__.py", line 10, in <module>
Apr 01 18:05:32     from .adam import Adam
Apr 01 18:05:32   File "/opt/conda/lib/python3.6/site-packages/torch/optim/adam.py", line 73
Apr 01 18:05:32     <<<<<<< HEAD
Apr 01 18:05:32      ^
Apr 01 18:05:32 SyntaxError: invalid syntax
Apr 01 18:05:32 + cleanup
Apr 01 18:05:32 + retcode=1
Apr 01 18:05:32 + set +x
Apr 01 18:05:32 =================== sccache compilation log ===================
Apr 01 18:05:32 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:05:32 Compile requests                   4941
Apr 01 18:05:32 Compile requests executed          4589
Apr 01 18:05:32 Cache hits                         4569
Apr 01 18:05:32 Cache hits (C/C++)                 4569
Apr 01 18:05:32 Cache misses                          1

1 failure not recognized by patterns:

GitHub Actions quick-checks, step "Ensure correct trailing newlines"

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
izdeby pushed a commit that referenced this pull request Dec 8, 2020
ghstack-source-id: 0d30191
Pull Request resolved: #49039
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
izdeby pushed a commit that referenced this pull request Dec 9, 2020
ghstack-source-id: abcf962
Pull Request resolved: #49039
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
izdeby pushed a commit that referenced this pull request Dec 10, 2020
ghstack-source-id: 22d43e6
Pull Request resolved: #49039
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
izdeby pushed a commit that referenced this pull request Dec 11, 2020
ghstack-source-id: 2f94dab
Pull Request resolved: #49039
….optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
izdeby added a commit that referenced this pull request Feb 8, 2021
ghstack-source-id: eb4309d
Pull Request resolved: #49039
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
Iurii Zdebskyi and others added 17 commits February 18, 2021 12:16
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
… torch.optim._multi_tensor"


Differential Revision: [D25406490](https://our.internmc.facebook.com/intern/diff/D25406490)

------

### Benchmark results
SGD (lr=1e-3, momentum=1, dampening=0, weight_decay=1, nesterov=True)
Current: 201.63 ms
Foreach: 56.99 ms

Adam (weight_decay=1., amsgrad=True)
Current: 233.27 ms
Foreach: 46.89 ms

AdamW (weight_decay=1., amsgrad=True)
Current: 371.18 ms
Foreach: 121.04 ms

RMSprop (weight_decay=1, momentum=1, centered=True)
Current: 364.88 ms
Foreach: 47.52 ms

Rprop (lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
Current: 1.43 s
Foreach: 1.26 s

ASGD (weight_decay=1)
Current: 165.39 ms
Foreach: 40.61 ms

Adamax (weight_decay=1)
Current: 374.42 ms
Foreach: 291.06 ms

Adadelta (weight_decay=1)
Current: 252.64 ms
Foreach: 29.62 ms

### Benchmark script
```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

model = torchvision.models.resnet.resnet101(pretrained=True).to("cuda")
targets = torch.randint(0, 1000, (100, 100), device="cuda")
criterion = nn.CrossEntropyLoss()

# optimizers
params = dict(weight_decay=1)
optimizer = optim.Adadelta(model.parameters(), **params) 
optimizer_mta = optim._multi_tensor.Adadelta(model.parameters(), **params)

running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device="cuda").random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device="cuda" , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )
    print(f"autorange:\n{timer.blocked_autorange()}\n\n")

    timer_mta = benchmark_utils.Timer(
        stmt="torch.cuda.synchronize(); optimizer_mta.step()",
        globals=globals(),
        label="str(optimizer_mta)",
    )
    print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

[ghstack-poisoned]
@facebook-github-bot
Contributor

Hi @izdeby!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient, and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@ngimel ngimel removed their request for review May 30, 2021 23:45
@github-actions github-actions bot closed this May 12, 2022
@facebook-github-bot facebook-github-bot deleted the gh/izdeby/69/head branch June 11, 2022 14:18