This repository was archived by the owner on Aug 1, 2025. It is now read-only.

Add a master-only test that at least one PrimTorch ref can be traced nopython #355

Merged
ezyang merged 1 commit into gh/ezyang/1/base from gh/ezyang/1/head on Jun 9, 2022

Conversation

@ezyang
Contributor

@ezyang ezyang commented Jun 9, 2022

Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Jun 9, 2022
Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 221ff4b
Pull Request resolved: #355
@ezyang ezyang merged commit 6a4c96e into gh/ezyang/1/base Jun 9, 2022
ezyang added a commit that referenced this pull request Jun 9, 2022
Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 221ff4b
Pull Request resolved: #355
@ezyang ezyang deleted the gh/ezyang/1/head branch June 9, 2022 02:32
@pyjhzwh
Contributor

pyjhzwh commented Jun 9, 2022

My Python version is 3.9.12 and my torch version is '1.13.0.dev20220607+cu113'.
On the main branch, `make test` fails:

FAILED tests/test_dynamic_shapes.py::DynamicShapesReproTests::test_primtorch_dynamic_shapes - AttributeError: module 'torch._refs' has no attr...
FAILED tests/test_repros.py::ReproTests::test_primtorch - AttributeError: module 'torch._refs' has no attribute '_ref'

Also,

$ python
Python 3.9.12
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch._refs
>>> torch._refs._ref(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch._refs' has no attribute '_ref'

@ezyang
Contributor Author

ezyang commented Jun 9, 2022

I goofed; we need to land pytorch/pytorch#79178 on master.

shiyu22 pushed a commit to towhee-io/towhee-compiler that referenced this pull request Sep 9, 2022

Add README.md

Improve counters and stats

Constant control flow

Support some function calls

Support calling submodules and methods

Add profiler to measure coverage

Measure overheads with TorchBench

Refactor tests

Support for unpacking, inplace, and matmul op

Rewrite how guards work

Cleanup and refactoring

Linting, formatting, and documentation

Fix crashes and add torchdynamo.reset()

Disable list arg unpacking

Support control flow with graph prefix

Minor refactoring and naming

Improve support for partial graphs

Increase coverage of comparisons, constants, modules

Fix for handling of iterators

Extract multiple graphs from control flow

Refactor binary ops

Handle more types of jump instructions

Support wrapping `Real` types (#1)

Allow using nn.Modules inside a list (#2)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix key error (#4)

Fix broken tests

Add support for EXTENDED_ARG

TorchBench and debugging improvements

Add support for staticmethod

Improve handling of unsupported variables

Allow using Tensors inside a list/tuple (#3)

* Allow using nn.Modules inside a list

* Allow using Tensors inside a list/tuple

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `MAKE_FUNCTION` (#5)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Stricter typing and support nn.Sequential

Support tuple returns (#6)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support global loads of bools (#7)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support `len` (#8)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix bug in LOAD_GLOBAL

Support float constructor (#10)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `isinstance` (#9)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Revamp resuming after unsupported things

Livevars and constant folding

Optional dtype/device/rank/shape specialization

Specialization and mutation guards

Improve support for config objects

Add support for IMPORT_NAME

Early work on optimizations and autotuning

Support for sizes at input args and dicts

Minor refactoring

Support for closures

Support ModuleList and strings

Support nested functions with closures

Fix build for gcc-7

Minor fixes for avx512 machines

Clean up stack size handling

Rewrite how graph resume works

Support for super()

Support zip and enumerate

Refactor variable construction into separate file

Clean up handling of variable sources

Dictionary support and improve arg handling

Add IPEX backend

Improve handling of lists/dicts

Fix python 3.7 issues

Support module constants

Support basic list mutation

Support list comprehensions

Support for inline nn.Softmax()

Refactor and cleanup call_function

Support inlining generators

Dynamic .shape/.ndim support

Skip transformers.file_utils

Support property/classmethod

Support dict.__setitem__

Break graph on STORE_ATTR

Garbage collect generated code

Add trt backends

Improve cuda backends

Support latest pytorch/torchbenchmark

Fixes in TRT baseline script

Fixes for GPU measurement

Refactor optimization backends and tuning

Retry autotuning failures

Fix static runtime backend

Fix lints

Cleanup list packing/unpacking

Fixes for new torchbench version

Work around crashes in static runtime

Fix vision_maskrcnn/detectron2_maskrcnn

Switch to alternate version of onnx2trt

Skip pyhpc_turbulent_kinetic_energy

Add backends to skipfiles

Fix weakref handling

Specialize on torch.is_tensor and torch.is_floating_point

Analysis and functionalization passes

Add dynamic dtype/device/shape propagation

Support namedtuple and dtype constants

Fix as_tensor issue

Improve support for range()

Config flag to control normalization

Fix __len__ issue

Support list.pop()

Allow inlining methods on UnsupportedVariable

Update README.md (#13)

[Backend][TVM] Support boolean as output (#14)

Remove extra call to torch.jit.fuser

Improve support for list/dict/len/str

Support list.extend and dict.update

Support for no_grad/enable_grad

Improve coverage of huggingface models

Refactor stack_op implementation

Fix chunk method in longformer

Fix bug with calls between nested functions

Support hasattr(namedtuple, ...)

Support for len(inspect.signature(fn).parameters)

Support for autograd.Function

Allow adding a BaseListVariable and a ConstantVariable together if the latter is an iterable (#15)

Refactor torchbench.py to use subprocess isolation

Switch project to use isort import format

Rewrite README.md

Improve docs in ./torchbench.py --help

Allow changing torchdynamo.config.dynamic_shapes without recompile

Refactor offline autotuner

Bugfix for bias towards eager

Add online version of autotuner

Fix some backend exceptions

Fix for bool inputs

Work around TensorRT abort() on group_norm

Improve error printing when backend fails

Fix bug with int64 in onnxrt

Add isolation to baseline runs

Fix some errors in copy_ for backend testing

Skip TRT for einsum models

Split out fixed_strategy1/fixed_strategy2

Disable TRT bypassing

Support direct calls to module.forward

Support for BUILD_TUPLE_UNPACK_WITH_CALL

Support map/reduce/sum of tensors

Split test_functions into two files

Fix reconstructing nested attrs

Improve support for HuggingFace ModelOutput() wrapper

Improve support for zip and __contains__

Fix issue with list multiply

Adding functorch to skipfile (#16)

Adding AOT Autograd API for inference (#17)

Add note about skipfiles

Fixes for latest torchbenchmark version

Support list mutation side effects

Refactor codegen related things into codegen.py

Refactor graph generation related things into output_graph.py

Support inlining super() calls of nn.Module subclasses

Support some simple cases of try/except

Refactor variable_tracker.py into many files

Support builtins module (#18)

Adding training optimizations (#19)

Allow constant folding through set()

Support for property/__getattr__ on user defined classes

Support more cases of varargs calls

Add nopython=True whole-program graph mode

Improve support for dataclasses

Refactor BuiltinVariable call_function handling

Add support for tuple iadd

Add support for UNPACK_EX bytecode

Improve handling for tuple constants

Support BUILD_LIST_UNPACK with tensor args

Support numpy integer constants

Support module.__class__.__name__

Add support for dict mutation side effects

Fix composability with FX generated code

Fix off by one bug in profile operator counts

Avoid over-specializing on dynamically created nn.Modules

Helper functions for AOT Autograd testing (#20)

Skip networkx for AOT (#21)

Run `make format`

Move non-specialized nn.Module handling to UnspecializedNNModuleVariable

Reuse generated code when control flow paths converge

Support resume while inside 'with no_grad()'

Support tuple_iterator

Support for getitem with default value

Support type(obj) calls

Fix bug in __getattr__ handling

add ltc backend (#23)

OSS Automated Fix: Addition of Code of Conduct (#25)

OSS Automated Fix: Addition of Contributing (#24)

Minor refactor in side_effects.py

`make format` and lint issues

Add lint workflow

Add LICENSE

Update CONTRIBUTING.md

Add test workflow

Update github workflows

Skipping logging module (#26)

Initial support for setattr side effects

Improve coverage statistics measurement

Support object creation side effects

Improve support for torch.distributions

Allow graph breaks on unsupported getitem

Remove nn.Sequential from skipfiles

Add CITATION.cff

Improve support for dynamic mutation of nn.Modules

Bugfix for mutating mutated attributes

Adding Torchbench training support (#27)

Deduplicate FX graph outputs

Support list.clear()

Adding missed random state reset (#28)

Fix issue with maskrcnn

Remove unneeded guards

Support HF ModelOutput() wrapper class

Avoid compiling the output of user_compiler

Fix ./torchbench.py --nothing

AOT Autograd fixes for moco, resnet50_qat, pytorch_struct (#29)

Save some memory while profiling

Fix AOT autograd bug where dynamo tries to compile generated backwards

Fix guards for 'mod.0.bias' attributes

Convert tests to use public API

Workaround for AOT Autograd LSTM bug (#31)

Fix handling of torch.manual_seed

Add `with torchdynamo.disable()` context manager

Support itertools.{chain,islice}

Support Tensor.is_quantized

Support multiple threads using TorchDynamo

Improve support and testing of dynamic shapes

Workaround for issue in hf_Bart in dynamic shape mode

Don't directly import from _eval_frame

Fix aliasing issue in #30

Support staticmethod/classmethod on user defined classes

Improve support for closures and dunder methods

Fix bug in handling of type annotations

Support 3+ nestings of closures

Fix threading issue for autograd threads

Add python autograd test case

IMPORT_NAME Instruction - Import the top-level package (#47)

hf_T5 dataclass fields handling (#50)

Using mean instead of sum to have reasonable loss value for backprop (#51)

Allow passing a string with a backend name to torchdynamo.optimize

Support nesting torchdynamo.optimize() decorated functions

Support __init__ of HF ModelOutput (#65)

Fill in missing fields in setup.py

Fix pip install issue

Add python key tracing backend

Fix build on M1 Max Mac (#63)

Fix lint

Fx2trt integration improvement (#71)

Refactor python_key_normalize

Handle mutation propagated by getitem (#83)

Torchbench changes for AOT Autograd (#84)

Fix string-based backend mode

Fix linter

Fix bug when torch and torchdynamo are in the same folder

Fx2trt pr2 (#97)

Fix lint github action

Add update 6 link to readme (#99)

Fix torchbench.py --nothing option (#100)

Skip tacotron2 and unskip vision_maskrcnn in torchbench.py (#102)

Add developer setup section to README.md (#105)

Fx2trt pr3 (#110)

Use Low overhead version AOT Module (#113)

Cleanup AOTAutograd related args (#114)

Add extra tolerance for some GPU models in Torchbench (#116)

Enabling few more torchbench models with AOT Autograd (#127)

Support formatted literal strings (f-strings) (#128)

* Support formatted literal strings (f-strings)

* TensorVariable var_getattr supports __class__ and add test case

* Address review comments

* Fix lint error

add an option to randomize inputs (#130)

Fix typo in IPEX backend (#126)

Handle torch size (#139)

Remove `gc.collect()` upon every model compute run. (#140)

Make STORE_SUBSCR break when unsupported. (#142)

* Make STORE_SUBSCR break when unsupported.

I probably could have done a bit more but this is enough to fix the
issue and I'll let someone more intrepid get this going comprehensively.
I'm also not sure how to test this.

Fixes #131

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Fix Python 3.7 compatibility issues & Add testing action workflow (#143)

Handle torch.cuda.current_device (#146)

Handle torch.seed (#141)

Handle torch.override.is_tensor_like (#144)

* Handle torch.override.is_tensor_like

* Making it a constant

* Comments

* Conflict

Preserve CUDA rng states during frame analysis (#147)

* Preserve CUDA rng states during frame analysis

* Retrigger CI

* Retrigger CI

* Debugging

* Debugging

* Debugging

AOT Autograd Training - few more models passing (#151)

* AOT Autograd Training - few more models passing

* Add skip API

* Skip inlining

Support Slice of NNModuleList (#152)

* Support Slice of NNModuleList

* Comments

Fix import overhead by using `importlib.util.find_spec` (#153)

Fix issues in #132 (#150)

Variable builder - handle slice (#155)

More skipfiles (#157)

* Add more skip modules

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

moco and hf_Reformer fixes (#158)

* hf_Reformer fixed

* Adding moco fix

Revert "Fix import overhead by using `importlib.util.find_spec` (#153)" (#160)

This reverts commit dedd9fa.

Add support for Python 3.9 (#154)

* Changes to make TorchDynamo support Python 3.9

* Fix lint

* Add Python 3.9 github test workflow

* Fix typo

* RERAISE to TERMINAL_OPCODES set

* Make IS_OP support ConstDictVariable

* Address comments

Monkey patch autograd.Variable (fixes Tacotron2) (#161)

Break graph on Tensor grad (#163)

* Break graph on Tensor grad

* Comments

* Suppress warning

enable support for staticmethod on superclass (#162)

Summary:

Adds support for tracing through this syntax:

```
class Parent(torch.nn.Module):
    @classmethod
    def foo(cls, x):
        x = x + x
        return x

class Child(Parent):
    @classmethod
    def helper(cls, x):
        # resolving super().foo failed before this PR
        x = super().foo(x)
        return x

    def forward(self, x):
        x = self.helper(x)
        return x
```

This is useful for eventually enabling __torch_function__ support.

Test plan:

```
pytest -vsk test_super_static_method
```

Options to print fx/aot traces (#164)

enable basic __torch_function__ support (#167)

Summary:

This adds a skeleton for `__torch_function__` support in torchdynamo.

What this is currently doing:

1. in variable builder, check for __torch_function__ and wrap tensors in TensorTFOverrideVariable if found
2. in TorchVariable.call_function, inline the __torch_function__ function of TensorTFOverrideVariable arguments
3. in GetAttrVariable.call_function, check for super().__torch_function__ which resolves to the original. If it's found, stop inlining and insert the function call into the graph.

The current test just creates a __torch_function__ override which doesn't do
anything but call super().__torch_function__.

Things left for future PRs:

* supporting call_method
* supporting actual logic inside the overrides
* implementing the full __torch_function__ spec (currently things are hardcoded to first argument only)

Test plan:

```
pytest -vsk test_simple_torch_function
// used to fail with https://www.internalfb.com/phabricator/paste/view/P496842415
// currently passes
```
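
A minimal sketch (not the PR's actual test code) of the no-op override the current test exercises, i.e. a Tensor subclass whose `__torch_function__` does nothing but call `super()`:

```
import torch

class PassthroughTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Does nothing but defer to the stock behavior; per step 3 above,
        # dynamo stops inlining here and inserts the call into the graph.
        return super().__torch_function__(func, types, args, kwargs)

x = torch.randn(4).as_subclass(PassthroughTensor)
print(torch.sin(x) + 1)  # dispatches through the override
```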

Revert the monkey patching of variable, fixed in PyTorch (#173)

Miscellaneous small fixes and lints (#168)

Fixes for pytorch tests - 1/n (#174)

* Fixes for pytorch tests - 1/n

* Better comment

modify 'ipex' backend (#166)

Add torchdynamo.config.raise_on_backend_error (#177)

Fix for module returning (Tensor, None) (#176)

Fixes for pytorch tests 2/n - torch.Size and nn.Parameter (#182)

* Torch testing - Fix bugs for torch.Size and nn.Parameter

* CI failures

extend __torch_function__ support to `call_method` (#181)

Summary:

In pytorch/torchdynamo#167 we added `__torch_function__`
support for tracing through `call_function`.

This PR extends the support to also work on `call_method`.

Note: the LOC is high because some code was refactored to be reusable.
Note: implementing correct rewrapping logic for methods is saved for a
future PR, hope that's OK.

Test plan:

```
pytest -vsk test_simple_torch_function
```

Add supports for Python 3.10 (#172)

* Add supports for Python 3.10

* fix lint

* Add github test workflow for python 3.10

* Add lnotab and linetable writer test case

* Fix lint

* Fix lint

* Fix lint

* Split the unit test

* Remove some requirements and fix lint

* Add several new bytecodes in Python 3.10

* add Cython as requirements

* set numpy version

* Update README.md

Fix Pytorch tests 3/n - Skip exec frame (#184)

* Fix Pytorch tests 3/n - Skip exec frame

* CI failure

[PR] pass rewrite for supporting setitem (#188)

* temp changes

* add pass rewrite for setitem

* linter

Add --float16/--float32/--cosine options to torchbench.py (#189)

Don't import unused third party packages (#193)

Early version of TorchInductor (#190)

Fix for `KeyError: 'Size'` error (#194)

Add xarray to skipfiles (#196)

Update URLs to github.com/pytorch/torchdynamo (#199)

Add `./torchbench.py --fast` option (#198)

Pass backend-related ctx to TorchDynamo Optimize Context (#201)

* Pass backend-related ctx to TorchDynamo Optimize Context

* Reinit the backend ctx for every frame

* Doc

Detect x.new(torch.Size) and rewrite to torch.empty(tuple) (#195)

* Detect x.new(torch.Size) and rewrite to torch.empty(tuple)

* address comments

Skip frames when no graph is found (#205)

support inlining __torch_function__ with reading from closure (#197)

Summary:

The previous PRs to add `__torch_function__` support inlined through
`__torch_function__` without adding any guards for the function.

This worked for simple cases, but did not work if `__torch_function__`
needs to read a nonlocal variable, for example: https://gist.github.com/vkuzo/a3388fcaa532318d049368e96652b366
The reason it was broken is because the code which bound arguments
during inlining had to have a reference to a source in order to bind
things properly.

One way to fix this is to get the source of the `__torch_function__` attribute
of the original tensor, guard on it, and persist it through
all the rewrapping logic.

I'm flexible if there is a better alternative, lmk.

Test plan:

```
pytest -vsk test_torch_function_with_closure
```
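
A rough sketch of the previously broken shape (illustrative names, not the gist's exact code): the override reads a variable from the enclosing scope, so inlining it needs a guard source for the closure cell:

```
import torch

def run_with_offset(offset):
    class OffsetTensor(torch.Tensor):
        @classmethod
        def __torch_function__(cls, func, types, args=(), kwargs=None):
            kwargs = kwargs or {}
            out = super().__torch_function__(func, types, args, kwargs)
            if func is torch.add:
                # Reading the nonlocal `offset` is what used to break
                # argument binding during inlining.
                out = out + offset
            return out

    x = torch.ones(3).as_subclass(OffsetTensor)
    return torch.add(x, 1.0)

print(run_with_offset(10.0))
```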

[inductor] add lowerings for hardswish/hardsigmoid/hardtanh (#200)

[inductor] Handle +/-inf constants (#210)

Replace tensor.new with tensor.new_empty (#212)

add configuration for modules eligible for inlining (#208)

Summary:

Makes the source modules for `skipfiles.is_torch_inline_allowed` configurable.

This is needed for DBR quant integration exploration, we can now override this
config to allow torchdynamo to inline DBR quant utility functions.

Test plan:

Run this: https://gist.github.com/vkuzo/010e0483c9bbb35837cc9cb27c555243
it now advances past the error of "inlining in skipfiles"

[inductor] fix transposed convolution shape formula (#202)

Try finally block for with context on graph break instruction (#213)

* Try finally block for with context on graph break instruction

* fix test

* Support >= 3.9

* Support python 3.7

* Comments

* Replacing the global load with GlobalSource and reconstruct

Remove nn.Parameter filter hack for AOTAutograd backend (#214)

Fix test failures (#218)

Fix slicing list returning wrong result (#222)

* Correct ListVariable source

* Fix lint

Remove reference cycle (#223)

Refactor ConstDictVariable to support user_cls and use dict by default (#226)

Pin inductor CI to specific pytorch version (#229)

[inductor] support torch.linspace and torch.tensor (#217)

[inductor] add heuristic to decide layouts and loop orders (#216)

Revert "Fix slicing list returning wrong result (#222)" (#231)

This reverts commit 243222e.

Add equal_nan option to torchdynamo.testing.same() (#232)

Support device constants (#230)

Bail out for __setattr__ and fix ClassVariable handling (#227)

- avoid compiling __setattr__ functions as they may be difficult
  to correctly handle for arbitrary custom classes, but also aren't
  likely to be useful for torch module optimization
- expand the condition for constructing UserDefinedClassVariable
  to include ABCMeta classes via `inspect.isclass` check

Remove reference cycle - with exceptions (#228)

* Remove reference cycle - with exceptions

* Fix for InliningInstructionTranslator

Add fix for writing to closures (#233)

* Add fix for writing to closures

* run black

* one more time

Co-authored-by: Elias Ellison <eellison@devfair044.h1.fair>

Delete example value for unused args (#234)

Fix list slice & ConstantVariable to TupleVariable conversion missing source info (#235)

* Fix list slice & ConstantVariable to TupleVariable conversion missing source info.

* Update test cases

* Address comment

[inductor] early memory reuse and new operators (#237)

Add type checking in Constant match - Fix Pytorch tests 4/n (#238)

* Add type checking in Constant match

* Fix test

enable tracing through enum comparison (#245)

add support for tracing torch.nn.ModuleDict.__contains__ (#246)

Summary:

Adds support for tracing through this syntax:

```
class M(torch.nn.Module):
    def __init__(self, module_dict):
        super().__init__()
        self.module_dict = module_dict

    def forward(self, x):
        if "foo" in self.module_dict:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_nn_moduledict_contains
```

enumerate supports start argument (#240)

* enumerate supports start argument

* address comments

Make eval_frame thread safe (#239)

This should make eval_frame thread safe. Currently, the eval_frame is a global object, and different threads may step on each other by setting a different one.

This changes the behavior to instead always* have a "shim" eval_frame which then routes to the correct behavior by looking at the thread-local associated object. This is thread safe because now the callback object is always thread safe, and we only use it to drive logic at frame eval time, as opposed to at callback registration time.

Currently, the logic for None/False/Callback is kept, but the False case could be easily collapsed behind the shim in a subsequent diff.

*Always here means always when dynamo is running. The shim is installed and removed based on keeping track of how many dynamo threads are running at the moment.
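
In rough Python pseudocode (the real shim lives in the C extension; the names below are illustrative, not the actual ones), the routing works like this:

```
import threading

_local = threading.local()  # per-thread callback slot

def set_callback(callback):
    # Registration only writes this thread's slot, so threads no longer
    # step on each other.
    _local.callback = callback

def shim_eval_frame(frame, default_eval):
    # One global shim; the None/False/callable decision happens here, at
    # frame-eval time, by reading the thread-local object.
    cb = getattr(_local, "callback", None)
    if cb is None or cb is False:
        return default_eval(frame)
    return cb(frame)
```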

Add support for __subclasses__ (#242)

Fixes #241

Inline function jump on tensor condition should be unimplemented (#249)

cast model in no-isolate mode (#244)

support tracing __getitem__ of torch.nn.ModuleDict (#253)

Summary:

Supports tracing through

```
class ModuleDict(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleDict(
            {
                "0": torch.nn.Linear(10, 10),
            }
        )

    def forward(self, x):
        x = self.layers["0"](x)
        return x
```

This is useful for DBR quant.

Note: handling other logic for `ModuleDict` is left for future PRs.

Test plan:

```
pytest -vsk test_moduledict
```

Run torch inductor test on GPU machine (Part 1)  (#258)

* Run torch inductor test on GPU machine

* Land scale-config first

Implement verify_correctness #179 (#252)

* WrapperBackend to enable verifying correctness of backends; set config.verify_correctness to True to enable it.

* move testing.same() to utils.py
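
Going by the summary above, enabling the check is a one-line config flip (assuming the flag lives on torchdynamo.config like the other flags mentioned in this log):

```
import torchdynamo

# Flag name quoted from the summary above; wraps each backend and
# compares its outputs against eager.
torchdynamo.config.verify_correctness = True
```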

Skip inductor tests on older pytorch versions (#257)

[inductor] Multi-devices, primtorch decomps, and many new ops (#243)

enable tracing through id(nn_module_variable) (#262)

Summary:

Enables tracing through this syntax:

```
class M(torch.nn.Module):
    def forward(self, x, ref_id):
        self_id = id(self)
        if self_id == ref_id:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quant because it uses `id(module)` for some
FQN gymnastics.

Test plan:

```
pytest -vsk test_id_of_nn_module
```

enable tracing through frozenset contains of PyTorch ops (#251)

Summary:

Enables tracing through this syntax:

```
funcs = frozenset([torch.add])

def fn(x, func):
    if func in funcs:
        x = torch.add(x, 1.0)
    x = torch.mul(x, 1.0)
    return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_frozenset_torch_func_contains
```

Dump conv args into file (#261)

* dump convolution args into file

* add option --log-conv-args in torchbench.py

Fix generation tagging new (#263)

Simplify eval frame, merge _run_only (#264)

Allow layout=torch.strided in new_constant (#269)

Decomposition for nan_to_num (#268)

[inductor] Handle non-reduction reductions (#266)

Use unittest.mock.patch for test_verify_correctness (#265)

[inductor] Support sort/as_tensor/LongTensor (#267)

Fix inline list/dict mutation (#273)

* Fix inline list/dict mutation

* Fix lint

* Refact inline translator's replace_all

* Fix recursive inline replace

* Remove debug print

Support split_with_sizes (#272)

Light Refactor + Add support for torch.autograd.profiler.record_profile function (#274)

This diff takes GradModeVariable's logic and pulls it partially into a more generic ContextWrappingVariable base class intended for making it easier to write context managed code.

[inductor] Support input and slice mutation (#275)

Add recompile ux tests (#270)

Just a first step, this PR adds a few tests that start to outline a proposed UX, and proposes mechanisms for setting/checking the #recompiles and cache limit to facilitate the testing.

Skip non tensor frame (#248)

* Skip non tensor

* Skip non tensor frame

* Lint

* Jason comments

* Add decorator functionality

* Comments

Prioritize class method if there is duplicated attribute name (#278)

* Prioritize class method if there is duplicated attribute name

* Refactor var_getattr to make it consistent with native pytorch

[inductor] Improve merging of contiguous loops (#279)

Add support for STORE_GLOBAL (#286)

Summary:
1. Create a symbolic_global table to store a global variable name to an
unique object mapping, and the unique object is further used as a key to
index into the store_attr_mutations table in SideEffects.
2. The actual STORE_GLOGAL action is buffered by SideEffects and later
LOAD_GLOBAL just reads from SideEffects when appropriate. STORE_GLOBAL
is eventually applied after the generated graph.
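
A toy model of that buffering scheme (illustrative names only, not the real SideEffects API):

```
class SideEffectsSketch:
    def __init__(self):
        self.symbolic_globals = {}       # global name -> unique key object
        self.store_attr_mutations = {}   # key -> buffered value

    def store_global(self, name, value):
        # STORE_GLOBAL is buffered rather than applied immediately.
        key = self.symbolic_globals.setdefault(name, object())
        self.store_attr_mutations[key] = value

    def load_global(self, name, real_globals):
        # LOAD_GLOBAL reads its own buffered writes when appropriate.
        key = self.symbolic_globals.get(name)
        if key is not None and key in self.store_attr_mutations:
            return self.store_attr_mutations[key]
        return real_globals[name]

    def apply(self, real_globals):
        # Replayed after the generated graph runs.
        for name, key in self.symbolic_globals.items():
            if key in self.store_attr_mutations:
                real_globals[name] = self.store_attr_mutations[key]
```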

Skip inductor CPU tests if there is no working c++ compiler (#283)

Collect guard failures into one warning at cache limit hit (#281)

- avoid warning on each guard failure separately (in cases cache limit > 1)
- instead, bundle a summary of guard failure warnings together at the
  time of cache limit hit

[inductor] Add support for more operators (#282)

[inductor] Improved indexing simplification and loop body representation (#289)

All base class methods take precedence if it's a nn.Module (#290)

Fx2trt pr4 (#294)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

Fx2trt pr5 (#296)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

* add a threshold for fall back to non-TRT

Support guarding inf constants (#300)

Pytorch tests 5/n - Graph break on MemberDescriptor type (#301)

* Graph break on MemberDescriptor type

* CI

update_locals_and_stack should use shared cache (#302)

* update_locals_and_stack should use shared cache

* update SideEffects.apply to use default cache

Implement verbose tensor guards check  (#287)

Verbose guard checks are guards used outside of the hot path
for providing specific failure information to the user on compile
cache miss.

This PR adds support for verbose guards and implements one for the
tensor guard, leaving other guards alone.

* Add tensor names to tensor guard failure message

Low precision support (#304)

* add low precision support to torchinductor triton backend

* remove temporary tests

* lint

* lint

* lint

Run clang-format on torchdynamo/_guards.cpp (#306)

Fix slowdown due to generation_tagging_new (#305)

* Use patched init to track dynamic modules + test gen tagging

Elaborate on error message for failing tensor type match (#307)

ConstDictVariable reconstruct should keep original order (#308)

[inductor] Minor fixes for latest PyTorch and benchmark harness codegen (#309)

Verbose guard check Bugfix (#311)

* Bugfix and clang format

* It wasn't will, it was me - Clang formatter

PyTorch tests - 6/n - Add type check for list/tuple elems in CONSTANT_MATCH (#303)

* Add type check for list/tuple elems in CONSTANT_MATCH

* recursive length guarding

* All decomp tests pass

* Filter out only useful guards

Add torchdynamo.allow_in_graph and torchdynamo.disallow_in_graph (#295)

Fix CI (#316)

* DONT MERGE - Checking CI

* fix

* fix

Fix repro test to unblock internal sync (#315)

Add constant checks for list of numpy integers (#313)

Skip inductor tests if sympy is missing (#320)

[Easy] fix reference to removed variable in debug trace (#323)

Add a basic compilation profiler (#312)

* Add a basic compilation profiler

* Include graph break reasons in compilation report

* lint and import issues

* Add --recompile_profiler option to torchbench.py

Filter out unimportant modules from allowed modules (#324)

* Filter out unimportant modules from allowed modules

* Remove typo

* Further cleanup

* CI testing

* Michael's comment

* Remove few more meaningless things

* Jason's comments

Prevent guard creation from accessing objects between __new__ and __init__ (#322)

Initial implementation of UnspecializedPrimitiveVariable (#321)

* Initial implementation of UnspecializedPrimitiveVariable

* Update heuristic

* Add test for no recompilations for different values

[inductor] Initial support for tiling output code (#317)

Fix test skip when sympy is missing (#333)

remove unconditional sympy import from test_torchdinductor.py (#334)

* remove unconditional sympy import from test_torchdinductor.py

conv in triton (#310)

* general conv and conv1x1 implementation in triton
* correctness check with torch baseline
* benchmarking on resnet50 layers
* enable `triton_ops.conv` to replace `aten.convolution` by setting config.triton.use_conv as True

[inductor] Minor float16 fix (#338)

update IPEX backend (#344)

Add _refs and _prims to the allowlist

This won't get exercised by real models but it's necessary so we
can test that PrimTorch decomps work under dynamo.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 874bbbd8d80d07fcd089a0d57849f21b2b81756d
Pull Request resolved: pytorch/torchdynamo#345

[inductor] Support additional operators (#339)

[inductor] Benchmark harness for training (#337)

codegen to update mutated variables with side effects should run after the stack value's codegen (#347)

Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 221ff4b897648f99a4552dd955b3426a2d6cb738
Pull Request resolved: pytorch/torchdynamo#355

Convert dict to list before iterating in case there is a possible delete (#360)
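
(The fix named in #360's title is the standard snapshot-before-delete idiom:)

```
# Deleting from a dict while iterating its live view raises
# "RuntimeError: dictionary changed size during iteration",
# so snapshot the keys first:
d = {"a": 1, "b": 2, "c": 3}
for k in list(d):
    if d[k] % 2 == 0:
        del d[k]
```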

Add TROUBLESHOOTING.md (#357)

- link TROUBLESHOOTING.md from README.md and from recompilation warning

Addresses issue https://github.com/pytorch/torchdynamo/issues/348

Add coldstart/breakeven benchmark (#352)

try with:
python torchbench.py --cold-start -d cuda --training --use-eval-mode --nvfuser --isolate

This actually adds 2 new benchmark metrics:

coldstart: measures the worst of t_eager_compile / t_dynamo_compile as a 'speedup', where dynamo compiles twice to exercise the profiling executor

breakeven: predicts the number of iterations dynamo would have to run to 'break even' with eager, considering the amortization of its compile cost

Not yet tested with inference or cpu, may have some other issues.

Should probably adjust to repeat the whole cold-start process several times and take the median, but for now it just does this once.
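
Back-of-envelope versions of the two metrics (variable names are illustrative, not the script's actual ones):

```
def coldstart(t_eager_compile, t_dynamo_compile):
    # reported as a 'speedup': > 1.0 means dynamo warmed up faster than eager
    return t_eager_compile / t_dynamo_compile

def breakeven_iters(compile_cost, t_eager_iter, t_dynamo_iter):
    # iterations until the per-iteration win amortizes the compile cost
    assert t_dynamo_iter < t_eager_iter, "dynamo must win per iteration"
    return compile_cost / (t_eager_iter - t_dynamo_iter)
```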

Raise unimplemented if checkpoint is empty (#351)

Only wrap in TorchVariable if is allowed, not if not disallowed

If you don't do this, allowed_functions_module_string_ignorelist
doesn't actually affect whether we try to trace these functions into
the graph, since the disallowed list doesn't respect this config.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: fad216f8a3013caaa2814cc3efaf5348e6033ee5
Pull Request resolved: pytorch/torchdynamo#356

List append should return None (#365)

Make TorchInductor use Triton's MM implementation in codegen (#325)

* Rebased on upstream

* Fixed make lint

* Fixed make lint

add yaml to requirements.txt (#367)

Disabling TorchDynamo inside torch.jit and torch.onnx compiler (#361)

* Disabling TorchDynamo inside torch.jit compiler

* Adding trace_module

* Remove the script

* Also adding ONNX

* Jason's comments

Workaround triton float64 log issue (#379)

Fix missing ir.Reduction.default_value (#378)

autotune conv kernels (#364)

* tuned_conv to choose the best kernel for a given input shape, stride, and layer params
* set config.triton.convolution as "aten"(default), "triton" or "autotune"
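
Assuming the flag sits on the inductor config module (the module path here is my guess; the value names are quoted from the bullet above), selecting the autotuner would look like:

```
import torchinductor.config as config

# one of "aten" (default), "triton", or "autotune"
config.triton.convolution = "autotune"
```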

[inductor] Refactor fallback kernel handling (#381)

Add torch._decomps to the list to trace into

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: aa557ba00ea2fc03946089d56592041a5df31275
Pull Request resolved: pytorch/torchdynamo#369

Print value of TorchVariable object

I find this is helpful for debugging what exactly a given TorchVariable
is; presently there is no information so it is hard to tell.  Because
these are PyTorch variables they should be well behaved and it shouldn't
cause problems to call repr on them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b
Pull Request resolved: pytorch/torchdynamo#370

Skipping the namedtuple subclass constructor (#382)

Adding missing guard for GET_ITER Bytecode (#386)

fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385)

* temp changes

* fx2trt_oss is merged to https://github.com/pytorch/TensorRT

* linter fix

Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380)

* Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm)

* Reformatted

Fix for `any` decomposition (#384)

* fix  lowering

* restore abs

* black

Extend new to new_empty for tensor.shape (#387)

Support for lazy modules (#391)

* Support for lazy modules with test

Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__ so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there were cases where torchdynamo does not yet support functionality used in all hooks.
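
A minimal sketch of the newly supported case, assuming the torchdynamo.optimize decorator API used elsewhere in this log:

```
import torch
import torchdynamo

# LazyLinear materializes its weight in a pre-forward hook that runs in
# __call__; dynamo now traces __call__ so the hook fires and the module
# may mutate itself on first use.
model = torch.nn.Sequential(torch.nn.LazyLinear(8), torch.nn.ReLU())

@torchdynamo.optimize("eager")
def run(x):
    return model(x)

print(run(torch.randn(2, 4)).shape)  # weight is initialized on the first call
```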

Enable tf32 in torchbench.py (#397)

Support TensorType checking (#395)

* Support TensorType checking

* Update torchdynamo/variables/builtin.py

Co-authored-by: Jason Ansel <jansel@jansel.net>

Co-authored-by: Jason Ansel <jansel@jansel.net>

[inductor] Canonicalize indexes for MemoryDep (#314)

Summary: By canonicalizing indexes, we can build more accurate and
flexible read-write dependencies, which will allow more general kernel fusion.

Suppress warnings during the pre-checks on a frame (#396)

* Suppress warnings during the pre-checks on a frame

* Encapsulate in PatchTorch

[inductor] Do convolution bias in its own kernel (#403)

[inductor] Fix convolution output stride order (#400)

Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398)

Reducing overhead of has_tensor_in_frame (#406)

[inductor] add manual_seed in test_round (#388)

[inductor] Improve heuristic for realizing buffers (#402)

[inductor] Fix typo preventing some fusions (#401)

[inductor] Fix for constants on wrong device (#405)

remove deepcopy in fx2trt (#407)

* temp changes

* remove deepcopy

Fix broken link in README (#408)

Adding nan (#383)

* Adding nan

* Jason's comments

Fixing import issues for PyTorch 1.12 rc branch (#411)

Raising tolerance after using tf32 (#415)

[inductor] Improve handling of reductions (#404)

Shape guard and isinstance fixes (#414)

* Shape guard and isinstance fixes and tests

* Add guard for any accessed Tensor attribute

Ensure dtype instances are not mapped to TorchVariable() (#394)

* Disallow dtype instances

* Extract dtypes from torch automatically and add test

Propagate guards from ConstDict variables (#421)

Config driven support for torch.Tensor .item()  (#417)

[inductor] Refactor helpers into torchinductor.utils (#418)

Removing generation field from the patched nn.Module (#423)

* Removing generation field from the patched nn.Module

* Lint

* Rebasing

[Inductor] Triton template for conv (#422)

* TritonTemplateKernel, template_codegen, conv jinja2 template
* pip install Jinja2 in setup_nightly

Remove unnecessary call to clone which also caused a segfault (??) (#427)

Disable TorchDynamo on frames created by fx symbolic tracing (#429)

Fix for disabling triton templates (#430)

Minor test fixes (#432)

[inductor] Remove dead stores after fusion (#409)

Summary: Use DeferredLine/DeferredIndentedBuffer to perform a lazy emit of
buffer allocation/store after we determine which buffers are redundant.

Add support for iterating over a dict (#436)

Check if kwargs has key "fp16_mode" when determining the precision (#437)

[inductor] Add heuristic to set num_warps (#433)

update accuracy check for TRT fp32 (#438)

* temp changes

* fix an issue in fp32, change accuracy check to cosine similarity for fp32 since TRT fp32 could not meet 1e-4

 Pin CI to June 20th Torch Nightly (#441)

* Pin CI to June 20th Torch Nightly

* Respond to comments

* duplicated set up..

[inductor] Add some prims (#431)

Disabling the trace instead of symbolic trace (#443)

Break graph on torch.Storage types (#428)

* Break graph on torch.Storage types

* Hmm, CI failing, trying Jason's suggestion

* Debug CI

Fix guard propagation for tuple iterators (#448)

[inductor] Improve fusing of tiled + untiled (#446)

Directly compute sum of a list of floats/ints (#449)

* Directly compute sum of a list of floats/ints

* Test

Added option use_bmm to enable triton codegen for bmm (#393)

* Added option use_bmm to enable triton codegen for bmm
* Added more shapes to microbench

Add support for crossentropy (#450)

Rewrite symbolic_locals for torch.return_types (#442)

* Rewrite symbolic_locals for torch.return_types

* Special casing on the out kwargs

* Replace == with is

Remove traced op overloads before compiling (#455)

* Remove op overloads

* lint

Raise errors when backends throw exceptions (#451)

Huggingface model benchmarking (#459)

[inductor] Support scatter operations (#434)

Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392)

* Implement UnspecializedPrimitiveVariable codegen

* Make UnspecializedPrimitiveVariable a GraphArg

* Update make_call_generated_code

* Update min/max builtin func

* Support random.random

* Remove unnecessary change

* Fix lint

* Refactor to support multiple random.random

* Refactor out unspecialized numpy and python variables

* Fix RandomValueSource guard

* Support multiple random functions

* Rebase to updated main

* Refactor out random_values_var

* Fix lint

* Fix lint

* Move random_values_var to output graph

* Add need_unwrap to distinguish unspec from x.item()

* Make global rand func unique

* Fix lint

* Add raw value propagation for unspec variables

* Fix lint

* Directly load type(raw_value) & update random func example value

* Fix lint

Add Fake Tensor Propagation (#426)

* Add Fake Tensor Propagation

* extend test

* lint

* fix import

* one more day..

* update functorch commit

* bump one more day to get pytorch/pytorch#79741

* Skip test

* use FakeTensorError

* lint

* Guard on fake tensor availability

* test skips (fix en route in core)

* lint

* update nightly

* lint

* update

* format

* update recent

[inductor] Cherry pick nll_loss_forward decomp (#456)

[inductor] Register lowerings for operator overloads (#457)

Add support for torch.finfo/torch.iinfo (#470)

[inductor] Auto-download gcc12 from conda-forge (#471)

Rename benchmarking files (#472)

Add support for named_params and named_modules (#465)

[inductor] Add a metric to count the number of generated kernels (#476)

Summary: This can be used to prevent a regression on our fusion result.

Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424)

* Add support for custom class

* lint

* Fix unpack to reflect main

* Feedback

* Simplify, rebase

* Lint, format

Flag to skip printing of Dynamo internal exceptions (#480)

Don't leak cache entry on skip

Fixes pytorch/torchdynamo#477

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab
Pull Request resolved: pytorch/torchdynamo#478

Reduction cache (#487)

* enable cse for reductions

* self.cse

use fake tensors when deep copying model to check mutation (#486)

* use fake tensors when deepcopying model to check mutation

* fix fake tensors not available

* add tests

Added cos lowering (#492)

fix microbenchmarks import path (#474)

Compute multilayer reductions in higher precision (#484)

* Compute multilayer reductions in higher precision

* Compute whole mean kernel in higher precision, downcast in end

* update test

* lint

* skip test

Fix correctness checking code and non-deterministic benchmarks (#493)

Improve recompilation warning (#494)

- default to printing only the most recent guard failure not one failure for each cache miss
- reformat the text to be (hopefully) more readable and useful

Motivation:
While in some cases, knowing the individual failure reasons for each of (say, 64) cache misses could be useful, in practice it is probably good enough to know the most recent one since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard).

Previously:
torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']]to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md

Now:
torchdynamo hit config.cache_size_limit (64)
   function: 'toy_example' (example.py:5)
   reasons:  ['___guarded_code.valid']
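
(The limit itself is the config knob named in the new message, so raising it while diagnosing should be, assuming the obvious spelling on torchdynamo.config:)

```
import torchdynamo

# default is 64, per the warning text above
torchdynamo.config.cache_size_limit = 128
```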

Add a new backend option for TVM's meta_schedule (#479)

Add (experimental) support for exporting a graph and guards (#469)

Makefile/packaging updates (#499)

[inductor] Misc small improvements (#475)

Adding logging config (#504)

More Huggingface models  (#500)

* More Huggingface models (from simple_dl)

* Comments

Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)

Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509)

This reverts commit c747580.

Disable torchdynamo inside dispatch_trace (#508)

[inductor] Support rand()/dropout() (#505)

[inductor] Workaround triton bug with XBLOCK=1 (#510)

Reduction deps (#502)

* mark more reduction dependencies

* cleanup

* black

* make sure canonicalization prefix is the same everywhere

change tuning trial of meta schedule (#513)

[inductor] Fix bug with invalidated reuse (#506)

[inductor] Handle no-op slice_scatter (#507)

[WIP] Adding AMP support in benchmark infra (#464)

[WIP][Discussion] Write out a deeper documentation on how we go from … (#498)

* [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards

* Update GuardsOverviewPt1.md

* Update GuardsOverviewPt1.md
shiyu22 pushed a commit to towhee-io/towhee-compiler that referenced this pull request Sep 9, 2022
.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Add README.md

Improve counters and stats

Constant control flow

Support some function calls

Support calling submodules and methods

Add profiler to measure coverage

Measure overheads with TorchBench

Refactor tests

Support for unpacking, inplace, and matmul op

Rewrite how guards work

Cleanup and refactoring

Linting, formatting, and documentation

Fix crashes and add torchdynamo.reset()

Disable list arg unpacking

Support control flow with graph prefix

Minor refactoring and naming

Improve support for partial graphs

Increase coverage of comparisons, constants, modules

Fix for handling of iterators

Extract multiple graphs from control flow

Refactor binary ops

Handle more type of jump instructions

Support wrapping `Real` types (#1)

Allow using nn.Modules inside a list (#2)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix key error (#4)

Fix broken tests

Add support for EXTENDED_ARG

TorchBench and debugging improvements

Add support for staticmethod

Improve handling of unsupported variables

Allow using Tensors inside a list/tuple (#3)

* Allow using nn.Modules inside a list

* Allow using Tensors inside a list/tuple

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `MAKE_FUNCTION` (#5)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Stricter typing and support nn.Sequential

Support tuple returns (#6)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support global loads of bools (#7)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support `len` (#8)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix bug in LOAD_GLOBAL

Support float constructor (#10)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `isinstance` (#9)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Revamp resuming after unsupported things

Livevars and constant folding

Optional dtype/device/rank/shape specialization

Specialization and mutation guards

Improve support for config objects

Add support for IMPORT_NAME

Early work on optimizations and autotuning

Support for sizes at input args and dicts

Minor refactoring

Support for closures

Support ModuleList and strings

Support nested functions with closures

Fix build for gcc-7

Minor fixes for avx512 machines

Clean up stack size handling

Rewrite how graph resume works

Support for super()

Support zip and enumerate

Refactor variable construction into seperate file

Clean up handling of variable sources

Dictionary support and improve arg handling

Add IPEX backend

Improve handling of lists/dicts

Fix python 3.7 issues

Support module constants

Support basic list mutation

Support list comprehensions

Support for inline nn.Softmax()

Refactor and cleanup call_function

Support inlining generators

Dynamic .shape/.ndim support

Skip transformers.file_utils

Support property/classmethod

Support dict.__setitem__

Break graph on STORE_ATTR

Garbage collect generated code

Add trt backends

Improve cuda backends

Support latest pytorch/torchenchmark

Fixes in TRT baseline script

Fixes for GPU measurement

Refactor optimization backends and tuning

Retry autotuning failures

Fix static runtime backend

Fix lints

Cleanup list packing/unpacking

Fixes for new torchbench version

Work around crashes in static runtime

Fix vision_maskrcnn/detectron2_maskrcnn

Switch to alternate version of onnx2trt

Skip pyhpc_turbulent_kinetic_energy

Add backends to skipfiles

Fix weakref handling

Specialize on torch.is_tensor and torch.is_floating_point

Analysis and functionalization passes

Add dynamic dtype/device/shape propogation

Support namedtuple and dtype constants

Fix as_tensor issue

Improve support for range()

Config flag to control normalization

Fix __len__ issue

Support list.pop()

Allow inlining methods on UnsupportedVariable

Update README.md (#13)

[Backend][TVM] Support boolean as output (#14)

Remove extra call to torch.jit.fuser

Improve support for list/dict/len/str

Support list.extend and dict.update

Support for no_grad/enable_grad

Improve coverage of huggingface models

Refactor stack_op implementation

Fix chunk method in longformer

Fix bug with calls between nested functions

Support hasattr(namedtuple, ...)

Support for len(inspect.signature(fn).parameters)

Support for autograd.Function

Allow adding a BaseListVariable and a ConstantVariable together if the latter is an iterable (#15)

Refactor torchbench.py to use subprocess isolation

Switch project to use isort import format

Rewrite README.md

Improve docs in ./torchbench.py --help

Allow changing torchdynamo.config.dynamic_shapes without recompile

Refactor offline autotuner

Bugfix for bias towards eager

Add online version of autotuner

Fix some backend exceptions

Fix for bool inputs

Work around TensorRT abort() on group_norm

Improve error printing when backend fails

Fix bug with int64 in onnxrt

Add isolation to baseline runs

Fix some errors in copy_ for backend testing

Skip TRT for einsum models

Split out fixed_strategy1/fixed_strategy2

Disable TRT bypassing

Support direct calls to module.forward

Support for BUILD_TUPLE_UNPACK_WITH_CALL

Support map/reduce/sum of tensors

Split test_functions into two files

Fix reconstructing nested attrs

Improve support for HuggingFace ModelOutput() wrapper

Improve support for zip and __contains__

Fix issue with list multiply

Adding functorch to skipfile (#16)

Adding AOT Autograd API for inference (#17)

Add note about skipfiles

Fixes for latest torchbenchmark version

Support list mutation side effects

Refactor codegen related things into codegen.py

Refactor graph generation related things into output_graph.py

Support inlining super() calls of nn.Module subclasses

Support some simple cases of try/except

Refactor variable_tracker.py into many files

Support builtins module (#18)

Adding training optimizations (#19)

Allow constant folding through set()

Support for property/__getattr__ on user defined classes

Support more cases of varargs calls

Add nopython=True whole-program graph mode

Improve support for dataclasses

Refactor BuiltinVariable call_function handling

Add support for tuple iadd

Add support for UNPACK_EX bytecode

Improve handling for tuple constants

Support BUILD_LIST_UNPACK with tensor args

Support numpy integer constants

Support module.__class__.__name__

Add support for dict mutation side effects

Fix composability with FX generated code

Fix off by one bug in profile operator counts

Avoid over-specializing on dynamically created nn.Modules

Helper functions for AOT Autograd testing (#20)

Skip networkx for AOT (#21)

Run `make format`

Move non-specialized nn.Module handling to UnspecializedNNModuleVariable

Reuse generated code when control flow paths converge

Support resume while inside 'with no_grad()'

Support tuple_iterator

Support for getitem with default value

Support type(obj) calls

Fix bug in __getattr__ handling

add ltc backend (#23)

OSS Automated Fix: Addition of Code of Conduct (#25)

OSS Automated Fix: Addition of Contributing (#24)

Minor refactor in side_effects.py

`make format` and lint issues

Add lint workflow

Add LICENSE

Update CONTRIBUTING.md

Add test workflow

Update github workflows

Skipping logging module (#26)

Initial support for setattr side effects

Improve coverage statistics measurement

Support object creation side effects

Improve support for torch.distributions

Allow graph breaks on unsupported getitem

Remove nn.Sequential from skipfiles

Add CITATION.cff

Improve support for dynamic mutation of nn.Modules

Bugfix for mutating mutated attributes

Adding Torchbench training support (#27)

Deduplicate FX graph outputs

Support list.clear()

Adding missed random state reset (#28)

Fix issue with maskrcnn

Remove unneeded guards

Support HF ModelOutput() wrapper class

Avoid compiling the output of user_compiler

Fix ./torchbench.py --nothing

AOT Autograd fixes for moco, resnet50_qat, pytorch_struct (#29)

Save some memory while profiling

Fix AOT autograd bug where dynamo tries to compile generated backwards

Fix guards for 'mod.0.bias' attributes

Convert tests to use public API

Workaround for AOT Autograd LSTM bug (#31)

Fix handling of torch.manual_seed

Add `with torchdynamo.disable()` context manager

Support itertools.{chain,islice}

Support Tensor.is_quantized

Support multiple threads using TorchDynamo

Improve support and testing of dynamic shapes

Workaround for issue in hf_Bart in dynamic shape mode

Don't directly import from _eval_frame

Fix aliasing issue in #30

Support staticmethod/classmethod on user defined classes

Improve support for closures and dunder methods

Fix bug in handling of type annotations

Support 3+ nestings of closures

Fix threading issue for autograd threads

Add pthon autograd test case

IMPORT_NAME Instruction - Import the top-level package (#47)

hf_T5 dataclass fields handling (#50)

Using mean instead of sum to have reasonable loss value for backprop (#51)

Allow passing a string with a backend name to torchdynamo.optimize

Support nesting torchdynamo.optimize() decorated functions

Support __init__ of HF ModelOutput (#65)

Fill in missing fields in setup.py

Fix pip install issue

Add python key tracing backend

Fix build on M1 Max Mac (#63)

Fix lint

Fx2trt integration improvement (#71)

Refactor python_key_normalize

Handle mutation propagated by getitem (#83)

Torchbench changes for AOT Autograd (#84)

Fix string-based backend mode

Fix linter

Fix bug when torch and torchdynamo are in the same folder

Fx2trt pr2 (#97)

Fix lint github action

Add update 6 link to readme (#99)

Fix torchbench.py --nothing option (#100)

Skip tacotron2 and unskip vision_maskrcnn in torchbench.py (#102)

Add developer setup section to README.md (#105)

Fx2trt pr3 (#110)

Use Low overhead version AOT Module (#113)

Cleanup AOTAutograd related args (#114)

Add extra tolerance for some GPU models in Torchbench (#116)

Enabling few more torchbench models with AOT Autograd (#127)

Support formatted literal strings (f-strings) (#128)

* Support formatted literal strings (f-strings)

* TensorVariable var_getattr supports __class__ and add test case

* Address review comments

* Fix lint error

add an option to randomize inputs (#130)

Fix typo in IPEX backend (#126)

Handle torch size (#139)

Remove `gc.collect()` upon every model compute run. (#140)

Make STORE_SUBSCR break when unsupported. (#142)

* Make STORE_SUBSCR break when unsupported.

I probably could have done a bit more but this is enough to fix the
issue and I'll let someone more intrepid get this going comprehensively.
I'm also not sure how to test this.

Fixes #131

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Fix Python 3.7 compatibility issues & Add testing action workflow (#143)

Handle torch.cuda.current_device (#146)

Handle torch.seed (#141)

Handle torch.override.is_tensor_like (#144)

* Handle torch.override.is_tensor_like

* Making it a constant

* Comments

* Conflict

Preserve CUDA rng states during frame analysis (#147)

* Preserve CUDA rng states during frame analysis

* Retrigger CI

* Retrigger CI

* Debugging

* Debugging

* Debugging

AOT Autograd Training - few more models passing (#151)

* AOT Autograd Training - few more models passing

* Add skip API

* Skip inlining

Support Slice of NNModuleList (#152)

* Support Slice of NNModuleList

* Comments

Fix import overhead by using `importlib.util.find_spec` (#153)

Fix issues in #132 (#150)

Variable builder - handle slice (#155)

More skipfiles (#157)

* Add more skip modules

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

moco and hf_Reformer fixes (#158)

* hf_Reformer fixed

* Adding moco fix

Revert "Fix import overhead by using `importlib.util.find_spec` (#153)" (#160)

This reverts commit dedd9fa.

Add support for Python 3.9 (#154)

* Changes to make TorchDynamo support Python 3.9

* Fix lint

* Add Python 3.9 github test workflow

* Fix typo

* RERAISE to TERMINAL_OPCODES set

* Make IS_OP support ConstDictVariable

* Address comments

Monkey patch autograd.Variable (fixes Tacotron2) (#161)

Break graph on Tensor grad (#163)

* Break graph on Tensor grad

* Comments

* Suppress warning

enable support for staticmethod on superclass (#162)

Summary:

Adds support for tracing through this syntax:

```
class Parent(torch.nn.Module):
    @classmethod
    def foo(cls, x):
        x = x + x
        return x

class Child(Parent):
    @classmethod
    def helper(cls, x):
        # resolving super().foo failed before this PR
        x = super().foo(x)
        return x

    def forward(self, x):
        x = self.helper(x)
        return x
```

This is useful for eventually enabling __torch_function__ support.

Test plan:

```
pytest -vsk test_super_static_method
```

Options to print fx/aot traces (#164)

enable basic __torch_function__ support (#167)

Summary:

This adds a skeleton for `__torch_function__` support in torchdynamo.

What this is currently doing:

1. in variable builder, check for __torch_function__ and wrap tensors in TensorTFOverrideVariable if found
2. in TorchVariable.call_function, inline the __torch_function__ function of TensorTFOverrideVariable arguments
3. in GetAttrVariable.call_function, check for super().__torch_function__ which resolves to the original. If it's found, stop inlining and insert the function call into the graph.

The current test just creates a __torch_function__ override which doesn't do
anything but call super().__torch_function__.

Things left for future PRs:

* supporting call_method
* supporting actual logic inside the overrides
* implementing the full __torch_function__ spec (currently things are hardcoded to first argument only)

Test plan:

```
pytest -vsk test_simple_torch_function
// used to fail with https://www.internalfb.com/phabricator/paste/view/P496842415
// currently passes
```

Revert the monkey patching of variable, fixed in PyTorch (#173)

Miscellaneous small fixes and lints (#168)

Fixes for pytorch tests - 1/n (#174)

* Fixes for pytorch tests - 1/n

* Better comment

modify 'ipex' backend (#166)

Add torchdynamo.config.raise_on_backend_error (#177)

Fix for module returning (Tensor, None) (#176)

Fixes for pytorch tests 2/n - torch.Size and nn.Parameter (#182)

* Torch testing - Fix bugs for torch.Size and nn.Parameter

* CI failures

extend __torch_function__ support to `call_method` (#181)

Summary:

In pytorch/torchdynamo#167 we added `__torch_function__`
support for tracing through `call_function`.

This PR extends the support to also work on `call_method`.

Note: the LOC is high because some code was refactored to be reusable.
Note: implementing correct rewrapping logic for methods is saved for a
future PR, hope that's OK.

Test plan:

```
pytest -vsk test_simple_torch_function
```

Add support for Python 3.10 (#172)

* Add support for Python 3.10

* fix lint

* Add github test workflow for python 3.10

* Add lnotab and linetable writer test case

* Fix lint

* Fix lint

* Fix lint

* Split the unit test

* Remove some requirements and fix lint

* Add several new bytecodes in Python 3.10

* add Cython as requirements

* set numpy version

* Update README.md

Fix Pytorch tests 3/n - Skip exec frame (#184)

* Fix Pytorch tests 3/n - Skip exec frame

* CI failure

[PR] pass rewrite for supporting setitem (#188)

* temp changes

* add pass rewrite for setitem

* linter

Add --float16/--float32/--cosine options to torchbench.py (#189)

Don't import unused third party packages (#193)

Early version of TorchInductor (#190)

Fix for `KeyError: 'Size'` error (#194)

Add xarray to skipfiles (#196)

Update URLs to github.com/pytorch/torchdynamo (#199)

Add `./torchbench.py --fast` option (#198)

Pass backend-related ctx to TorchDynamo Optimize Context (#201)

* Pass backend-related ctx to TorchDynamo Optimize Context

* Reinit the backend ctx for every frame

* Doc

Detect x.new(torch.Size) and rewrite to torch.empty(tuple) (#195)

* Detect x.new(torch.Size) and rewrite to torch.empty(tuple)

* address comments
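
For reference, a minimal sketch of the rewrite this commit describes (illustrative only, not the actual dynamo pass):

```
import torch

x = torch.randn(2, 3)
# pattern detected by dynamo:
y = x.new(x.shape)
# rewritten to an explicit allocation with the same metadata:
y = torch.empty(tuple(x.shape), dtype=x.dtype, device=x.device)
```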

Skip frames when no graph is found (#205)

support inlining __torch_function__ with reading from closure (#197)

Summary:

The previous PRs to add `__torch_function__` support inlined through
`__torch_function__` without adding any guards for the function.

This worked for simple cases, but did not work if `__torch_function__`
needs to read a nonlocal variable, for example: https://gist.github.com/vkuzo/a3388fcaa532318d049368e96652b366
The reason it was broken is that the code which bound arguments
during inlining had to have a reference to a source in order to bind
things properly.

One way to fix this is to get the source of the `__torch_function__` attribute
of the original tensor, guard on it, and persist it through
all the rewrapping logic.

I'm flexible if there is a better alternative, lmk.

Test plan:

```
pytest -vsk test_torch_function_with_closure
```
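
To make the failure mode concrete, here is a minimal sketch of an override that reads a nonlocal variable (names are illustrative; the linked gist has the original repro):

```
import torch

def make_class(enabled):
    class MyTensor(torch.Tensor):
        @classmethod
        def __torch_function__(cls, func, types, args=(), kwargs=None):
            kwargs = kwargs or {}
            # `enabled` is read from the enclosing closure; binding and
            # guarding on it during inlining is what this commit fixes
            assert enabled
            return super().__torch_function__(func, types, args, kwargs)
    return MyTensor
```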

[inductor] add lowerings for hardswish/hardsigmoid/hardtanh (#200)

[inductor] Handle +/-inf constants (#210)

Replace tensor.new with tensor.new_empty (#212)

add configuration for modules eligible for inlining (#208)

Summary:

Makes the source modules for `skipfiles.is_torch_inline_allowed` configurable.

This is needed for DBR quant integration exploration, we can now override this
config to allow torchdynamo to inline DBR quant utility functions.

Test plan:

Run this: https://gist.github.com/vkuzo/010e0483c9bbb35837cc9cb27c555243
it now advances past the error of "inlining in skipfiles"

[inductor] fix transposed convolution shape formula (#202)

Try finally block for with context on graph break instruction (#213)

* Try finally block for with context on graph break instruction

* fix test

* Support >= 3.9

* Support python 3.7

* Comments

* Replacing the global load with GlobalSource and reconstruct

Remove nn.Parameter filter hack for AOTAutograd backend (#214)

Fix test failures (#218)

Fix slicing list returning wrong result (#222)

* Correct ListVariable source

* Fix lint

Remove reference cycle (#223)

Refactor ConstDictVariable to support user_cls and use dict by default (#226)

Pin inductor CI to specific pytorch version (#229)

[inductor] support torch.linspace and torch.tensor (#217)

[inductor] add heuristic to decide layouts and loop orders (#216)

Revert "Fix slicing list returning wrong result (#222)" (#231)

This reverts commit 243222e.

Add equal_nan option to torchdynamo.testing.same() (#232)

Support device constants (#230)

Bail out for __setattr__ and fix ClassVariable handling (#227)

- avoid compiling __setattr__ functions as they may be difficult
  to correctly handle for arbitrary custom classes, but also aren't
  likely to be useful for torch module optimization
- expand the condition for constructing UserDefinedClassVariable
  to include ABCMeta classes via `inspect.isclass` check

Remove reference cycle - with exceptions (#228)

* Remove reference cycle - with exceptions

* Fix for InliningInstructionTranslator

Add fix for writing to closures (#233)

* Add fix for writing to closures

* run black

* one more time

Co-authored-by: Elias Ellison <eellison@devfair044.h1.fair>

Delete example value for unused args (#234)

Fix list slice & ConstantVariable to TupleVariable conversion missing source info (#235)

* Fix list slice & ConstantVariable to TupleVariable conversion missing source info.

* Update test cases

* Address comment

[inductor] early memory reuse and new operators (#237)

Add type checking in Constant match - Fix Pytorch tests 4/n (#238)

* Add type checking in Constant match

* Fix test

enable tracing through enum comparison (#245)

add support for tracing torch.nn.ModuleDict.__contains__ (#246)

Summary:

Adds support for tracing through this syntax:

```
class M(torch.nn.Module):
    def __init__(self, module_dict):
        super().__init__()
        self.module_dict = module_dict

    def forward(self, x):
        if "foo" in self.module_dict:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_nn_moduledict_contains
```

enumerate supports start argument (#240)

* enumerate supports start argument

* address comments

Make eval_frame thread safe (#239)

This should make eval_frame thread safe. Currently, eval_frame is a global object, and different threads may step on each other by setting a different one.

This changes the behavior to instead always* have a "shim" eval_frame which then routes to the correct behavior by looking at the thread-local associated object. This is thread safe because now the callback object is always thread safe, and we only use it to drive logic at frame eval time, as opposed to at callback registration time.

Currently, the logic for None/False/Callback is kept, but the False case could be easily collapsed behind the shim in a subsequent diff.

*Always here means always when dynamo is running. The shim is installed and removed based on keeping track of how many dynamo threads are running at the moment.
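
A rough Python sketch of the shim pattern described above (the real implementation lives in C; all names here are illustrative):

```
import threading

_local = threading.local()

def set_callback(cb):
    # registration only touches this thread's slot
    _local.callback = cb

def shim_eval_frame(frame, default_eval):
    # the single global hook routes per-thread at frame-eval time
    cb = getattr(_local, "callback", None)
    if cb is None:
        return default_eval(frame)  # dynamo inactive on this thread
    return cb(frame)
```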

Add support for __subclasses__ (#242)

Fixes #241

Inline function jump on tensor condition should be unimplemented (#249)

cast model in no-isolate mode (#244)

support tracing __getitem__ of torch.nn.ModuleDict (#253)

Summary:

Supports tracing through

```
class ModuleDict(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleDict(
            {
                "0": torch.nn.Linear(10, 10),
            }
        )

    def forward(self, x):
        x = self.layers["0"](x)
        return x
```

This is useful for DBR quant.

Note: handling other logic for `ModuleDict` is left for future PRs.

Test plan:

```
pytest -vsk test_moduledict
```

Run torch inductor test on GPU machine (Part 1)  (#258)

* Run torch inductor test on GPU machine

* Land scale-config first

Implement verify_correctness #179 (#252)

* WrapperBackend to enable verifying correctness of backends; set config.verify_correctness to True to enable it.

* move testing.same() to utils.py
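
A minimal sketch of the wrapper idea, assuming a single-tensor output (not the exact torchdynamo code):

```
import torch

def wrap_backend(backend, gm, example_inputs):
    compiled = backend(gm, example_inputs)

    def checked(*args):
        expected = gm(*args)      # reference result from the original graph
        actual = compiled(*args)  # candidate backend's result
        assert torch.allclose(expected, actual, atol=1e-4), "backend mismatch"
        return actual

    return checked
```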

Skip inductor tests on older pytorch versions (#257)

[inductor] Multi-devices, primtorch decomps, and many new ops (#243)

enable tracing through id(nn_module_variable) (#262)

Summary:

Enables tracing through this syntax:

```
class M(torch.nn.Module):
    def forward(self, x, ref_id):
        self_id = id(self)
        if self_id == ref_id:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quant because it uses `id(module)` for some
FQN gymnastics.

Test plan:

```
pytest -vsk test_id_of_nn_module
```

enable tracing through frozenset contains of PyTorch ops (#251)

Summary:

Enables tracing through this syntax:

```
funcs = frozenset([torch.add])

def fn(x, func):
    if func in funcs:
        x = torch.add(x, 1.0)
    x = torch.mul(x, 1.0)
    return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_frozenset_torch_func_contains
```

Dump conv args into file (#261)

* dump convolution args into file

* add option --log-conv-args in torchbench.py

Fix generation tagging new (#263)

Simplify eval frame, merge _run_only (#264)

Allow layout=torch.strided in new_constant (#269)

Decomposition for nan_to_num (#268)

[inductor] Handle non-reduction reductions (#266)

Use unittest.mock.patch for test_verify_correctness (#265)

[inductor] Support sort/as_tensor/LongTensor (#267)

Fix inline list/dict mutation (#273)

* Fix inline list/dict mutation

* Fix lint

* Refactor inline translator's replace_all

* Fix recursive inline replace

* Remove debug print

Support split_with_sizes (#272)

Light Refactor + Add support for torch.autograd.profiler.record_profile function (#274)

This diff takes GradModeVariable's logic and pulls it partially into a more generic ContextWrappingVariable base class intended for making it easier to write context managed code.

[inductor] Support input and slice mutation (#275)

Add recompile ux tests (#270)

Just a first step, this PR adds a few tests that start to outline a proposed UX, and proposes mechanisms for setting/checking the number of recompiles and the cache limit to facilitate the testing

Skip non tensor frame (#248)

* Skip non tensor

* Skip non tensor frame

* Lint

* Jason comments

* Add decorator functionality

* Comments

Prioritize class method if there is duplicated attribute name (#278)

* Prioritize class method if there is duplicated attribute name

* Refactor var_getattr to make it consistent with native pytorch

[inductor] Improve merging of contiguous loops (#279)

Add support for STORE_GLOBAL (#286)

Summary:
1. Create a symbolic_global table that maps a global variable name to a
unique object, and the unique object is further used as a key to
index into the store_attr_mutations table in SideEffects.
2. The actual STORE_GLOBAL action is buffered by SideEffects, and later
LOAD_GLOBAL just reads from SideEffects when appropriate. STORE_GLOBAL
is eventually applied after the generated graph.
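
A minimal function that exercises this path (illustrative only):

```
counter = 0

def step(x):
    global counter
    counter = counter + 1  # STORE_GLOBAL, buffered by SideEffects
    return x + counter     # LOAD_GLOBAL reads the buffered value back
```

Per the summary above, the buffered store is only applied after the generated graph runs.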

Skip inductor CPU tests if there is no working C++ compiler (#283)

Collect guard failures into one warning at cache limit hit (#281)

- avoid warning on each guard failure separately (in cases where the cache limit > 1)
- instead, bundle a summary of guard failure warnings together at the
  time of the cache limit hit

[inductor] Add support for more operators (#282)

[inductor] Improved indexing simplification and loop body representation (#289)

All base class methods take precedence if it's a nn.Module (#290)

Fx2trt pr4 (#294)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

Fx2trt pr5 (#296)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

* add a threshold for fall back to non-TRT

Support guarding inf constants (#300)

Pytorch tests 5/n - Graph break on MemberDescriptor type (#301)

* Graph break on MemberDescriptor type

* CI

update_locals_and_stack should use shared cache (#302)

* update_locals_and_stack should use shared cache

* update SideEffects.apply to use default cache

Implement verbose tensor guards check  (#287)

Verbose guard checks are guards used outside of the hot path
for providing specific failure information to the user on compile
cache miss.

This PR adds support for verbose guards and implements one for the
tensor guard, leaving other guards alone.

* Add tensor names to tensor guard failure message
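
Conceptually, a verbose guard pairs the hot-path predicate with a slow-path explainer; a sketch with illustrative names:

```
def tensor_guard(t, expected):
    # hot path: cheap boolean check
    return t.dtype == expected.dtype and t.shape == expected.shape

def tensor_guard_verbose(name, t, expected):
    # slow path: only run on a cache miss, returns a human-readable reason
    if t.dtype != expected.dtype:
        return f"tensor '{name}' dtype {t.dtype} != {expected.dtype}"
    if t.shape != expected.shape:
        return f"tensor '{name}' shape {tuple(t.shape)} != {tuple(expected.shape)}"
    return None  # guard passed
```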

Low precision support (#304)

* add low precision support to torchinductor triton backend

* remove temporary tests

* lint

* lint

* lint

Run clang-format on torchdynamo/_guards.cpp (#306)

Fix slowdown due to generation_tagging_new (#305)

* Use patched init to track dynamic modules + test gen tagging

Elaborate on error message for failing tensor type match (#307)

ConstDictVariable reconstruct should keep original order (#308)

[inductor] Minor fixes for latest PyTorch and benchmark harness codegen (#309)

Verbose guard check Bugfix (#311)

* Bugfix and clang format

* It wasn't will, it was me - Clang formatter

PyTorch tests - 6/n - Add type check for list/tuple elems in CONSTANT_MATCH (#303)

* Add type check for list/tuple elems in CONSTANT_MATCH

* recursive length guarding

* All decomp tests pass

* Filter out only useful guards

Add torchdynamo.allow_in_graph and torchdynamo.disallow_in_graph (#295)

Fix CI (#316)

* DONT MERGE - Checking CI

* fix

* fix

Fix repro test to unblock internal sync (#315)

Add constant checks for list of numpy integers (#313)

Skip inductor tests if sympy is missing (#320)

[Easy] fix reference to removed variable in debug trace (#323)

Add a basic compilation profiler (#312)

* Add a basic compilation profiler

* Include graph break reasons in compilation report

* lint and import issues

* Add --recompile_profiler option to torchbench.py

Filter out unimportant modules from allowed modules (#324)

* Filter out unimportant modules from allowed modules

* Remove typo

* Further cleanup

* CI testing

* Michael's comment

* Remove few more meaningless things

* Jason's comments

Prevent guard creation from accessing objects between __new__ and __init__ (#322)

Initial implementation of UnspecializedPrimitiveVariable (#321)

* Initial implementation of UnspecializedPrimitiveVariable

* Update heuristic

* Add test for no recompilations for different values

[inductor] Initial support for tiling output code (#317)

Fix test skip when sympy is missing (#333)

remove unconditional sympy import from test_torchdinductor.py (#334)

* remove unconditional sympy import from test_torchdinductor.py

conv in triton (#310)

* general conv and conv1x1 implementation in triton
* correctness check with torch baseline
* benchmarking on resnet50 layers
* enable `triton_ops.conv` to replace `aten.convolution` by setting config.triton.use_conv as True

[inductor] Minor float16 fix (#338)

update IPEX backend (#344)

Add _refs and _prims to the allowlist

This won't get exercised by real models but it's necessary so we
can test that PrimTorch decomps work under dynamo.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 874bbbd8d80d07fcd089a0d57849f21b2b81756d
Pull Request resolved: pytorch/torchdynamo#345

[inductor] Support additional operators (#339)

[inductor] Benchmark harness for training (#337)

codegen to update mutated variables with side effects should run after stack value's codegen (#347)

Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 221ff4b897648f99a4552dd955b3426a2d6cb738
Pull Request resolved: pytorch/torchdynamo#355

Convert dict to list before iterating in case there is a possible delete (#360)

Add TROUBLESHOOTING.md (#357)

- link TROUBLESHOOTING.md from README.md and from recompilation warning

Addresses issue https://github.com/pytorch/torchdynamo/issues/348

Add coldstart/breakeven benchmark (#352)

try with:
python torchbench.py --cold-start -d cuda --training --use-eval-mode --nvfuser --isolate

This actually adds 2 new benchmark metrics:

coldstart: measures the worst of t_eager_compile / t_dynamo_compile as a 'speedup', where dynamo compiles twice to exercise the profiling executor

breakeven: predicts the number of iterations dynamo would have to run to 'break even' with eager, considering the amortization of its compile cost

Not yet tested with inference or cpu, may have some other issues.

Should probably adjust to repeat the whole cold-start process several times and take the median, but for now it just does this once.
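
The breakeven arithmetic is roughly the following (names are illustrative; the benchmark's exact bookkeeping may differ):

```
def breakeven_iters(t_eager_compile, t_dynamo_compile,
                    t_eager_step, t_dynamo_step):
    # one-time extra cost dynamo must amortize
    extra_compile = t_dynamo_compile - t_eager_compile
    # per-iteration savings once compiled
    per_step_gain = t_eager_step - t_dynamo_step
    if per_step_gain <= 0:
        return float("inf")  # never breaks even
    return extra_compile / per_step_gain
```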

Raise unimplemented if checkpoint is empty (#351)

Only wrap in TorchVariable if allowed, not if not disallowed

If you don't do this, allowed_functions_module_string_ignorelist
doesn't actually affect whether we try to trace these functions into
the graph, since the disallowed list doesn't actually respect
this config.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: fad216f8a3013caaa2814cc3efaf5348e6033ee5
Pull Request resolved: pytorch/torchdynamo#356

List append should return None (#365)

Make TorchInductor use Triton's MM implementation in codegen (#325)

* Rebased on upstream

* Fixed make lint

* Fixed make lint

add yaml to requirements.txt (#367)

Disabling TorchDynamo inside torch.jit and torch.onnx compiler (#361)

* Disabling TorchDynamo inside torch.jit compiler

* Adding trace_module

* Remove the script

* Also adding ONNX

* Jason's comments

Workaround triton float64 log issue (#379)

Fix missing ir.Reduction.default_value (#378)

autotune conv kernels (#364)

* tuned_conv to choose the best kernel for the given input shape, stride, and layer params
* set config.triton.convolution to "aten" (default), "triton", or "autotune"

[inductor] Refactor fallback kernel handling (#381)

Add torch._decomps to the list to trace into

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: aa557ba00ea2fc03946089d56592041a5df31275
Pull Request resolved: pytorch/torchdynamo#369

Print value of TorchVariable object

I find this is helpful for debugging what exactly a given TorchVariable
is; presently there is no information so it is hard to tell.  Because
these are PyTorch variables they should be well behaved and it shouldn't
cause problems to call repr on them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b
Pull Request resolved: pytorch/torchdynamo#370

Skipping the namedtuple subclass constructor (#382)

Adding missing guard for GET_ITER Bytecode (#386)

fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385)

* temp changes

* fx2trt_oss is merged to https://github.com/pytorch/TensorRT

* linter fix

Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380)

* Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm)

* Reformatted

Fix for `any` decomposition (#384)

* fix lowering

* restore abs

* black

Extend new to new_empty for tensor.shape (#387)

Support for lazy modules (#391)

* Support for lazy modules with test

Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__, so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there are cases where torchdynamo does not yet support functionality used in all hooks.
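
A quick illustration of why __call__ matters here (this shows stock LazyLinear behavior, not the dynamo change itself):

```
import torch

m = torch.nn.LazyLinear(4)
x = torch.randn(2, 8)
y = m(x)               # __call__ runs the lazy pre-forward hook,
                       # materializing parameters from x's shape
print(m.weight.shape)  # torch.Size([4, 8])
# calling m.forward(x) on a fresh LazyLinear would have skipped that hook
```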

Enable tf32 in torchbench.py (#397)

Support TensorType checking (#395)

* Support TensorType checking

* Update torchdynamo/variables/builtin.py

Co-authored-by: Jason Ansel <jansel@jansel.net>

Co-authored-by: Jason Ansel <jansel@jansel.net>

[inductor] Canonicalize indexes for MemoryDep (#314)

Summary: By canonicalizing indexes, we can build more accurate and
flexible read-write dependencies, which will allow more general kernel fusion.

Suppress warnings during the pre-checks on a frame (#396)

* Suppress warnings during the pre-checks on a frame

* Encapsulate in PatchTorch

[inductor] Do convolution bias in its own kernel (#403)

[inductor] Fix convolution output stride order (#400)

Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398)

Reducing overhead of has_tensor_in_frame (#406)

[inductor] add manual_seed in test_round (#388)

[inductor] Improve heuristic for realizing buffers (#402)

[inductor] Fix typo preventing some fusions (#401)

[inductor] Fix for constants on wrong device (#405)

remove deepcopy in fx2trt (#407)

* temp changes

* remove deepcopy

Fix broken link in README (#408)

Adding nan (#383)

* Adding nan

* Jason's comments

Fixing import issues for PyTorch 1.12 rc branch (#411)

Raising tolerance after using tf32 (#415)

[inductor] Improve handling of reductions (#404)

Shape guard and isinstance fixes (#414)

* Shape guard and isinstance fixes and tests

* Add guard for any accessed Tensor attribute

Ensure dtype instances are not mapped to TorchVariable() (#394)

* Disallow dtype instances

* Extract dtypes from torch automatically and add test

Propagate guards from ConstDict variables (#421)

Config driven support for torch.Tensor .item()  (#417)

[inductor] Refactor helpers into torchinductor.utils (#418)

Removing generation field from the patched nn.Module (#423)

* Removing generation field from the patched nn.Module

* Lint

* Rebasing

[Inductor] Triton template for conv (#422)

* TritonTemplateKernel, template_codegen, conv jinja2 template
* pip install Jinja2 in setup_nightly

Remove unnecessary call to clone which also caused a segfault (??) (#427)

Disable TorchDynamo on frames created by fx symbolic tracing (#429)

Fix for disabling triton templates (#430)

Minor test fixes (#432)

[inductor] Remove dead stores after fusion (#409)

Summary: Use DeferredLine/DeferredIndentedBuffer to perform a lazy emit of
buffer allocation/store after we determine which buffers are redundant.

Add support for iterating over a dict (#436)

Check if kwargs has key "fp16_mode" when determining the precision (#437)

[inductor] Add heuristic to set num_warps (#433)

update accuracy check for TRT fp32 (#438)

* temp changes

* fix an issue in fp32, change accuracy check to cosine similarity for fp32 since TRT fp32 could not meet 1e-4

Pin CI to June 20th Torch Nightly (#441)

* Pin CI to June 20th Torch Nightly

* Respond to comments

* duplicated set up..

[inductor] Add some prims (#431)

Disabling the trace instead of symbolic trace (#443)

Break graph on torch.Storage types (#428)

* Break graph on torch.Storage types

* Hmm, CI failing, trying Jason's suggestion

* Debug CI

Fix guard propagation for tuple iterators (#448)

[inductor] Improve fusing of tiled + untiled (#446)

Directly compute sum of a list of floats/ints (#449)

* Directly compute sum of a list of floats/ints

* Test

Added option use_bmm to enable triton codegen for bmm (#393)

* Added option use_bmm to enable triton codegen for bmm
* Added more shapes to microbench

Add support for crossentropy (#450)

Rewrite symbolic_locals for torch.return_types (#442)

* Rewrite symbolic_locals for torch.return_types

* Special casing on the out kwargs

* Replace == with is

Remove traced op overloads before compiling (#455)

* Remove op overloads

* lint

Raise errors when backends throw exceptions (#451)

Huggingface model benchmarking (#459)

[inductor] Support scatter operations (#434)

Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392)

* Implement UnspecializedPrimitiveVariable codegen

* Make UnspecializedPrimitiveVariable a GraphArg

* Update make_call_generated_code

* Update min/max builtin func

* Support random.random

* Remove unnecessary change

* Fix lint

* Refactor to support multiple random.random

* Refactor out unspecialized numpy and python variables

* Fix RandomValueSource guard

* Support multiple random functions

* Rebase to updated main

* Refactor out random_values_var

* Fix lint

* Fix lint

* Move random_values_var to output graph

* Add need_unwrap to distinguish unspec from x.item()

* Make global rand func unique

* Fix lint

* Add raw value propagation for unspec variables

* Fix lint

* Directly load type(raw_value) & update random func example value

* Fix lint

Add Fake Tensor Propagation (#426)

* Add Fake Tensor Propagation

* extend test

* lint

* fix import

* one more day..

* update functorch commit

* bump one more day to get pytorch/pytorch#79741

* Skip test

* use FakeTensorError

* lint

* Guard on fake tensor availability

* test skips (fix en route in core)

* lint

* update nightly

* lint

* update

* format

* update recent

[inductor] Cherry pick nll_loss_forward decomp (#456)

[inductor] Register lowerings for operator overloads (#457)

Add support for torch.finfo/torch.iinfo (#470)

[inductor] Auto-download gcc12 from conda-forge (#471)

Rename benchmarking files (#472)

Add support for named_params and named_modules (#465)

[inductor] Add a metric to count the number of generated kernels (#476)

Summary: This can be used to prevent a regression on our fusion result.

Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424)

* Add support for custom class

* lint

* Fix unpack to reflect main

* Feedback

* Simplify, rebase

* Lint, format

Flag to skip printing of Dynamo internal exceptions (#480)

Don't leak cache entry on skip

Fixes pytorch/torchdynamo#477

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab
Pull Request resolved: pytorch/torchdynamo#478

Reduction cache (#487)

* enable cse for reductions

* self.cse

use fake tensors when deep copying model to check mutation (#486)

* use fake tensors when deepcopying model to check mutation

* fix fake tensors not available

* add tests

Added cos lowering (#492)

fix microbenchmarks import path (#474)

Compute multilayer reductions in higher precision (#484)

* Compute multilayer reductions in higher precision

* Compute the whole mean kernel in higher precision, downcast at the end (see the sketch below)

* update test

* lint

* skip test
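
The pattern from the second bullet above, sketched in eager code (the commit makes inductor generate the equivalent inside its kernels):

```
import torch

x = torch.randn(1 << 20, dtype=torch.float16)
# upcast, reduce in float32, downcast only at the end
stable_mean = x.float().mean().half()
```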

Fix correctness checking code and non-deterministic benchmarks (#493)

Improve recompilation warning (#494)

- default to printing only the most recent guard failure not one failure for each cache miss
- reformat the text to be (hopefully) more readable and useful

Motivation:
While in some cases, knowing the individual failure reasons for each of (say, 64) cache misses could be useful, in practice it is probably good enough to know the most recent one, since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard).

Previously:
torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']]to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md

Now:
torchdynamo hit config.cache_size_limit (64)
   function: 'toy_example' (example.py:5)
   reasons:  ['___guarded_code.valid']

Add a new backend option for TVM's meta_schedule (#479)

Add (experimental) support for exporting a graph and guards (#469)

Makefile/packaging updates (#499)

[inductor] Misc small improvements (#475)

Adding logging config (#504)

More Huggingface models  (#500)

* More Huggingface models (from simple_dl)

* Comments

Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)

Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509)

This reverts commit c747580.

Disable torchdynamo inside dispatch_trace (#508)

[inductor] Support rand()/dropout() (#505)

[inductor] Workaround triton bug with XBLOCK=1 (#510)

Reduction deps (#502)

* mark more reduction dependencies

* cleanup

* black

* make sure canonicalization prefix is the same everywhere

change tuning trial of meta schedule (#513)

[inductor] Fix bug with invalidated reuse (#506)

[inductor] Handle no-op slice_scatter (#507)

[WIP] Adding AMP support in benchmark infra (#464)

[WIP][Discussion] Write out a deeper documentation on how we go from … (#498)

* [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards

* Update GuardsOverviewPt1.md

* Update GuardsOverviewPt1.md
shiyu22 pushed a commit to towhee-io/towhee-compiler that referenced this pull request Sep 9, 2022
.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Add README.md

Improve counters and stats

Constant control flow

Support some function calls

Support calling submodules and methods

Add profiler to measure coverage

Measure overheads with TorchBench

Refactor tests

Support for unpacking, inplace, and matmul op

Rewrite how guards work

Cleanup and refactoring

Linting, formatting, and documentation

Fix crashes and add torchdynamo.reset()

Disable list arg unpacking

Support control flow with graph prefix

Minor refactoring and naming

Improve support for partial graphs

Increase coverage of comparisons, constants, modules

Fix for handling of iterators

Extract multiple graphs from control flow

Refactor binary ops

Handle more type of jump instructions

Support wrapping `Real` types (#1)

Allow using nn.Modules inside a list (#2)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix key error (#4)

Fix broken tests

Add support for EXTENDED_ARG

TorchBench and debugging improvements

Add support for staticmethod

Improve handling of unsupported variables

Allow using Tensors inside a list/tuple (#3)

* Allow using nn.Modules inside a list

* Allow using Tensors inside a list/tuple

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `MAKE_FUNCTION` (#5)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Stricter typing and support nn.Sequential

Support tuple returns (#6)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support global loads of bools (#7)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Support `len` (#8)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Fix bug in LOAD_GLOBAL

Support float constructor (#10)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Implement `isinstance` (#9)

Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com>

Revamp resuming after unsupported things

Livevars and constant folding

Optional dtype/device/rank/shape specialization

Specialization and mutation guards

Improve support for config objects

Add support for IMPORT_NAME

Early work on optimizations and autotuning

Support for sizes at input args and dicts

Minor refactoring

Support for closures

Support ModuleList and strings

Support nested functions with closures

Fix build for gcc-7

Minor fixes for avx512 machines

Clean up stack size handling

Rewrite how graph resume works

Support for super()

Support zip and enumerate

Refactor variable construction into seperate file

Clean up handling of variable sources

Dictionary support and improve arg handling

Add IPEX backend

Improve handling of lists/dicts

Fix python 3.7 issues

Support module constants

Support basic list mutation

Support list comprehensions

Support for inline nn.Softmax()

Refactor and cleanup call_function

Support inlining generators

Dynamic .shape/.ndim support

Skip transformers.file_utils

Support property/classmethod

Support dict.__setitem__

Break graph on STORE_ATTR

Garbage collect generated code

Add trt backends

Improve cuda backends

Support latest pytorch/torchenchmark

Fixes in TRT baseline script

Fixes for GPU measurement

Refactor optimization backends and tuning

Retry autotuning failures

Fix static runtime backend

Fix lints

Cleanup list packing/unpacking

Fixes for new torchbench version

Work around crashes in static runtime

Fix vision_maskrcnn/detectron2_maskrcnn

Switch to alternate version of onnx2trt

Skip pyhpc_turbulent_kinetic_energy

Add backends to skipfiles

Fix weakref handling

Specialize on torch.is_tensor and torch.is_floating_point

Analysis and functionalization passes

Add dynamic dtype/device/shape propogation

Support namedtuple and dtype constants

Fix as_tensor issue

Improve support for range()

Config flag to control normalization

Fix __len__ issue

Support list.pop()

Allow inlining methods on UnsupportedVariable

Update README.md (#13)

[Backend][TVM] Support boolean as output (#14)

Remove extra call to torch.jit.fuser

Improve support for list/dict/len/str

Support list.extend and dict.update

Support for no_grad/enable_grad

Improve coverage of huggingface models

Refactor stack_op implementation

Fix chunk method in longformer

Fix bug with calls between nested functions

Support hasattr(namedtuple, ...)

Support for len(inspect.signature(fn).parameters)

Support for autograd.Function

Allow adding a BaseListVariable and a ConstantVariable together if the latter is an iterable (#15)

Refactor torchbench.py to use subprocess isolation

Switch project to use isort import format

Rewrite README.md

Improve docs in ./torchbench.py --help

Allow changing torchdynamo.config.dynamic_shapes without recompile

Refactor offline autotuner

Bugfix for bias towards eager

Add online version of autotuner

Fix some backend exceptions

Fix for bool inputs

Work around TensorRT abort() on group_norm

Improve error printing when backend fails

Fix bug with int64 in onnxrt

Add isolation to baseline runs

Fix some errors in copy_ for backend testing

Skip TRT for einsum models

Split out fixed_strategy1/fixed_strategy2

Disable TRT bypassing

Support direct calls to module.forward

Support for BUILD_TUPLE_UNPACK_WITH_CALL

Support map/reduce/sum of tensors

Split test_functions into two files

Fix reconstructing nested attrs

Improve support for HuggingFace ModelOutput() wrapper

Improve support for zip and __contains__

Fix issue with list multiply

Adding functorch to skipfile (#16)

Adding AOT Autograd API for inference (#17)

Add note about skipfiles

Fixes for latest torchbenchmark version

Support list mutation side effects

Refactor codegen related things into codegen.py

Refactor graph generation related things into output_graph.py

Support inlining super() calls of nn.Module subclasses

Support some simple cases of try/except

Refactor variable_tracker.py into many files

Support builtins module (#18)

Adding training optimizations (#19)

Allow constant folding through set()

Support for property/__getattr__ on user defined classes

Support more cases of varargs calls

Add nopython=True whole-program graph mode

Improve support for dataclasses

Refactor BuiltinVariable call_function handling

Add support for tuple iadd

Add support for UNPACK_EX bytecode

Improve handling for tuple constants

Support BUILD_LIST_UNPACK with tensor args

Support numpy integer constants

Support module.__class__.__name__

Add support for dict mutation side effects

Fix composability with FX generated code

Fix off by one bug in profile operator counts

Avoid over-specializing on dynamically created nn.Modules

Helper functions for AOT Autograd testing (#20)

Skip networkx for AOT (#21)

Run `make format`

Move non-specialized nn.Module handling to UnspecializedNNModuleVariable

Reuse generated code when control flow paths converge

Support resume while inside 'with no_grad()'

Support tuple_iterator

Support for getitem with default value

Support type(obj) calls

Fix bug in __getattr__ handling

add ltc backend (#23)

OSS Automated Fix: Addition of Code of Conduct (#25)

OSS Automated Fix: Addition of Contributing (#24)

Minor refactor in side_effects.py

`make format` and lint issues

Add lint workflow

Add LICENSE

Update CONTRIBUTING.md

Add test workflow

Update github workflows

Skipping logging module (#26)

Initial support for setattr side effects

Improve coverage statistics measurement

Support object creation side effects

Improve support for torch.distributions

Allow graph breaks on unsupported getitem

Remove nn.Sequential from skipfiles

Add CITATION.cff

Improve support for dynamic mutation of nn.Modules

Bugfix for mutating mutated attributes

Adding Torchbench training support (#27)

Deduplicate FX graph outputs

Support list.clear()

Adding missed random state reset (#28)

Fix issue with maskrcnn

Remove unneeded guards

Support HF ModelOutput() wrapper class

Avoid compiling the output of user_compiler

Fix ./torchbench.py --nothing

AOT Autograd fixes for moco, resnet50_qat, pytorch_struct (#29)

Save some memory while profiling

Fix AOT autograd bug where dynamo tries to compile generated backwards

Fix guards for 'mod.0.bias' attributes

Convert tests to use public API

Workaround for AOT Autograd LSTM bug (#31)

Fix handling of torch.manual_seed

Add `with torchdynamo.disable()` context manager

Support itertools.{chain,islice}

Support Tensor.is_quantized

Support multiple threads using TorchDynamo

Improve support and testing of dynamic shapes

Workaround for issue in hf_Bart in dynamic shape mode

Don't directly import from _eval_frame

Fix aliasing issue in #30

Support staticmethod/classmethod on user defined classes

Improve support for closures and dunder methods

Fix bug in handling of type annotations

Support 3+ nestings of closures

Fix threading issue for autograd threads

Add pthon autograd test case

IMPORT_NAME Instruction - Import the top-level package (#47)

hf_T5 dataclass fields handling (#50)

Using mean instead of sum to have reasonable loss value for backprop (#51)

Allow passing a string with a backend name to torchdynamo.optimize

Support nesting torchdynamo.optimize() decorated functions

Support __init__ of HF ModelOutput (#65)

Fill in missing fields in setup.py

Fix pip install issue

Add python key tracing backend

Fix build on M1 Max Mac (#63)

Fix lint

Fx2trt integration improvement (#71)

Refactor python_key_normalize

Handle mutation propagated by getitem (#83)

Torchbench changes for AOT Autograd (#84)

Fix string-based backend mode

Fix linter

Fix bug when torch and torchdynamo are in the same folder

Fx2trt pr2 (#97)

Fix lint github action

Add update 6 link to readme (#99)

Fix torchbench.py --nothing option (#100)

Skip tacotron2 and unskip vision_maskrcnn in torchbench.py (#102)

Add developer setup section to README.md (#105)

Fx2trt pr3 (#110)

Use Low overhead version AOT Module (#113)

Cleanup AOTAutograd related args (#114)

Add extra tolerance for some GPU models in Torchbench (#116)

Enabling few more torchbench models with AOT Autograd (#127)

Support formatted literal strings (f-strings) (#128)

* Support formatted literal strings (f-strings)

* TensorVariable var_getattr supports __class__ and add test case

* Address review comments

* Fix lint error

add an option to randomize inputs (#130)

Fix typo in IPEX backend (#126)

Handle torch size (#139)

Remove `gc.collect()` upon every model compute run. (#140)

Make STORE_SUBSCR break when unsupported. (#142)

* Make STORE_SUBSCR break when unsupported.

I probably could have done a bit more but this is enough to fix the
issue and I'll let someone more intrepid get this going comprehensively.
I'm also not sure how to test this.

Fixes #131

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Fix Python 3.7 compatibility issues & Add testing action workflow (#143)

Handle torch.cuda.current_device (#146)

Handle torch.seed (#141)

Handle torch.override.is_tensor_like (#144)

* Handle torch.override.is_tensor_like

* Making it a constant

* Comments

* Conflict

Preserve CUDA rng states during frame analysis (#147)

* Preserve CUDA rng states during frame analysis

* Retrigger CI

* Retrigger CI

* Debugging

* Debugging

* Debugging

AOT Autograd Training - few more models passing (#151)

* AOT Autograd Training - few more models passing

* Add skip API

* Skip inlining

Support Slice of NNModuleList (#152)

* Support Slice of NNModuleList

* Comments

Fix import overhead by using `importlib.util.find_spec` (#153)

Fix issues in #132 (#150)

Variable builder - handle slice (#155)

More skipfiles (#157)

* Add more skip modules

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

moco and hf_Reformer fixes (#158)

* hf_Reformer fixed

* Adding moco fix

Revert "Fix import overhead by using `importlib.util.find_spec` (#153)" (#160)

This reverts commit dedd9fa.

Add support for Python 3.9 (#154)

* Changes to make TorchDynamo support Python 3.9

* Fix lint

* Add Python 3.9 github test workflow

* Fix typo

* RERAISE to TERMINAL_OPCODES set

* Make IS_OP support ConstDictVariable

* Address comments

Monkey patch autograd.Variable (fixes Tacotron2) (#161)

Break graph on Tensor grad (#163)

* Break graph on Tensor grad

* Comments

* Suppress warning

enable support for staticmethod on superclass (#162)

Summary:

Adds support for tracing through this syntax:

```
class Parent(torch.nn.Module):
    @classmethod
    def foo(cls, x):
        x = x + x
        return x

class Child(Parent):
    @classmethod
    def helper(cls, x):
        // resolving super().foo failed before this PR
        x = super().foo(x)
        return x

    def forward(self, x):
        x = self.helper(x)
        return x
```

This is useful for eventually enabling __torch_function__ support.

Test plan:

```
pytest -vsk test_super_static_method
```

Options to print fx/aot traces (#164)

enable basic __torch_function__ support (#167)

Summary:

This adds a skeleton for `__torch_function__` support in torchdynamo.

What this is currently doing:

1. in variable builder, check for __torch_function__ and wrap tensors in TensorTFOverrideVariable if found
2. in TorchVariable.call_function, inline the __torch_function__ function of TensorTFOverrideVariable arguments
3. in GetAttrVariable.call_function, check for super().__torch_function__ which resolves to the original. If it's found, stop inlining and insert the function call into the graph.

The current test just creates a __torch_function__ override which doesn't do
anything but call super().__torch_function__.

Things left for future PRs:

* supporting call_method
* supporing actual logic inside the overrides
* implementing the full __torch_function__ spec (currently things are hardcoded to first argument only)

Test plan:

```
pytest -vsk test_simple_torch_function
// used to fail with https://www.internalfb.com/phabricator/paste/view/P496842415
// currently passes
```

Revert the monkey patching of variable, fixed in PyTorch (#173)

Miscellaneous small fixes and lints (#168)

Fixes for pytorch tests - 1/n (#174)

* Fixes for pytorch tests - 1/n

* Better comment

modify 'ipex' backend (#166)

Add torchdynamo.config.raise_on_backend_error (#177)

Fix for module returning (Tensor, None) (#176)

Fixes for pytorch tests 2/n - torch.Size and nn.Parameter (#182)

* Torch testing - Fix bugs for torch.Size and nn.Parameter

* CI failures

extend __torch_function__ support to `call_method` (#181)

Summary:

In pytorch/torchdynamo#167 we added `__torch_function__`
support for tracing through `call_function`.

This PR extends the support to also work on `call_method`.

Note: the LOC is high because some code was refactored to be reusable.
Note: implementing correct rewrapping logic for methods is saved for a
future PR, hope that's OK.

Test plan:

```
pytest -vsk test_simple_torch_function
```

Add supports for Python 3.10 (#172)

* Add supports for Python 3.10

* fix lint

* Add github test workflow for python 3.10

* Add lnotab and linetable writer test case

* Fix lint

* Fix lint

* Fix lint

* Split the unit test

* Remove some requirements and fix lint

* Add several new bytecodes in Python 3.10

* add Cython as requirements

* set numpy version

* Update README.md

Fix Pytorch tests 3/n - Skip exec frame (#184)

* Fix Pytorch tests 3/n - Skip exec frame

* CI failure

[PR] pass rewrite for supporting setitem (#188)

* temp changes

* add pass rewrite for setitem

* linter

Add --float16/--float32/--cosine options to torchbench.py (#189)

Don't import unused third party packages (#193)

Early version of TorchInductor (#190)

Fix for `KeyError: 'Size'` error (#194)

Add xarray to skipfiles (#196)

Update URLs to github.com/pytorch/torchdynamo (#199)

Add `./torchbench.py --fast` option (#198)

Pass backend-related ctx to TorchDynamo Optimize Context (#201)

* Pass backend-related ctx to TorchDynamo Optimize Context

* Reinit the backend ctx for every frame

* Doc

Detect x.new(torch.Size) and rewrite to torch.empty(tuple) (#195)

* Detect x.new(torch.Size) and rewrite to torch.empty(tuple)

* address comments

Skip frames when no graph is found (#205)

support inlining __torch_function__ with reading from closure (#197)

Summary:

The previous PRs to add `__torch_function__` support inlined through
`__torch_function__` without adding any guards for the function.

This worked for simple cases, but did not work if `__torch_function__`
needs to read a nonlocal variable, for example: https://gist.github.com/vkuzo/a3388fcaa532318d049368e96652b366
The reason it was broken is because the code which bound arguments
during inlining had to have a reference to a source in order to bind
things properly.

One way to fix this is to get the source of the `__torch_function__` attribute
of the original tensor, guard on it, and persist it through
all the rewrapping logic.

I'm flexible if there is a better alternative, lmk.

Test plan:

```
pytest -vsk test_torch_function_with_closure
```

[inductor] add lowerings for hardswish/hardsigmoid/hardtanh (#200)

[inductor] Handle +/-inf constants (#210)

Replace tensor.new with tensor.new_empty (#212)

add configuration for modules eligible for inlining (#208)

Summary:

Makes the source modules for `skipfiles.is_torch_inline_allowed` configurable.

This is needed for DBR quant integration exploration, we can now override this
config to allow torchdynamo to inline DBR quant utility functions.

Test plan:

Run this: https://gist.github.com/vkuzo/010e0483c9bbb35837cc9cb27c555243
it now advances past the error of "inlining in skipfiles"

[inductor] fix transposed convolution shape formula (#202)

Try finally block for with context on graph break instruction (#213)

* Try finally block for with context on graph break instruction

* fix test

* Support >= 3.9

* Support python 3.7

* Comments

* Replacing the global load with GlobalSource and reconstruct

Remove nn.Parameter filter hack for AOTAutograd backend (#214)

Fix test failures (#218)

Fix slicing list returning wrong result (#222)

* Correct ListVariable source

* Fix lint

Remove reference cycle (#223)

Refactor ConstDictVariable to support user_cls and use dict by default (#226)

Pin inductor CI to specific pytorch version (#229)

[inductor] support torch.linspace and torch.tensor (#217)

[inductor] add heuristic to decide layouts and loop orders (#216)

Revert "Fix slicing list returning wrong result (#222)" (#231)

This reverts commit 243222e.

Add equal_nan option to torchdynamo.testing.same() (#232)

Support device constants (#230)

Bail out for __setattr__ and fix ClassVariable handling (#227)

- avoid compiling __setattr__ functions as they may be difficult
  to correctly handle for arbitrary custom classes, but also aren't
  likely to be useful for torch module optimization
- expand the condition for constructing UserDefinedClassVariable
  to include ABCMeta classes via `inspect.isclass` check

Remove reference cycle - with exceptions (#228)

* Remove reference cycle - with exceptions

* Fix for InliningInstructionTranslator

Add fix for writing to closures (#233)

* Add fix for writing to closures

* run black

* one more time

Co-authored-by: Elias Ellison <eellison@devfair044.h1.fair>

Delete example value for unused args (#234)

Fix list slice & ConstantVariable to TupleVariable conversion missing source info (#235)

* Fix list slice & ConstantVariable to TupleVariable conversion miss source info.

* Update test cases

* Address comment

[inductor] early memory reuse and new operators (#237)

Add type checking in Constant match - Fix Pytorch tests 4/n (#238)

* Add type checking in Constant match

* Fix test

enable tracing through enum comparison (#245)

add support for tracing torch.nn.ModuleDict.__contains__ (#246)

Summary:

Adds support for tracing through this syntax:

```
class M(torch.nn.Module):
    def __init__(self, module_dict):
        super().__init__()
        self.module_dict = module_dict

    def forward(self, x):
        if "foo" in self.module_dict:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_nn_moduledict_contains
```

enumerate supports start argument (#240)

* enumerate supports start argument

* address comments

Make eval_frame thread safe (#239)

This should make eval_frame thread safe. Currently, the eval_frame is a global object, and different threads my step on each other setting a different one.

This changes the behavior to instead always* have a "shim" eval_frame which then routes to the correct behavior by looking at the thread-local associated object. This is thread safe because now the callback object is always thread safe, and we only use it to drive logic at frame eval time, as opposed to at callback registration time.

Currently, the logic for None/False/Callback is kept, but the False case could be easily collapsed behind the shim in a subsequent diff.

*Always here means always when dynamo is running. The shim is installed and removed based on keeping track of how many dynamo threads are running at the moment.

Add support for __subclasses__ (#242)

Fixes #241

Inline function jump on tensor condition should be unimplemented (#249)

cast model in no-isolate mode (#244)

support tracing __getitem__ of torch.nn.ModuleDict (#253)

Summary:

Supports tracing through

```
class ModuleDict(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleDict(
            {
                "0": torch.nn.Linear(10, 10),
            }
        )

    def forward(self, x):
        x = self.layers["0"](x)
        return x
```

This is useful for DBR quant.

Note: handling other logic for `ModuleDict` is left for future PRs.

Test plan:

```
pytest -vsk test_moduledict
```

Run torch inductor test on GPU machine (Part 1)  (#258)

* Run torch inductor test on GPU machine

* Land scale-config first

Implement verify_correctness #179 (#252)

* Wrapperbackend to enable verifying corretness of backends; set config.verify_correctness as True to enable it.

* move testing.same() to utils.py

Skip inductor tests on older pytorch versions (#257)

[inductor] Multi-devices, primtorch decomps, and many new ops (#243)

enable tracing through id(nn_module_variable) (#262)

Summary:

Enables tracing through this syntax:

```
class M(torch.nn.Module):
    def forward(self, x, ref_id):
        self_id = id(self)
        if self_id == ref_id:
            x = torch.mul(x, 1.0)
        x = torch.add(x, 1.0)
        return x
```

This is useful for DBR quant because it uses `id(module)` for some
FQN gymnastics.

Test plan:

```
pytest -vsk test_id_of_nn_module
```

enable tracing through frozenset contains of PyTorch ops (#251)

Summary:

Enables tracing through this syntax:

```
funcs = frozenset([torch.add])

def fn(x, func):
    if func in funcs:
        x = torch.add(x, 1.0)
    x = torch.mul(x, 1.0)
    return x
```

This is useful for DBR quantization.

Test plan:

```
pytest -vsk test_frozenset_torch_func_contains
```

Dump conv args into file (#261)

* dump convolution args into file

* add option --log-conv-args in torchbench.py

Fix generation tagging new (#263)

Simplify eval frame, merge _run_only (#264)

Allow layout=torch.strided in new_constant (#269)

Decomposition for nan_to_num (#268)

[inductor] Handle non-reduction reductions (#266)

Use unittest.mock.patch for test_verify_correctness (#265)

[inductor] Support sort/as_tensor/LongTensor (#267)

Fix inline list/dict mutation (#273)

* Fix inline list/dict mutation

* Fix lint

* Refact inline translator's replace_all

* Fix recursive inline replace

* Remove debug print

Support split_with_sizes (#272)

Light Refactor + Add support for torch.autograd.profiler.record_profile function (#274)

This diff takes GradModeVariable's logic and pulls it partially into a more generic ContextWrappingVariable base class intended for making it easier to write context managed code.

[inductor] Support input and slice mutation (#275)

Add recompile ux tests (#270)

Just a first step, this PR adds a few tests that starts to outline a proposed UX, and proposes mechanisms for setting/checking the #recompiles and cache limit to facilitate the testing

Skip non tensor frame (#248)

* Skip non tensor

* Skip non tensor frame

* Lint

* Jason comments

* Add decorator functionality

* Comments

Prioritize class method if there is duplicated attribute name (#278)

* Prioritize class method if there is duplicated attribute name

* Refactor var_getattr to make it consistent with native pytorch

[inductor] Improve merging of contiguous loops (#279)

Add support for STORE_GLOBAL (#286)

Summary:
1. Create a symbolic_global table to store a global variable name to an
unique object mapping, and the unique object is further used as a key to
index into the store_attr_mutations table in SideEffects.
2. The actual STORE_GLOGAL action is buffered by SideEffects and later
LOAD_GLOBAL just reads from SideEffects when appropriate. STORE_GLOBAL
is eventually applied after the generated graph.

Skip inductor CPU tests if there is no working c++ compiler (#283)

Collect guard failures into one warning at cache limit hit (#281)

- avoid warning on each guard failure separately (in cases where the cache limit > 1)
- instead, bundle a summary of guard failure warnings together at the
  time of cache limit hit

[inductor] Add support for more operators (#282)

[inductor] Improved indexing simplification and loop body representation (#289)

All base class methods take precedence if it's a nn.Module (#290)

Fx2trt pr4 (#294)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

Fx2trt pr5 (#296)

* temp changes

* add pass rewrite for setitem

* linter

* temp checkin

* squeeze for normalization

* code clean

* comments improvement

* comment out int64->int32

* linter

* add a threshold for fall back to non-TRT

Support guarding inf constants (#300)

Pytorch tests 5/n - Graph break on MemberDescriptor type (#301)

* Graph break on MemberDescriptor type

* CI

update_locals_and_stack should use shared cache (#302)

* update_locals_and_stack should use shared cache

* update SideEffects.apply to use default cache

Implement verbose tensor guards check  (#287)

Verbose guard checks are guards used outside of the hot path
to provide specific failure information to the user on a compile
cache miss.

This PR adds support for verbose guards and implements one for the
tensor guard, leaving other guards alone.

* Add tensor names to tensor guard failure message

Low precision support (#304)

* add low precision support to torchinductor triton backend

* remove temporary tests

* lint

* lint

* lint

Run clang-format on torchdynamo/_guards.cpp (#306)

Fix slowdown due to generation_tagging_new (#305)

* Use patched init to track dynamic modules + test gen tagging

Elaborate on error message for failing tensor type match (#307)

ConstDictVariable reconstruct should keep original order (#308)

[inductor] Minor fixes for latest PyTorch and benchmark harness codegen (#309)

Verbose guard check Bugfix (#311)

* Bugfix and clang format

* It wasn't will, it was me - Clang formatter

PyTorch tests - 6/n - Add type check for list/tuple elems in CONSTANT_MATCH (#303)

* Add type check for list/tuple elems in CONSTANT_MATCH

* recursive length guarding

* All decomp tests pass

* Filter out only useful guards

Add torchdynamo.allow_in_graph and torchdynamo.disallow_in_graph (#295)

Fix CI (#316)

* DONT MERGE - Checking CI

* fix

* fix

Fix repro test to unblock internal sync (#315)

Add constant checks for list of numpy integers (#313)

Skip inductor tests if sympy is missing (#320)

[Easy] fix reference to removed variable in debug trace (#323)

Add a basic compilation profiler (#312)

* Add a basic compilation profiler

* Include graph break reasons in compilation report

* lint and import issues

* Add --recompile_profiler option to torchbench.py

Filter out unimportant modules from allowed modules (#324)

* Filter out unimportant modules from allowed modules

* Remove typo

* Further cleanup

* CI testing

* Michael's comment

* Remove a few more meaningless things

* Jason's comments

Prevent guard creation from accessing objects between __new__ and __init__ (#322)

Initial implementation of UnspecializedPrimitiveVariable (#321)

* Initial implementation of UnspecializedPrimitiveVariable

* Update heuristic

* Add test for no recompilations for different values

[inductor] Initial support for tiling output code (#317)

Fix test skip when sympy is missing (#333)

remove unconditional sympy import from test_torchinductor.py (#334)

* remove unconditional sympy import from test_torchinductor.py

conv in triton (#310)

* general conv and conv1x1 implementation in triton
* correctness check against the torch baseline
* benchmarking on resnet50 layers
* enable `triton_ops.conv` to replace `aten.convolution` by setting config.triton.use_conv to True
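
A hedged config sketch, assuming the flag lives on the inductor config module as the bullet above suggests:

```
import torchinductor.config as config

# when enabled, aten.convolution lowerings are replaced by triton_ops.conv
config.triton.use_conv = True
```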

[inductor] Minor float16 fix (#338)

update IPEX backend (#344)

Add _refs and _prims to the allowlist

This won't get exercised by real models but it's necessary so we
can test that PrimTorch decomps work under dynamo.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 874bbbd8d80d07fcd089a0d57849f21b2b81756d
Pull Request resolved: pytorch/torchdynamo#345
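
A minimal sketch of what a test exercising this can look like, assuming nopython mode is requested via `torchdynamo.optimize(..., nopython=True)` and using `torch._refs.abs` as a stand-in ref:

```
import torch
import torchdynamo

def my_compiler(gm, example_inputs):
    return gm.forward

def fn(x):
    return torch._refs.abs(x)  # a PrimTorch ref, traced into the graph

# nopython=True turns any graph break into a hard error
with torchdynamo.optimize(my_compiler, nopython=True):
    fn(torch.randn(4))
```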

[inductor] Support additional operators (#339)

[inductor] Benchmark harness for training (#337)

codegen to update mutated variables with side effects should run after the stack values' codegen (#347)

Add a master-only test that at least one PrimTorch ref can be traced nopython

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 221ff4b897648f99a4552dd955b3426a2d6cb738
Pull Request resolved: pytorch/torchdynamo#355

Convert dict to list before iterating in case there is a possible delete (#360)

Add TROUBLESHOOTING.md (#357)

- link TROUBLESHOOTING.md from README.md and from recompilation warning

Addresses issue https://github.com/pytorch/torchdynamo/issues/348

Add coldstart/breakeven benchmark (#352)

try with:
python torchbench.py --cold-start -d cuda --training --use-eval-mode --nvfuser --isolate

This actually adds 2 new benchmark metrics:

coldstart: measures the worst of t_eager_compile / t_dynamo_compile as a 'speedup', where dynamo compiles twice to exercise the profiling executor

breakeven: predicts the number of iterations dynamo would have to run to 'break even' with eager, considering the amortization of its compile cost

Not yet tested with inference or CPU; may have some other issues.

Should probably be adjusted to repeat the whole cold-start process several times and take the median, but for now it just does this once.
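
Roughly, the breakeven metric amounts to the following (a sketch of the idea, not the exact benchmark code):

```
def breakeven_iters(t_dynamo_compile, t_eager_compile, t_eager_iter, t_dynamo_iter):
    # the extra compile cost must be amortized by the per-iteration speedup
    extra_compile = t_dynamo_compile - t_eager_compile
    per_iter_saving = t_eager_iter - t_dynamo_iter
    return extra_compile / per_iter_saving
```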

Raise unimplemented if checkpoint is empty (#351)

Only wrap in TorchVariable if allowed, not if not disallowed

If you don't do this, allowed_functions_module_string_ignorelist
doesn't actually affect whether we try to trace these functions into
the graph, since the disallowed list doesn't actually respect
this config.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: fad216f8a3013caaa2814cc3efaf5348e6033ee5
Pull Request resolved: pytorch/torchdynamo#356

List append should return None (#365)

Make TorchInductor use Triton's MM implementation in codegen (#325)

* Rebased on upstream

* Fixed make lint

* Fixed make lint

add yaml to requirements.txt (#367)

Disabling TorchDynamo inside torch.jit and torch.onnx compiler (#361)

* Disabling TorchDynamo inside torch.jit compiler

* Adding trace_module

* Remove the script

* Also adding ONNX

* Jason's comments

Workaround triton float64 log issue (#379)

Fix missing ir.Reduction.default_value (#378)

autotune conv kernels (#364)

* tuned_conv to choose the best kernel for the given input shapes, strides, and layer params
* set config.triton.convolution to "aten" (default), "triton" or "autotune"

[inductor] Refactor fallback kernel handling (#381)

Add torch._decomps to the list to trace into

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: aa557ba00ea2fc03946089d56592041a5df31275
Pull Request resolved: pytorch/torchdynamo#369

Print value of TorchVariable object

I find this is helpful for debugging what exactly a given TorchVariable
is; presently there is no information so it is hard to tell.  Because
these are PyTorch variables they should be well behaved and it shouldn't
cause problems to call repr on them.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b
Pull Request resolved: pytorch/torchdynamo#370

Skipping the namedtuple subclass constructor (#382)

Adding missing guard for GET_ITER Bytecode (#386)

fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385)

* temp changes

* fx2trt_oss is merged to https://github.com/pytorch/TensorRT

* linter fix

Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380)

* Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm)

* Reformatted

Fix for `any` decomposition (#384)

* fix lowering

* restore abs

* black

Extend new to new_empty for tensor.shape (#387)

Support for lazy modules (#391)

* Support for lazy modules with test

Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__, so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there are cases where torchdynamo does not yet support functionality used in all hooks.
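
A minimal sketch of the case this fixes (`my_compiler` is the usual stand-in backend):

```
import torch
import torchdynamo

def my_compiler(gm, example_inputs):
    return gm.forward

mod = torch.nn.LazyLinear(8)  # parameters uninitialized until the first call

def fn(x):
    return mod(x)

with torchdynamo.optimize(my_compiler):
    # __call__ (not just forward) is traced, so the lazy initialization
    # hooks run and the module's parameters get materialized
    fn(torch.randn(2, 4))
```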

Enable tf32 in torchbench.py (#397)

Support TensorType checking (#395)

* Support TensorType checking

* Update torchdynamo/variables/builtin.py

Co-authored-by: Jason Ansel <jansel@jansel.net>

Co-authored-by: Jason Ansel <jansel@jansel.net>

[inductor] Canonicalize indexes for MemoryDep (#314)

Summary: By canonicalizing indexes, we can build more accurate and
flexible read-write dependencies, which will allow more general kernel fusion.
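
A toy illustration of the idea (not inductor's implementation): two indexing expressions that differ only in loop-variable names should yield the same dependency key after canonicalization.

```
import sympy

def canonicalize(index_expr, loop_vars):
    # rename loop variables into a fixed canonical order
    renames = {v: sympy.Symbol(f"c{n}") for n, v in enumerate(loop_vars)}
    return index_expr.subs(renames)

i0, i1 = sympy.symbols("i0 i1")
j0, j1 = sympy.symbols("j0 j1")
assert canonicalize(i0 * 4 + i1, [i0, i1]) == canonicalize(j0 * 4 + j1, [j0, j1])
```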

Suppress warnings during the pre-checks on a frame (#396)

* Suppress warnings during the pre-checks on a frame

* Encapsulate in PatchTorch

[inductor] Do convolution bias in its own kernel (#403)

[inductor] Fix convolution output stride order (#400)

Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398)

Reducing overhead of has_tensor_in_frame (#406)

[inductor] add manual_seed in test_round (#388)

[inductor] Improve heuristic for realizing buffers (#402)

[inductor] Fix typo preventing some fusions (#401)

[inductor] Fix for constants on wrong device (#405)

remove deepcopy in fx2trt (#407)

* temp changes

* remove deepcopy

Fix broken link in README (#408)

Adding nan (#383)

* Adding nan

* Jason's comments

Fixing import issues for PyTorch 1.12 rc branch (#411)

Raising tolerance after using tf32 (#415)

[inductor] Improve handling of reductions (#404)

Shape guard and isinstance fixes (#414)

* Shape guard and isinstance fixes and tests

* Add guard for any accessed Tensor attribute

Ensure dtype instances are not mapped to TorchVariable() (#394)

* Disallow dtype instances

* Extract dtypes from torch automatically and add test

Propagate guards from ConstDict variables (#421)

Config driven support for torch.Tensor .item()  (#417)

[inductor] Refactor helpers into torchinductor.utils (#418)

Removing generation field from the patched nn.Module (#423)

* Removing generation field from the patched nn.Module

* Lint

* Rebasing

[Inductor] Triton template for conv (#422)

* TritonTemplateKernel, template_codegen, conv jinja2 template
* pip install Jinja2 in setup_nightly

Remove unnecessary call to clone which also caused a segfault (??) (#427)

Disable TorchDynamo on frames created by fx symbolic tracing (#429)

Fix for disabling triton templates (#430)

Minor test fixes (#432)

[inductor] Remove dead stores after fusion (#409)

Summary: Use DeferredLine/DeferredIndentedBuffer to lazily emit
buffer allocations/stores after we determine which buffers are redundant.
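
A toy sketch of the deferred-emission idea (names mirror the summary; this is not the actual inductor code):

```
class DeferredLine:
    # a codegen line that is only emitted if its buffer stays live
    def __init__(self, buffer_name, line):
        self.buffer_name = buffer_name
        self.line = line

    def render(self, live_buffers):
        # after fusion we know which buffers are actually read;
        # stores into dead buffers are simply dropped
        return self.line if self.buffer_name in live_buffers else None
```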

Add support for iterating over a dict (#436)

Check if kwargs has key "fp16_mode" when determining the precision (#437)

[inductor] Add heuristic to set num_warps (#433)

update accuracy check for TRT fp32 (#438)

* temp changes

* fix an issue in fp32; change the accuracy check to cosine similarity for fp32, since TRT fp32 could not meet the 1e-4 tolerance

Pin CI to June 20th Torch Nightly (#441)

* Pin CI to June 20th Torch Nightly

* Respond to comments

* duplicated set up..

[inductor] Add some prims (#431)

Disabling the trace instead of symbolic trace (#443)

Break graph on torch.Storage types (#428)

* Break graph on torch.Storage types

* Hmm, CI failing, trying Jason's suggestion

* Debug CI

Fix guard propagation for tuple iterators (#448)

[inductor] Improve fusing of tiled + untiled (#446)

Directly compute sum of a list of floats/ints (#449)

* Directly compute sum of a list of floats/ints

* Test

Added option use_bmm to enable triton codegen for bmm (#393)

* Added option use_bmm to enable triton codegen for bmm
* Added more shapes to microbench

Add support for crossentropy (#450)

Rewrite symbolic_locals for torch.return_types (#442)

* Rewrite symbolic_locals for torch.return_types

* Special casing on the out kwargs

* Replace == with is

Remove traced op overloads before compiling (#455)

* Remove op overloads

* lint

Raise errors when backends throw exceptions (#451)

Huggingface model benchmarking (#459)

[inductor] Support scatter operations (#434)

Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392)

* Implement UnspecializedPrimitiveVariable codegen

* Make UnspecializedPrimitiveVariable a GraphArg

* Update make_call_generated_code

* Update min/max builtin func

* Support random.random

* Remove unnecessary change

* Fix lint

* Refactor to support multiple random.random

* Refactor out unspecialized numpy and python variables

* Fix RandomValueSource guard

* Support multiple random functions

* Rebase to updated main

* Refactor out random_values_var

* Fix lint

* Fix lint

* Move random_values_var to output graph

* Add need_unwrap to distinguish unspec from x.item()

* Make global rand func unique

* Fix lint

* Add raw value propagation for unspec variables

* Fix lint

* Directly load type(raw_value) & update random func example value

* Fix lint

Add Fake Tensor Propagation (#426)

* Add Fake Tensor Propagation

* extend test

* lint

* fix import

* one more day..

* update functorch commit

* bump one more day to get pytorch/pytorch#79741

* Skip test

* use FakeTensorError

* lint

* Guard on fake tensor availability

* test skips (fix en route in core)

* lint

* update nightly

* lint

* update

* format

* update recent

[inductor] Cherry pick nll_loss_forward decomp (#456)

[inductor] Register lowerings for operator overloads (#457)

Add support for torch.finfo/torch.iinfo (#470)

[inductor] Auto-download gcc12 from conda-forge (#471)

Rename benchmarking files (#472)

Add support for named_params and named_modules (#465)

[inductor] Add a metric to count the number of generated kernels (#476)

Summary: This can be used to prevent regressions in our fusion results.

Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424)

* Add support for custom class

* lint

* Fix unpack to reflect main

* Feedback

* Simplify, rebase

* Lint, format

Flag to skip printing of Dynamo internal exceptions (#480)

Don't leak cache entry on skip

Fixes pytorch/torchdynamo#477

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab
Pull Request resolved: pytorch/torchdynamo#478

Reduction cache (#487)

* enable cse for reductions

* self.cse

use fake tensors when deep copying model to check mutation (#486)

* use fake tensors when deepcopying model to check mutation

* fix fake tensors not available

* add tests

Added cos lowering (#492)

fix microbenchmarks import path (#474)

Compute multilayer reductions in higher precision (#484)

* Compute multilayer reductions in higher precision

* Compute whole mean kernel in higher precision, downcast at the end

* update test

* lint

* skip test

Fix correctness checking code and non-deterministic benchmarks (#493)

Improve recompilation warning (#494)

- default to printing only the most recent guard failure, not one failure for each cache miss
- reformat the text to be (hopefully) more readable and useful

Motivation:
While in some cases knowing the individual failure reasons for each of (say, 64) cache misses could be useful, in practice it is probably good enough to know the most recent one, since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard).

Previously:
torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']] to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md

Now:
torchdynamo hit config.cache_size_limit (64)
   function: 'toy_example' (example.py:5)
   reasons:  ['___guarded_code.valid']

Add a new backend option for TVM's meta_schedule (#479)

Add (experimental) support for exporting a graph and guards (#469)
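
A hedged usage sketch, assuming the experimental entry point is `torchdynamo.export(fn, *args)` and returns the captured graph plus the guards under which it is valid:

```
import torch
import torchdynamo

def fn(x):
    return torch.sin(x) + 1

graph_module, guards = torchdynamo.export(fn, torch.randn(4))
print(graph_module.code)  # the captured FX graph
print(len(guards))        # guards that must hold for this export to be reused
```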

Makefile/packaging updates (#499)

[inductor] Misc small improvements (#475)

Adding logging config (#504)

More Huggingface models  (#500)

* More Huggingface models (from simple_dl)

* Comments

Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)

Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509)

This reverts commit c747580.

Disable torchdynamo inside dispatch_trace (#508)

[inductor] Support rand()/dropout() (#505)

[inductor] Workaround triton bug with XBLOCK=1 (#510)

Reduction deps (#502)

* mark more reduction dependencies

* cleanup

* black

* make sure canonicalization prefix is the same everywhere

change tuning trial of meta schedule (#513)

[inductor] Fix bug with invalidated reuse (#506)

[inductor] Handle no-op slice_scatter (#507)

[WIP] Adding AMP support in benchmark infra (#464)

[WIP][Discussion] Write out a deeper documentation on how we go from … (#498)

* [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards

* Update GuardsOverviewPt1.md

* Update GuardsOverviewPt1.md