This repository was archived by the owner on Aug 1, 2025. It is now read-only.
Conversation
jansel reviewed on Apr 22, 2022
Force-pushed from 4130fc9 to 8ed04af
jansel suggested changes on Apr 24, 2022
  # defined type of a custom tensor subclass.
  type_to_use = self.objvar.python_type()
- if type_to_use is type:
+ if type_to_use is type or type_to_use is torch._C._TensorMeta:
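As context for the added branch: `torch._C._TensorMeta` is the metaclass of `torch.Tensor`, so when `python_type()` is taken of a Tensor subclass *class* (for example the `cls` seen inside a `@classmethod` such as `__torch_function__`), it returns `_TensorMeta` rather than plain `type`. A quick illustration, assuming stock PyTorch:

```
import torch

class MyTensor(torch.Tensor):
    pass

print(type(object) is type)                        # True: ordinary classes
print(type(torch.Tensor) is torch._C._TensorMeta)  # True: Tensor's metaclass
print(type(MyTensor) is torch._C._TensorMeta)      # True: inherited by subclasses
```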
Contributor
I wonder if there is a better way to handle this, as it feels a bit like a hack to do different things based on the value. Perhaps we could detect that we are in a @classmethod and do something different. Or look at the args to super() and do this logic based on whether super() got a type or an object. Ideally this would mirror what CPython does.
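One way the "look at the args to super()" idea above could work (a hypothetical sketch, not something this PR implements): CPython's bound `super` object exposes `__self__`, so the classmethod case can be recognized by checking whether the bound object is itself a type.

```
def super_bound_to_class(sup: super) -> bool:
    # Zero-argument super() binds to `cls` inside a @classmethod (so
    # __self__ is a type) and to `self` inside an instance method.
    return isinstance(sup.__self__, type)

class Base:
    @classmethod
    def from_classmethod(cls):
        return super_bound_to_class(super())

    def from_instance_method(self):
        return super_bound_to_class(super())

class Child(Base):
    pass

print(Child().from_classmethod())      # True  -- super() got a type
print(Child().from_instance_method())  # False -- super() got an object
```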
Contributor
Author
I agree, it looks hacky. Would it be acceptable to save cleaning this up for a future PR (perhaps by someone from the torchdynamo team)? If not, I can take a further look. I'm not sure about the level of risk of leaving this in to fix later vs. fixing now.
Contributor
This is getting pretty close! Thanks for working on it.
Force-pushed from 2e69c7a to 1076fd6
Summary: This is a WIP to eventually enable `__torch_function__` support in torchdynamo. Currently some things are broken; putting it up on GitHub so it's easier to get help resolving them.

What this is currently doing:
1. In `TorchVariable.call_function`, check for the presence of `__torch_function__` on the arguments. If it's found, inline it.
2. In `GetAttrVariable.call_function`, check for `super().__torch_function__`, which resolves to the original. If it's found, stop inlining and insert the function call into the graph.

For now this is all hardcoded to `F.sigmoid` for testing; this will be removed once something works end to end. Also, for now we disable graph breaks on all tensor subclasses; this needs to be changed to some kind of registration API before landing, saving that until after e2e works.

The current test just creates a `__torch_function__` override which does nothing but call `super().__torch_function__`; step 1 is just to get this to work without errors. Followups on the actual logic inside the override will be tested in future PRs.

Current status:
a. Tracing through the override works correctly (I think).
b. During guard resolution, the guard on `__torch_function__` does not work (need to figure this out next).

Test plan:
```
pytest -vsk test_simple_torch_function
// currently fails with https://www.internalfb.com/phabricator/paste/view/P496842415
```
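For orientation, a subclass of the kind the test describes (an override that does nothing but defer to `super().__torch_function__`) might look roughly like the sketch below; the class name and construction are illustrative, not the actual test code.

```
import torch
import torch.nn.functional as F

class PassthroughTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Override that adds no behavior of its own: defer straight back
        # to the default torch.Tensor implementation.
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

x = torch.randn(4).as_subclass(PassthroughTensor)
y = F.sigmoid(x)  # the op the WIP hardcodes while testing
```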
jansel approved these changes on Apr 25, 2022
vkuzo added a commit that referenced this pull request on Apr 26, 2022
Summary: In #167 we added `__torch_function__` support for tracing through `call_function`. This PR extends the support to also work on `call_method`.

Note: the LOC is high because some code was refactored to be reusable.
Note: implementing correct rewrapping logic for methods is saved for a future PR; hope that's OK.

Test plan:
```
pytest -vsk test_simple_torch_function
```
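To illustrate the distinction this extension targets (a sketch with a made-up subclass, not the PR's test code): the same override has to be reached whether the op is traced as a `call_function` (torch-namespace call) or as a `call_method` (tensor method call).

```
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

x = torch.randn(4).as_subclass(MyTensor)

a = torch.sigmoid(x)  # traced as call_function (covered by #167)
b = x.sigmoid()       # traced as call_method (covered by the extension described above)
```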
shiyu22 pushed a commit to towhee-io/towhee-compiler that referenced this pull request on Sep 9, 2022
. . . . . . . . . . . . . . . . . . . . . Add README.md Improve counters and stats Constant control flow Support some function calls Support calling submodules and methods Add profiler to measure coverage Measure overheads with TorchBench Refactor tests Support for unpacking, inplace, and matmul op Rewrite how guards work Cleanup and refactoring Linting, formatting, and documentation Fix crashes and add torchdynamo.reset() Disable list arg unpacking Support control flow with graph prefix Minor refactoring and naming Improve support for partial graphs Increase coverage of comparisons, constants, modules Fix for handling of iterators Extract multiple graphs from control flow Refactor binary ops Handle more type of jump instructions Support wrapping `Real` types (#1) Allow using nn.Modules inside a list (#2) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Fix key error (#4) Fix broken tests Add support for EXTENDED_ARG TorchBench and debugging improvements Add support for staticmethod Improve handling of unsupported variables Allow using Tensors inside a list/tuple (#3) * Allow using nn.Modules inside a list * Allow using Tensors inside a list/tuple Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Implement `MAKE_FUNCTION` (#5) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Stricter typing and support nn.Sequential Support tuple returns (#6) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Support global loads of bools (#7) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Support `len` (#8) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Fix bug in LOAD_GLOBAL Support float constructor (#10) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Implement `isinstance` (#9) Co-authored-by: Ansley Adelaide Ussery <ansley@fb.com> Revamp resuming after unsupported things Livevars and constant folding Optional dtype/device/rank/shape specialization Specialization and mutation guards Improve support for config objects Add support for IMPORT_NAME Early work on optimizations and autotuning Support for sizes at input args and dicts Minor refactoring Support for closures Support ModuleList and strings Support nested functions with closures Fix build for gcc-7 Minor fixes for avx512 machines Clean up stack size handling Rewrite how graph resume works Support for super() Support zip and enumerate Refactor variable construction into seperate file Clean up handling of variable sources Dictionary support and improve arg handling Add IPEX backend Improve handling of lists/dicts Fix python 3.7 issues Support module constants Support basic list mutation Support list comprehensions Support for inline nn.Softmax() Refactor and cleanup call_function Support inlining generators Dynamic .shape/.ndim support Skip transformers.file_utils Support property/classmethod Support dict.__setitem__ Break graph on STORE_ATTR Garbage collect generated code Add trt backends Improve cuda backends Support latest pytorch/torchenchmark Fixes in TRT baseline script Fixes for GPU measurement Refactor optimization backends and tuning Retry autotuning failures Fix static runtime backend Fix lints Cleanup list packing/unpacking Fixes for new torchbench version Work around crashes in static runtime Fix vision_maskrcnn/detectron2_maskrcnn Switch to alternate version of onnx2trt Skip pyhpc_turbulent_kinetic_energy Add backends to skipfiles Fix weakref handling Specialize on torch.is_tensor and torch.is_floating_point Analysis and functionalization passes Add dynamic dtype/device/shape propogation 
Support namedtuple and dtype constants Fix as_tensor issue Improve support for range() Config flag to control normalization Fix __len__ issue Support list.pop() Allow inlining methods on UnsupportedVariable Update README.md (#13) [Backend][TVM] Support boolean as output (#14) Remove extra call to torch.jit.fuser Improve support for list/dict/len/str Support list.extend and dict.update Support for no_grad/enable_grad Improve coverage of huggingface models Refactor stack_op implementation Fix chunk method in longformer Fix bug with calls between nested functions Support hasattr(namedtuple, ...) Support for len(inspect.signature(fn).parameters) Support for autograd.Function Allow adding a BaseListVariable and a ConstantVariable together if the latter is an iterable (#15) Refactor torchbench.py to use subprocess isolation Switch project to use isort import format Rewrite README.md Improve docs in ./torchbench.py --help Allow changing torchdynamo.config.dynamic_shapes without recompile Refactor offline autotuner Bugfix for bias towards eager Add online version of autotuner Fix some backend exceptions Fix for bool inputs Work around TensorRT abort() on group_norm Improve error printing when backend fails Fix bug with int64 in onnxrt Add isolation to baseline runs Fix some errors in copy_ for backend testing Skip TRT for einsum models Split out fixed_strategy1/fixed_strategy2 Disable TRT bypassing Support direct calls to module.forward Support for BUILD_TUPLE_UNPACK_WITH_CALL Support map/reduce/sum of tensors Split test_functions into two files Fix reconstructing nested attrs Improve support for HuggingFace ModelOutput() wrapper Improve support for zip and __contains__ Fix issue with list multiply Adding functorch to skipfile (#16) Adding AOT Autograd API for inference (#17) Add note about skipfiles Fixes for latest torchbenchmark version Support list mutation side effects Refactor codegen related things into codegen.py Refactor graph generation related things into output_graph.py Support inlining super() calls of nn.Module subclasses Support some simple cases of try/except Refactor variable_tracker.py into many files Support builtins module (#18) Adding training optimizations (#19) Allow constant folding through set() Support for property/__getattr__ on user defined classes Support more cases of varargs calls Add nopython=True whole-program graph mode Improve support for dataclasses Refactor BuiltinVariable call_function handling Add support for tuple iadd Add support for UNPACK_EX bytecode Improve handling for tuple constants Support BUILD_LIST_UNPACK with tensor args Support numpy integer constants Support module.__class__.__name__ Add support for dict mutation side effects Fix composability with FX generated code Fix off by one bug in profile operator counts Avoid over-specializing on dynamically created nn.Modules Helper functions for AOT Autograd testing (#20) Skip networkx for AOT (#21) Run `make format` Move non-specialized nn.Module handling to UnspecializedNNModuleVariable Reuse generated code when control flow paths converge Support resume while inside 'with no_grad()' Support tuple_iterator Support for getitem with default value Support type(obj) calls Fix bug in __getattr__ handling add ltc backend (#23) OSS Automated Fix: Addition of Code of Conduct (#25) OSS Automated Fix: Addition of Contributing (#24) Minor refactor in side_effects.py `make format` and lint issues Add lint workflow Add LICENSE Update CONTRIBUTING.md Add test workflow Update github workflows Skipping logging 
module (#26) Initial support for setattr side effects Improve coverage statistics measurement Support object creation side effects Improve support for torch.distributions Allow graph breaks on unsupported getitem Remove nn.Sequential from skipfiles Add CITATION.cff Improve support for dynamic mutation of nn.Modules Bugfix for mutating mutated attributes Adding Torchbench training support (#27) Deduplicate FX graph outputs Support list.clear() Adding missed random state reset (#28) Fix issue with maskrcnn Remove unneeded guards Support HF ModelOutput() wrapper class Avoid compiling the output of user_compiler Fix ./torchbench.py --nothing AOT Autograd fixes for moco, resnet50_qat, pytorch_struct (#29) Save some memory while profiling Fix AOT autograd bug where dynamo tries to compile generated backwards Fix guards for 'mod.0.bias' attributes Convert tests to use public API Workaround for AOT Autograd LSTM bug (#31) Fix handling of torch.manual_seed Add `with torchdynamo.disable()` context manager Support itertools.{chain,islice} Support Tensor.is_quantized Support multiple threads using TorchDynamo Improve support and testing of dynamic shapes Workaround for issue in hf_Bart in dynamic shape mode Don't directly import from _eval_frame Fix aliasing issue in #30 Support staticmethod/classmethod on user defined classes Improve support for closures and dunder methods Fix bug in handling of type annotations Support 3+ nestings of closures Fix threading issue for autograd threads Add pthon autograd test case IMPORT_NAME Instruction - Import the top-level package (#47) hf_T5 dataclass fields handling (#50) Using mean instead of sum to have reasonable loss value for backprop (#51) Allow passing a string with a backend name to torchdynamo.optimize Support nesting torchdynamo.optimize() decorated functions Support __init__ of HF ModelOutput (#65) Fill in missing fields in setup.py Fix pip install issue Add python key tracing backend Fix build on M1 Max Mac (#63) Fix lint Fx2trt integration improvement (#71) Refactor python_key_normalize Handle mutation propagated by getitem (#83) Torchbench changes for AOT Autograd (#84) Fix string-based backend mode Fix linter Fix bug when torch and torchdynamo are in the same folder Fx2trt pr2 (#97) Fix lint github action Add update 6 link to readme (#99) Fix torchbench.py --nothing option (#100) Skip tacotron2 and unskip vision_maskrcnn in torchbench.py (#102) Add developer setup section to README.md (#105) Fx2trt pr3 (#110) Use Low overhead version AOT Module (#113) Cleanup AOTAutograd related args (#114) Add extra tolerance for some GPU models in Torchbench (#116) Enabling few more torchbench models with AOT Autograd (#127) Support formatted literal strings (f-strings) (#128) * Support formatted literal strings (f-strings) * TensorVariable var_getattr supports __class__ and add test case * Address review comments * Fix lint error add an option to randomize inputs (#130) Fix typo in IPEX backend (#126) Handle torch size (#139) Remove `gc.collect()` upon every model compute run. (#140) Make STORE_SUBSCR break when unsupported. (#142) * Make STORE_SUBSCR break when unsupported. I probably could have done a bit more but this is enough to fix the issue and I'll let someone more intrepid get this going comprehensively. I'm also not sure how to test this. Fixes #131 Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> Fix Python 3.7 compatibility issues & Add testing action workflow (#143) Handle torch.cuda.current_device (#146) Handle torch.seed (#141) Handle torch.override.is_tensor_like (#144) * Handle torch.override.is_tensor_like * Making it a constant * Comments * Conflict Preserve CUDA rng states during frame analysis (#147) * Preserve CUDA rng states during frame analysis * Retrigger CI * Retrigger CI * Debugging * Debugging * Debugging AOT Autograd Training - few more models passing (#151) * AOT Autograd Training - few more models passing * Add skip API * Skip inlining Support Slice of NNModuleList (#152) * Support Slice of NNModuleList * Comments Fix import overhead by using `importlib.util.find_spec` (#153) Fix issues in #132 (#150) Variable builder - handle slice (#155) More skipfiles (#157) * Add more skip modules Signed-off-by: Edward Z. Yang <ezyang@fb.com> moco and hf_Reformer fixes (#158) * hf_Reformer fixed * Adding moco fix Revert "Fix import overhead by using `importlib.util.find_spec` (#153)" (#160) This reverts commit dedd9fa. Add support for Python 3.9 (#154) * Changes to make TorchDynamo support Python 3.9 * Fix lint * Add Python 3.9 github test workflow * Fix typo * RERAISE to TERMINAL_OPCODES set * Make IS_OP support ConstDictVariable * Address comments Monkey patch autograd.Variable (fixes Tacotron2) (#161) Break graph on Tensor grad (#163) * Break graph on Tensor grad * Comments * Suppress warning enable support for staticmethod on superclass (#162) Summary: Adds support for tracing through this syntax: ``` class Parent(torch.nn.Module): @classmethod def foo(cls, x): x = x + x return x class Child(Parent): @classmethod def helper(cls, x): // resolving super().foo failed before this PR x = super().foo(x) return x def forward(self, x): x = self.helper(x) return x ``` This is useful for eventually enabling __torch_function__ support. Test plan: ``` pytest -vsk test_super_static_method ``` Options to print fx/aot traces (#164) enable basic __torch_function__ support (#167) Summary: This adds a skeleton for `__torch_function__` support in torchdynamo. What this is currently doing: 1. in variable builder, check for __torch_function__ and wrap tensors in TensorTFOverrideVariable if found 2. in TorchVariable.call_function, inline the __torch_function__ function of TensorTFOverrideVariable arguments 3. in GetAttrVariable.call_function, check for super().__torch_function__ which resolves to the original. If it's found, stop inlining and insert the function call into the graph. The current test just creates a __torch_function__ override which doesn't do anything but call super().__torch_function__. 
Things left for future PRs: * supporting call_method * supporing actual logic inside the overrides * implementing the full __torch_function__ spec (currently things are hardcoded to first argument only) Test plan: ``` pytest -vsk test_simple_torch_function // used to fail with https://www.internalfb.com/phabricator/paste/view/P496842415 // currently passes ``` Revert the monkey patching of variable, fixed in PyTorch (#173) Miscellaneous small fixes and lints (#168) Fixes for pytorch tests - 1/n (#174) * Fixes for pytorch tests - 1/n * Better comment modify 'ipex' backend (#166) Add torchdynamo.config.raise_on_backend_error (#177) Fix for module returning (Tensor, None) (#176) Fixes for pytorch tests 2/n - torch.Size and nn.Parameter (#182) * Torch testing - Fix bugs for torch.Size and nn.Parameter * CI failures extend __torch_function__ support to `call_method` (#181) Summary: In pytorch/torchdynamo#167 we added `__torch_function__` support for tracing through `call_function`. This PR extends the support to also work on `call_method`. Note: the LOC is high because some code was refactored to be reusable. Note: implementing correct rewrapping logic for methods is saved for a future PR, hope that's OK. Test plan: ``` pytest -vsk test_simple_torch_function ``` Add supports for Python 3.10 (#172) * Add supports for Python 3.10 * fix lint * Add github test workflow for python 3.10 * Add lnotab and linetable writer test case * Fix lint * Fix lint * Fix lint * Split the unit test * Remove some requirements and fix lint * Add several new bytecodes in Python 3.10 * add Cython as requirements * set numpy version * Update README.md Fix Pytorch tests 3/n - Skip exec frame (#184) * Fix Pytorch tests 3/n - Skip exec frame * CI failure [PR] pass rewrite for supporting setitem (#188) * temp changes * add pass rewrite for setitem * linter Add --float16/--float32/--cosine options to torchbench.py (#189) Don't import unused third party packages (#193) Early version of TorchInductor (#190) Fix for `KeyError: 'Size'` error (#194) Add xarray to skipfiles (#196) Update URLs to github.com/pytorch/torchdynamo (#199) Add `./torchbench.py --fast` option (#198) Pass backend-related ctx to TorchDynamo Optimize Context (#201) * Pass backend-related ctx to TorchDynamo Optimize Context * Reinit the backend ctx for every frame * Doc Detect x.new(torch.Size) and rewrite to torch.empty(tuple) (#195) * Detect x.new(torch.Size) and rewrite to torch.empty(tuple) * address comments Skip frames when no graph is found (#205) support inlining __torch_function__ with reading from closure (#197) Summary: The previous PRs to add `__torch_function__` support inlined through `__torch_function__` without adding any guards for the function. This worked for simple cases, but did not work if `__torch_function__` needs to read a nonlocal variable, for example: https://gist.github.com/vkuzo/a3388fcaa532318d049368e96652b366 The reason it was broken is because the code which bound arguments during inlining had to have a reference to a source in order to bind things properly. One way to fix this is to get the source of the `__torch_function__` attribute of the original tensor, guard on it, and persist it through all the rewrapping logic. I'm flexible if there is a better alternative, lmk. 
Test plan: ``` pytest -vsk test_torch_function_with_closure ``` [inductor] add lowerings for hardswish/hardsigmoid/hardtanh (#200) [inductor] Handle +/-inf constants (#210) Replace tensor.new with tensor.new_empty (#212) add configuration for modules eligible for inlining (#208) Summary: Makes the source modules for `skipfiles.is_torch_inline_allowed` configurable. This is needed for DBR quant integration exploration, we can now override this config to allow torchdynamo to inline DBR quant utility functions. Test plan: Run this: https://gist.github.com/vkuzo/010e0483c9bbb35837cc9cb27c555243 it now advances past the error of "inlining in skipfiles" [inductor] fix transposed convolution shape formula (#202) Try finally block for with context on graph break instruction (#213) * Try finally block for with context on graph break instruction * fix test * Support >= 3.9 * Support python 3.7 * Comments * Replacing the global load with GlobalSource and reconstruct Remove nn.Parameter filter hack for AOTAutograd backend (#214) Fix test failures (#218) Fix slicing list returning wrong result (#222) * Correct ListVariable source * Fix lint Remove reference cycle (#223) Refactor ConstDictVariable to support user_cls and use dict by default (#226) Pin inductor CI to specific pytorch version (#229) [inductor] support torch.linspace and torch.tensor (#217) [inductor] add heuristic to decide layouts and loop orders (#216) Revert "Fix slicing list returning wrong result (#222)" (#231) This reverts commit 243222e. Add equal_nan option to torchdynamo.testing.same() (#232) Support device constants (#230) Bail out for __setattr__ and fix ClassVariable handling (#227) - avoid compiling __setattr__ functions as they may be difficult to correctly handle for arbitrary custom classes, but also aren't likely to be useful for torch module optimization - expand the condition for constructing UserDefinedClassVariable to include ABCMeta classes via `inspect.isclass` check Remove reference cycle - with exceptions (#228) * Remove reference cycle - with exceptions * Fix for InliningInstructionTranslator Add fix for writing to closures (#233) * Add fix for writing to closures * run black * one more time Co-authored-by: Elias Ellison <eellison@devfair044.h1.fair> Delete example value for unused args (#234) Fix list slice & ConstantVariable to TupleVariable conversion missing source info (#235) * Fix list slice & ConstantVariable to TupleVariable conversion miss source info. * Update test cases * Address comment [inductor] early memory reuse and new operators (#237) Add type checking in Constant match - Fix Pytorch tests 4/n (#238) * Add type checking in Constant match * Fix test enable tracing through enum comparison (#245) add support for tracing torch.nn.ModuleDict.__contains__ (#246) Summary: Adds support for tracing through this syntax: ``` class M(torch.nn.Module): def __init__(self, module_dict): super().__init__() self.module_dict = module_dict def forward(self, x): if "foo" in self.module_dict: x = torch.mul(x, 1.0) x = torch.add(x, 1.0) return x ``` This is useful for DBR quantization. Test plan: ``` pytest -vsk test_nn_moduledict_contains ``` enumerate supports start argument (#240) * enumerate supports start argument * address comments Make eval_frame thread safe (#239) This should make eval_frame thread safe. Currently, the eval_frame is a global object, and different threads my step on each other setting a different one. 
This changes the behavior to instead always* have a "shim" eval_frame which then routes to the correct behavior by looking at the thread-local associated object. This is thread safe because now the callback object is always thread safe, and we only use it to drive logic at frame eval time, as opposed to at callback registration time. Currently, the logic for None/False/Callback is kept, but the False case could be easily collapsed behind the shim in a subsequent diff. *Always here means always when dynamo is running. The shim is installed and removed based on keeping track of how many dynamo threads are running at the moment. Add support for __subclasses__ (#242) Fixes #241 Inline function jump on tensor condition should be unimplemented (#249) cast model in no-isolate mode (#244) support tracing __getitem__ of torch.nn.ModuleDict (#253) Summary: Supports tracing through ``` class ModuleDict(torch.nn.Module): def __init__(self): super().__init__() self.layers = torch.nn.ModuleDict( { "0": torch.nn.Linear(10, 10), } ) def forward(self, x): x = self.layers["0"](x) return x ``` This is useful for DBR quant. Note: handling other logic for `ModuleDict` is left for future PRs. Test plan: ``` pytest -vsk test_moduledict ``` Run torch inductor test on GPU machine (Part 1) (#258) * Run torch inductor test on GPU machine * Land scale-config first Implement verify_correctness #179 (#252) * Wrapperbackend to enable verifying corretness of backends; set config.verify_correctness as True to enable it. * move testing.same() to utils.py Skip inductor tests on older pytorch versions (#257) [inductor] Multi-devices, primtorch decomps, and many new ops (#243) enable tracing through id(nn_module_variable) (#262) Summary: Enables tracing through this syntax: ``` class M(torch.nn.Module): def forward(self, x, ref_id): self_id = id(self) if self_id == ref_id: x = torch.mul(x, 1.0) x = torch.add(x, 1.0) return x ``` This is useful for DBR quant because it uses `id(module)` for some FQN gymnastics. Test plan: ``` pytest -vsk test_id_of_nn_module ``` enable tracing through frozenset contains of PyTorch ops (#251) Summary: Enables tracing through this syntax: ``` funcs = frozenset([torch.add]) def fn(x, func): if func in funcs: x = torch.add(x, 1.0) x = torch.mul(x, 1.0) return x ``` This is useful for DBR quantization. Test plan: ``` pytest -vsk test_frozenset_torch_func_contains ``` Dump conv args into file (#261) * dump convolution args into file * add option --log-conv-args in torchbench.py Fix generation tagging new (#263) Simplify eval frame, merge _run_only (#264) Allow layout=torch.strided in new_constant (#269) Decomposition for nan_to_num (#268) [inductor] Handle non-reduction reductions (#266) Use unittest.mock.patch for test_verify_correctness (#265) [inductor] Support sort/as_tensor/LongTensor (#267) Fix inline list/dict mutation (#273) * Fix inline list/dict mutation * Fix lint * Refact inline translator's replace_all * Fix recursive inline replace * Remove debug print Support split_with_sizes (#272) Light Refactor + Add support for torch.autograd.profiler.record_profile function (#274) This diff takes GradModeVariable's logic and pulls it partially into a more generic ContextWrappingVariable base class intended for making it easier to write context managed code. 
[inductor] Support input and slice mutation (#275) Add recompile ux tests (#270) Just a first step, this PR adds a few tests that starts to outline a proposed UX, and proposes mechanisms for setting/checking the #recompiles and cache limit to facilitate the testing Skip non tensor frame (#248) * Skip non tensor * Skip non tensor frame * Lint * Jason comments * Add decorator functionality * Comments Prioritize class method if there is duplicated attribute name (#278) * Prioritize class method if there is duplicated attribute name * Refactor var_getattr to make it consistent with native pytorch [inductor] Improve merging of contiguous loops (#279) Add support for STORE_GLOBAL (#286) Summary: 1. Create a symbolic_global table to store a global variable name to an unique object mapping, and the unique object is further used as a key to index into the store_attr_mutations table in SideEffects. 2. The actual STORE_GLOGAL action is buffered by SideEffects and later LOAD_GLOBAL just reads from SideEffects when appropriate. STORE_GLOBAL is eventually applied after the generated graph. Skip inductor CPU tests if there is no working c++ compiler (#283) Collect guard failures into one warning at cache limit hit (#281) - avoid warning on each guard failure separately (in cases cache limit > 1) - instead, bundle a summary of gaurd failure warnings together at the time of cache limit hit [inductor] Add support for more operators (#282) [inductor] Improved indexing simplification and loop body representation (#289) All base class methods take precedence if it's a nn.Module (#290) Fx2trt pr4 (#294) * temp changes * add pass rewrite for setitem * linter * temp checkin * squeeze for normalization * code clean * comments improvement * comment out int64->int32 * linter Fx2trt pr5 (#296) * temp changes * add pass rewrite for setitem * linter * temp checkin * squeeze for normalization * code clean * comments improvement * comment out int64->int32 * linter * add a threshold for fall back to non-TRT Support guarding inf constants (#300) Pytorch tests 5/n - Graph break on MemberDescriptor type (#301) * Graph break on MemberDescriptor type * CI update_locals_and_stack should use shared cache (#302) * update_locals_and_stack should use shared cache * update SideEffects.apply to use default cache Implement verbose tensor guards check (#287) Verbose guard checks are guards used outside of the hot path for providing specific failure information to the user on compile cache miss. This PR adds support for verbose guards and implements one for the tensor guard, leaving other guards alone. 
* Add tensor names to tensor guard failure message Low precision support (#304) * add low precision support to torchinductor triton backend * remove temporary tests * lint * lint * lint Run clang-format on torchdynamo/_guards.cpp (#306) Fix slowdown due to generation_tagging_new (#305) * Use patched init to track dynamic modules + test gen tagging Elaborate on error message for failing tensor type match (#307) ConstDictVariable reconstruct should keep original order (#308) [inductor] Minor fixes for latest PyTorch and benchmark harness codegen (#309) Verbose guard check Bugfix (#311) * Bugfix and clang format * It wasn't will, it was me - Clang formatter PyTorch tests - 6/n - Add type check for list/tuple elems in CONSTANT_MATCH (#303) * Add type check for list/tuple elems in CONSTANT_MATCH * recursive length guarding * All decomp tests pass * Filter out only useful guards Add torchdynamo.allow_in_graph and torchdynamo.disallow_in_graph (#295) Fix CI (#316) * DONT MERGE - Checking CI * fix * fix Fix repro test to unblock internal sync (#315) Add constant checks for list of numpy integers (#313) Skip inductor tests if sympy is missing (#320) [Easy] fix reference to removed variable in debug trace (#323) Add a basic compilation profiler (#312) * Add a basic compilation profiler * Include graph break reasons in compilation report * lint and import issues * Add --recompile_profiler option to torchbench.py Filter out unimportant modules from allowed modules (#324) * Filter out unimportant modules from allowed modules * Remove typo * Further cleanup * CI testing * Michael's comment * Remove few more meaningless things * Jason's comments Prevent guard creation from accessing objects between __new__ and __init__ (#322) Initial implementation of UnspecializedPrimitiveVariable (#321) * Initial implementation of UnspecializedPrimitiveVariable * Update heuristic * Add test for no recompilations for different values [inductor] Initial suppport for tiling output code (#317) Fix test skip when sympy is missing (#333) remove unconditional sympy import from test_torchdinductor.py (#334) * remove unconditional sympy import from test_torchdinductor.py conv in triton (#310) * general conv and conv1x1 implementation in triton * correctness check with torch baseline * benchmarking on resnet50 layers * enable `triton_ops.conv` to replace `aten.convolution` by setting config.triton.use_conv as True [inductor] Minor float16 fix (#338) update IPEX backend (#344) Add _refs and _prims to the allowlist This won't get exercised by real models but it's necessary so we can test that PrimTorch decomps work under dynamo. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 874bbbd8d80d07fcd089a0d57849f21b2b81756d Pull Request resolved: pytorch/torchdynamo#345 [inductor] Support additional operators (#339) [inductor] Benchmark harness for training (#337) codegen to update mutated variables with side effect should after stack value's codegen (#347) Add a master-only test that at least one PrimTorch ref can be traced nopython Signed-off-by: Edward Z. 
Yang <ezyangfb.com> ghstack-source-id: 221ff4b897648f99a4552dd955b3426a2d6cb738 Pull Request resolved: pytorch/torchdynamo#355 Convert dict to list bebefor iterate in case there is possible delete (#360) Add TROUBLESHOOTING.md (#357) - link TROUBLESHOOTING.md from README.md and from recompilation warning Addresses issue https://github.com/pytorch/torchdynamo/issues/348 Add coldstart/breakeven benchmark (#352) try with: python torchbench.py --cold-start -d cuda --training --use-eval-mode --nvfuser --isolate This actually adds 2 new benchmark metrics: coldstart: measures the worst of t_eager_compile / t_dynamo_compile as a 'speedup', where dynamo compiles twice to exercise profiling executor breakeven: predicts the number of iterations dynamo would have to run to 'break even' with eager, considering the amortization of its compile cost Not yet tested with inference or cpu, may have some other issues. Should probably adjust to repeat the whole cold-start process several times and median, but for now just does this once. Raise unimplemeted if checkpoint is empty (#351) Only wrap in TorchVariable if is allowed, not if not disallowed If you don't do this, allowed_functions_module_string_ignorelist doesn't actually affect if we try to trace these functions into the graph, since the disallowed list doesn't actually respect this config. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: fad216f8a3013caaa2814cc3efaf5348e6033ee5 Pull Request resolved: pytorch/torchdynamo#356 List append should return None (#365) Make TorchInductor use Triton's MM implementation in codegen (#325) * Rebased on upstream * Fixed make lint * Fixed make lint add yaml to requirements.txt (#367) Disabling TorchDynamo inside torch.jit and torch.onnx compiler (#361) * Disabling TorchDynamo inside torch.jit compiler * Adding trace_module * Remove the script * Also adding ONNX * Jason's comments Workaround triton float64 log issue (#379) Fix missing ir.Reduction.default_value (#378) autotune conv kernels (#364) * tuned_conv to choose the best kernel for given inputs shape, stride, layer params * set config.triton.convolution as "aten"(default), "triton" or "autotune" [inductor] Refactor fallback kernel handling (#381) Add torch._decomps to the list to trace into Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: aa557ba00ea2fc03946089d56592041a5df31275 Pull Request resolved: pytorch/torchdynamo#369 Print value of TorchVariable object I find this is helpful for debugging what exactly a given TorchVariable is; presently there is no information so it is hard to tell. Because these are PyTorch variables they should be well behaved and it shouldn't cause problems to call repr on them. Signed-off-by: Edward Z. 
Yang <ezyangfb.com> ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b Pull Request resolved: pytorch/torchdynamo#370 Skipping the namedtuple subclass constructor (#382) Adding missing guard for GET_ITER Bytecode (#386) fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385) * temp changes * fx2trt_oss is merged to https://github.com/pytorch/TensorRT * linter fix Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380) * Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) * Reformatted Fix for `any` decomposition (#384) * fix lowering * restore abs * black Extend new to new_empty for tensor.shape (#387) Support for lazy modules (#391) * Support for lazy modules with test Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__ so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there were cases where torchdynamo does not yet support functionality used in all hooks. Enable tf32 in torchbench.py (#397) Support TensorType checking (#395) * Support TensorType checking * Update torchdynamo/variables/builtin.py Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Jason Ansel <jansel@jansel.net> [inductor] Canonicalize indexes for MemoryDep (#314) Summary: By canoicalizing indexes, we can build more accurate and flexible read-write dependency, which will allow more general kernel fusion. Suppress warnings during the pre-checks on a frame (#396) * Suppress warnings during the pre-checks on a frame * Encapsulate in PatchTorch [inductor] Do convolution bias in its own kernel (#403) [inductor] Fix convolution output stride order (#400) Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398) Reducing overhead of has_tensor_in_frame (#406) [inductor] add manual_seed in test_round (#388) [inductor] Improve heuristic for realizing buffers (#402) [inductor] Fix typo preventing some fusions (#401) [inductor] Fix for constants on wrong device (#405) remove deepcopy in fx2trt (#407) * temp changes * remove deepcopy Fix broken link in README (#408) Adding nan (#383) * Adding nan * Jason's comments Fixing import issues for PyTorch 1.12 rc branch (#411) Raising tolerance after using tf32 (#415) [inductor] Improve handling of reductions (#404) Shape guard and isinstance fixes (#414) * Shape guard and isinstance fixes and tests * Add guard for any accessed Tensor attribute Ensure dtype instances are not mapped to TorchVariable() (#394) * Disallow dtype instances * Extract dtypes from torch automatically and add test Propagate guards from ConstDict variables (#421) Config driven support for torch.Tensor .item() (#417) [inductor] Refactor helpers into torchinductor.utils (#418) Removing generation field from the patched nn.Module (#423) * Removing generation field from the patched nn.Module * Lint * Rebseing [Inductor] Triton template for conv (#422) * TritonTemplateKernel, template_codegen, conv jinja2 template * pip install Jinja2 in setup_nightly Remove unnecessary call to clone which also caused a segfault (??) 
(#427) Disable TorchDynamo on frames created by fx symbolic tracing (#429) Fix for disabling triton templates (#430) Minor test fixes (#432) [inductor] Remove dead stores after fusion (#409) Summary: Use DeferredLine/DeferredIndentedBuffer to perform a lazy emit of buffer allocation/store after we determine which buffers are redundant. Add support for iterating over a dict (#436) Check if kwargs has key "fp16_mode" when determining the precision (#437) [inductor] Add heuristic to set num_warps (#433) update accuracy check for TRT fp32 (#438) * temp changes * fix an issue in fp32, change accuracy check to cosine similarity for fp32 since TRT fp32 could not meet 1e-4 Pin CI to June 20th Torch Nightly (#441) * Pin CI to June 20th Torch Nightly * Respond to commetns * duplicated set up.. [inductor] Add some prims (#431) Disabling the trace instead of symbolic trace (#443) Break graph on torch.Storage types (#428) * Break graph on torch.Storage types * Hmm, CI failing, trying Jason's suggestion * Debug CI Fix guard propagation for tuple iterators (#448) [inductor] Improve fusing of tiled + untiled (#446) Directly compute sum of a list of floats/ints (#449) * Directly compute sum of a list of floats/ints * Test Added option use_bmm to enable triton codegen for bmm (#393) * Added option use_bmm to enable triton codegen for bmm * Added more shapes to microbench Add support for crossentropy (#450) Rewrite symbolic_locals for torch.return_types (#442) * Rewrite symbolic_locals for torch.return_types * Special casing on the out kwargs * Replace == with is Remove traced op overloads before compiling (#455) * Remove op overloads * lint Raise errors when backends throw exceptions (#451) Huggingface model benchmarking (#459) [inductor] Support scatter operations (#434) Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392) * Implement UnspecializedPrimitiveVariable codegen * Make UnspecilizedPrimitiveVariable as GraphArg * Update make_call_generated_code * Update min/max builtin func * Support random.random * Remove unnecessary change * Fix lint * Refactor to support multiple random.random * Refactor out unspecialized numpy and python variables * Fix RandomValueSource guard * Support multiple random functions * Rebase to updated main * Refactor out random_values_var * Fix lint * Fix lint * Move random_values_var to output graph * Add need_unwrap to distinguish unspec from x.item() * Make global rand func unique * Fix lint * Add raw value propagation for unspec variables * Fix lint * Directly load type(raw_value) & update random func example value * Fix lint Add Fake Tensor Propagation (#426) * Add Fake Tensor Propagation * extend test * lint * fix import * one more day.. * update functorch commit * bump one more day to get pytorch/pytorch#79741 * Skip test * use FakeTensorError * lint * Guard on fake tensor availability * test skips (fix en route in core) * lint * update nightly * lint * update * format * update recent [inductor] Cherry pick nll_loss_forward decomp (#456) [inductor] Register lowerings for operator overloads (#457) Add support for torch.finfo/torch.iinfo (#470) [inductor] Auto-download gcc12 from conda-forge (#471) Rename benchmarking files (#472) Add support for named_params and named_modules (#465) [inductor] Add a metric to count the number of generated kernels (#476) Summary: This can be used to prevent a regression on our fusion result. 
Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424) * Add support for custom class * lint * Fix unpack to reflect main * Feedback * Simplify, rebase * Lint, format Flag to skip printing of Dynamo internal exceptions (#480) Don't leak cache entry on skip Fixes pytorch/torchdynamo#477 Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab Pull Request resolved: pytorch/torchdynamo#478 Reduction cache (#487) * enable cse for reductions * self.cse use fake tensors when deep copying model to check mutation (#486) * use fake tensors when deepcopying model to check mutation * fix fake tensors not available * add tests Added cos lowering (#492) fix microbenchmarks import path (#474) Compute multilayer reductions in higher precision (#484) * Compute multilayer reductions in higher precision * Compute whole mean kernel in higher precision, downcast in end * update test * lint * skip test Fix correctness checking code and non-deterministic benchmarks (#493) Improve recompilation warning (#494) - default to printing only the most recent guard failure not one failure for each cache miss - reformat the text to be (hopefully) more readable and useful Motivation: While in some cases, knowing the individual failure reasons for each of (say, 64) cache misses could be useful, i practice it is probably good enough to know the most recent one since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard). Previously: torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']]to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md Now: torchdynamo hit config.cache_size_limit (64) function: 'toy_example' (example.py:5) reasons: ['___guarded_code.valid'] Add a new backend option for TVM's meta_schedule (#479) Add (experimental) support for exporting a graph and guards (#469) Makefile/packaging updates (#499) [inductor] Misc small improvements (#475) Adding logging config (#504) More Huggingface models (#500) * More Huggingface models (from simple_dl) * Comments Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326) Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509) This reverts commit c747580. Disable torchdynamo inside dispatch_trace (#508) [inductor] Support rand()/dropout() (#505) [inductor] Workaround triton bug with XBLOCK=1 (#510) Reduction deps (#502) * mark more reduction dependencies * cleanup * black * make sure canonicalization prefix is the same everywhere change tuning trial of meta schedule (#513) [inductor] Fix bug with invalidated reuse (#506) [inductor] Handle no-op slice_scatter (#507) [WIP] Adding AMP support in benchmark infra (#464) [WIP][Discussion] Write out a deeper documentation on how we go from … (#498) * [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards * Update GuardsOverviewPt1.md * Update GuardsOverviewPt1.md
Test plan: ``` pytest -vsk test_torch_function_with_closure ``` [inductor] add lowerings for hardswish/hardsigmoid/hardtanh (#200) [inductor] Handle +/-inf constants (#210) Replace tensor.new with tensor.new_empty (#212) add configuration for modules eligible for inlining (#208) Summary: Makes the source modules for `skipfiles.is_torch_inline_allowed` configurable. This is needed for DBR quant integration exploration, we can now override this config to allow torchdynamo to inline DBR quant utility functions. Test plan: Run this: https://gist.github.com/vkuzo/010e0483c9bbb35837cc9cb27c555243 it now advances past the error of "inlining in skipfiles" [inductor] fix transposed convolution shape formula (#202) Try finally block for with context on graph break instruction (#213) * Try finally block for with context on graph break instruction * fix test * Support >= 3.9 * Support python 3.7 * Comments * Replacing the global load with GlobalSource and reconstruct Remove nn.Parameter filter hack for AOTAutograd backend (#214) Fix test failures (#218) Fix slicing list returning wrong result (#222) * Correct ListVariable source * Fix lint Remove reference cycle (#223) Refactor ConstDictVariable to support user_cls and use dict by default (#226) Pin inductor CI to specific pytorch version (#229) [inductor] support torch.linspace and torch.tensor (#217) [inductor] add heuristic to decide layouts and loop orders (#216) Revert "Fix slicing list returning wrong result (#222)" (#231) This reverts commit 243222e. Add equal_nan option to torchdynamo.testing.same() (#232) Support device constants (#230) Bail out for __setattr__ and fix ClassVariable handling (#227) - avoid compiling __setattr__ functions as they may be difficult to correctly handle for arbitrary custom classes, but also aren't likely to be useful for torch module optimization - expand the condition for constructing UserDefinedClassVariable to include ABCMeta classes via `inspect.isclass` check Remove reference cycle - with exceptions (#228) * Remove reference cycle - with exceptions * Fix for InliningInstructionTranslator Add fix for writing to closures (#233) * Add fix for writing to closures * run black * one more time Co-authored-by: Elias Ellison <eellison@devfair044.h1.fair> Delete example value for unused args (#234) Fix list slice & ConstantVariable to TupleVariable conversion missing source info (#235) * Fix list slice & ConstantVariable to TupleVariable conversion miss source info. * Update test cases * Address comment [inductor] early memory reuse and new operators (#237) Add type checking in Constant match - Fix Pytorch tests 4/n (#238) * Add type checking in Constant match * Fix test enable tracing through enum comparison (#245) add support for tracing torch.nn.ModuleDict.__contains__ (#246) Summary: Adds support for tracing through this syntax: ``` class M(torch.nn.Module): def __init__(self, module_dict): super().__init__() self.module_dict = module_dict def forward(self, x): if "foo" in self.module_dict: x = torch.mul(x, 1.0) x = torch.add(x, 1.0) return x ``` This is useful for DBR quantization. Test plan: ``` pytest -vsk test_nn_moduledict_contains ``` enumerate supports start argument (#240) * enumerate supports start argument * address comments Make eval_frame thread safe (#239) This should make eval_frame thread safe. Currently, the eval_frame is a global object, and different threads my step on each other setting a different one. 
This changes the behavior to instead always* have a "shim" eval_frame which then routes to the correct behavior by looking at the thread-local associated object. This is thread safe because now the callback object is always thread safe, and we only use it to drive logic at frame eval time, as opposed to at callback registration time. Currently, the logic for None/False/Callback is kept, but the False case could be easily collapsed behind the shim in a subsequent diff. *Always here means always when dynamo is running. The shim is installed and removed based on keeping track of how many dynamo threads are running at the moment. Add support for __subclasses__ (#242) Fixes #241 Inline function jump on tensor condition should be unimplemented (#249) cast model in no-isolate mode (#244) support tracing __getitem__ of torch.nn.ModuleDict (#253) Summary: Supports tracing through ``` class ModuleDict(torch.nn.Module): def __init__(self): super().__init__() self.layers = torch.nn.ModuleDict( { "0": torch.nn.Linear(10, 10), } ) def forward(self, x): x = self.layers["0"](x) return x ``` This is useful for DBR quant. Note: handling other logic for `ModuleDict` is left for future PRs. Test plan: ``` pytest -vsk test_moduledict ``` Run torch inductor test on GPU machine (Part 1) (#258) * Run torch inductor test on GPU machine * Land scale-config first Implement verify_correctness #179 (#252) * Wrapperbackend to enable verifying corretness of backends; set config.verify_correctness as True to enable it. * move testing.same() to utils.py Skip inductor tests on older pytorch versions (#257) [inductor] Multi-devices, primtorch decomps, and many new ops (#243) enable tracing through id(nn_module_variable) (#262) Summary: Enables tracing through this syntax: ``` class M(torch.nn.Module): def forward(self, x, ref_id): self_id = id(self) if self_id == ref_id: x = torch.mul(x, 1.0) x = torch.add(x, 1.0) return x ``` This is useful for DBR quant because it uses `id(module)` for some FQN gymnastics. Test plan: ``` pytest -vsk test_id_of_nn_module ``` enable tracing through frozenset contains of PyTorch ops (#251) Summary: Enables tracing through this syntax: ``` funcs = frozenset([torch.add]) def fn(x, func): if func in funcs: x = torch.add(x, 1.0) x = torch.mul(x, 1.0) return x ``` This is useful for DBR quantization. Test plan: ``` pytest -vsk test_frozenset_torch_func_contains ``` Dump conv args into file (#261) * dump convolution args into file * add option --log-conv-args in torchbench.py Fix generation tagging new (#263) Simplify eval frame, merge _run_only (#264) Allow layout=torch.strided in new_constant (#269) Decomposition for nan_to_num (#268) [inductor] Handle non-reduction reductions (#266) Use unittest.mock.patch for test_verify_correctness (#265) [inductor] Support sort/as_tensor/LongTensor (#267) Fix inline list/dict mutation (#273) * Fix inline list/dict mutation * Fix lint * Refact inline translator's replace_all * Fix recursive inline replace * Remove debug print Support split_with_sizes (#272) Light Refactor + Add support for torch.autograd.profiler.record_profile function (#274) This diff takes GradModeVariable's logic and pulls it partially into a more generic ContextWrappingVariable base class intended for making it easier to write context managed code. 
[inductor] Support input and slice mutation (#275) Add recompile ux tests (#270) Just a first step, this PR adds a few tests that starts to outline a proposed UX, and proposes mechanisms for setting/checking the #recompiles and cache limit to facilitate the testing Skip non tensor frame (#248) * Skip non tensor * Skip non tensor frame * Lint * Jason comments * Add decorator functionality * Comments Prioritize class method if there is duplicated attribute name (#278) * Prioritize class method if there is duplicated attribute name * Refactor var_getattr to make it consistent with native pytorch [inductor] Improve merging of contiguous loops (#279) Add support for STORE_GLOBAL (#286) Summary: 1. Create a symbolic_global table to store a global variable name to an unique object mapping, and the unique object is further used as a key to index into the store_attr_mutations table in SideEffects. 2. The actual STORE_GLOGAL action is buffered by SideEffects and later LOAD_GLOBAL just reads from SideEffects when appropriate. STORE_GLOBAL is eventually applied after the generated graph. Skip inductor CPU tests if there is no working c++ compiler (#283) Collect guard failures into one warning at cache limit hit (#281) - avoid warning on each guard failure separately (in cases cache limit > 1) - instead, bundle a summary of gaurd failure warnings together at the time of cache limit hit [inductor] Add support for more operators (#282) [inductor] Improved indexing simplification and loop body representation (#289) All base class methods take precedence if it's a nn.Module (#290) Fx2trt pr4 (#294) * temp changes * add pass rewrite for setitem * linter * temp checkin * squeeze for normalization * code clean * comments improvement * comment out int64->int32 * linter Fx2trt pr5 (#296) * temp changes * add pass rewrite for setitem * linter * temp checkin * squeeze for normalization * code clean * comments improvement * comment out int64->int32 * linter * add a threshold for fall back to non-TRT Support guarding inf constants (#300) Pytorch tests 5/n - Graph break on MemberDescriptor type (#301) * Graph break on MemberDescriptor type * CI update_locals_and_stack should use shared cache (#302) * update_locals_and_stack should use shared cache * update SideEffects.apply to use default cache Implement verbose tensor guards check (#287) Verbose guard checks are guards used outside of the hot path for providing specific failure information to the user on compile cache miss. This PR adds support for verbose guards and implements one for the tensor guard, leaving other guards alone. 
* Add tensor names to tensor guard failure message Low precision support (#304) * add low precision support to torchinductor triton backend * remove temporary tests * lint * lint * lint Run clang-format on torchdynamo/_guards.cpp (#306) Fix slowdown due to generation_tagging_new (#305) * Use patched init to track dynamic modules + test gen tagging Elaborate on error message for failing tensor type match (#307) ConstDictVariable reconstruct should keep original order (#308) [inductor] Minor fixes for latest PyTorch and benchmark harness codegen (#309) Verbose guard check Bugfix (#311) * Bugfix and clang format * It wasn't will, it was me - Clang formatter PyTorch tests - 6/n - Add type check for list/tuple elems in CONSTANT_MATCH (#303) * Add type check for list/tuple elems in CONSTANT_MATCH * recursive length guarding * All decomp tests pass * Filter out only useful guards Add torchdynamo.allow_in_graph and torchdynamo.disallow_in_graph (#295) Fix CI (#316) * DONT MERGE - Checking CI * fix * fix Fix repro test to unblock internal sync (#315) Add constant checks for list of numpy integers (#313) Skip inductor tests if sympy is missing (#320) [Easy] fix reference to removed variable in debug trace (#323) Add a basic compilation profiler (#312) * Add a basic compilation profiler * Include graph break reasons in compilation report * lint and import issues * Add --recompile_profiler option to torchbench.py Filter out unimportant modules from allowed modules (#324) * Filter out unimportant modules from allowed modules * Remove typo * Further cleanup * CI testing * Michael's comment * Remove few more meaningless things * Jason's comments Prevent guard creation from accessing objects between __new__ and __init__ (#322) Initial implementation of UnspecializedPrimitiveVariable (#321) * Initial implementation of UnspecializedPrimitiveVariable * Update heuristic * Add test for no recompilations for different values [inductor] Initial suppport for tiling output code (#317) Fix test skip when sympy is missing (#333) remove unconditional sympy import from test_torchdinductor.py (#334) * remove unconditional sympy import from test_torchdinductor.py conv in triton (#310) * general conv and conv1x1 implementation in triton * correctness check with torch baseline * benchmarking on resnet50 layers * enable `triton_ops.conv` to replace `aten.convolution` by setting config.triton.use_conv as True [inductor] Minor float16 fix (#338) update IPEX backend (#344) Add _refs and _prims to the allowlist This won't get exercised by real models but it's necessary so we can test that PrimTorch decomps work under dynamo. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 874bbbd8d80d07fcd089a0d57849f21b2b81756d Pull Request resolved: pytorch/torchdynamo#345 [inductor] Support additional operators (#339) [inductor] Benchmark harness for training (#337) codegen to update mutated variables with side effect should after stack value's codegen (#347) Add a master-only test that at least one PrimTorch ref can be traced nopython Signed-off-by: Edward Z. 
Yang <ezyangfb.com> ghstack-source-id: 221ff4b897648f99a4552dd955b3426a2d6cb738 Pull Request resolved: pytorch/torchdynamo#355 Convert dict to list bebefor iterate in case there is possible delete (#360) Add TROUBLESHOOTING.md (#357) - link TROUBLESHOOTING.md from README.md and from recompilation warning Addresses issue https://github.com/pytorch/torchdynamo/issues/348 Add coldstart/breakeven benchmark (#352) try with: python torchbench.py --cold-start -d cuda --training --use-eval-mode --nvfuser --isolate This actually adds 2 new benchmark metrics: coldstart: measures the worst of t_eager_compile / t_dynamo_compile as a 'speedup', where dynamo compiles twice to exercise profiling executor breakeven: predicts the number of iterations dynamo would have to run to 'break even' with eager, considering the amortization of its compile cost Not yet tested with inference or cpu, may have some other issues. Should probably adjust to repeat the whole cold-start process several times and median, but for now just does this once. Raise unimplemeted if checkpoint is empty (#351) Only wrap in TorchVariable if is allowed, not if not disallowed If you don't do this, allowed_functions_module_string_ignorelist doesn't actually affect if we try to trace these functions into the graph, since the disallowed list doesn't actually respect this config. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: fad216f8a3013caaa2814cc3efaf5348e6033ee5 Pull Request resolved: pytorch/torchdynamo#356 List append should return None (#365) Make TorchInductor use Triton's MM implementation in codegen (#325) * Rebased on upstream * Fixed make lint * Fixed make lint add yaml to requirements.txt (#367) Disabling TorchDynamo inside torch.jit and torch.onnx compiler (#361) * Disabling TorchDynamo inside torch.jit compiler * Adding trace_module * Remove the script * Also adding ONNX * Jason's comments Workaround triton float64 log issue (#379) Fix missing ir.Reduction.default_value (#378) autotune conv kernels (#364) * tuned_conv to choose the best kernel for given inputs shape, stride, layer params * set config.triton.convolution as "aten"(default), "triton" or "autotune" [inductor] Refactor fallback kernel handling (#381) Add torch._decomps to the list to trace into Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: aa557ba00ea2fc03946089d56592041a5df31275 Pull Request resolved: pytorch/torchdynamo#369 Print value of TorchVariable object I find this is helpful for debugging what exactly a given TorchVariable is; presently there is no information so it is hard to tell. Because these are PyTorch variables they should be well behaved and it shouldn't cause problems to call repr on them. Signed-off-by: Edward Z. 
Yang <ezyangfb.com> ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b Pull Request resolved: pytorch/torchdynamo#370 Skipping the namedtuple subclass constructor (#382) Adding missing guard for GET_ITER Bytecode (#386) fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385) * temp changes * fx2trt_oss is merged to https://github.com/pytorch/TensorRT * linter fix Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380) * Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) * Reformatted Fix for `any` decomposition (#384) * fix lowering * restore abs * black Extend new to new_empty for tensor.shape (#387) Support for lazy modules (#391) * Support for lazy modules with test Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__ so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there were cases where torchdynamo does not yet support functionality used in all hooks. Enable tf32 in torchbench.py (#397) Support TensorType checking (#395) * Support TensorType checking * Update torchdynamo/variables/builtin.py Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Jason Ansel <jansel@jansel.net> [inductor] Canonicalize indexes for MemoryDep (#314) Summary: By canoicalizing indexes, we can build more accurate and flexible read-write dependency, which will allow more general kernel fusion. Suppress warnings during the pre-checks on a frame (#396) * Suppress warnings during the pre-checks on a frame * Encapsulate in PatchTorch [inductor] Do convolution bias in its own kernel (#403) [inductor] Fix convolution output stride order (#400) Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398) Reducing overhead of has_tensor_in_frame (#406) [inductor] add manual_seed in test_round (#388) [inductor] Improve heuristic for realizing buffers (#402) [inductor] Fix typo preventing some fusions (#401) [inductor] Fix for constants on wrong device (#405) remove deepcopy in fx2trt (#407) * temp changes * remove deepcopy Fix broken link in README (#408) Adding nan (#383) * Adding nan * Jason's comments Fixing import issues for PyTorch 1.12 rc branch (#411) Raising tolerance after using tf32 (#415) [inductor] Improve handling of reductions (#404) Shape guard and isinstance fixes (#414) * Shape guard and isinstance fixes and tests * Add guard for any accessed Tensor attribute Ensure dtype instances are not mapped to TorchVariable() (#394) * Disallow dtype instances * Extract dtypes from torch automatically and add test Propagate guards from ConstDict variables (#421) Config driven support for torch.Tensor .item() (#417) [inductor] Refactor helpers into torchinductor.utils (#418) Removing generation field from the patched nn.Module (#423) * Removing generation field from the patched nn.Module * Lint * Rebseing [Inductor] Triton template for conv (#422) * TritonTemplateKernel, template_codegen, conv jinja2 template * pip install Jinja2 in setup_nightly Remove unnecessary call to clone which also caused a segfault (??) 
(#427) Disable TorchDynamo on frames created by fx symbolic tracing (#429) Fix for disabling triton templates (#430) Minor test fixes (#432) [inductor] Remove dead stores after fusion (#409) Summary: Use DeferredLine/DeferredIndentedBuffer to perform a lazy emit of buffer allocation/store after we determine which buffers are redundant. Add support for iterating over a dict (#436) Check if kwargs has key "fp16_mode" when determining the precision (#437) [inductor] Add heuristic to set num_warps (#433) update accuracy check for TRT fp32 (#438) * temp changes * fix an issue in fp32, change accuracy check to cosine similarity for fp32 since TRT fp32 could not meet 1e-4 Pin CI to June 20th Torch Nightly (#441) * Pin CI to June 20th Torch Nightly * Respond to commetns * duplicated set up.. [inductor] Add some prims (#431) Disabling the trace instead of symbolic trace (#443) Break graph on torch.Storage types (#428) * Break graph on torch.Storage types * Hmm, CI failing, trying Jason's suggestion * Debug CI Fix guard propagation for tuple iterators (#448) [inductor] Improve fusing of tiled + untiled (#446) Directly compute sum of a list of floats/ints (#449) * Directly compute sum of a list of floats/ints * Test Added option use_bmm to enable triton codegen for bmm (#393) * Added option use_bmm to enable triton codegen for bmm * Added more shapes to microbench Add support for crossentropy (#450) Rewrite symbolic_locals for torch.return_types (#442) * Rewrite symbolic_locals for torch.return_types * Special casing on the out kwargs * Replace == with is Remove traced op overloads before compiling (#455) * Remove op overloads * lint Raise errors when backends throw exceptions (#451) Huggingface model benchmarking (#459) [inductor] Support scatter operations (#434) Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392) * Implement UnspecializedPrimitiveVariable codegen * Make UnspecilizedPrimitiveVariable as GraphArg * Update make_call_generated_code * Update min/max builtin func * Support random.random * Remove unnecessary change * Fix lint * Refactor to support multiple random.random * Refactor out unspecialized numpy and python variables * Fix RandomValueSource guard * Support multiple random functions * Rebase to updated main * Refactor out random_values_var * Fix lint * Fix lint * Move random_values_var to output graph * Add need_unwrap to distinguish unspec from x.item() * Make global rand func unique * Fix lint * Add raw value propagation for unspec variables * Fix lint * Directly load type(raw_value) & update random func example value * Fix lint Add Fake Tensor Propagation (#426) * Add Fake Tensor Propagation * extend test * lint * fix import * one more day.. * update functorch commit * bump one more day to get pytorch/pytorch#79741 * Skip test * use FakeTensorError * lint * Guard on fake tensor availability * test skips (fix en route in core) * lint * update nightly * lint * update * format * update recent [inductor] Cherry pick nll_loss_forward decomp (#456) [inductor] Register lowerings for operator overloads (#457) Add support for torch.finfo/torch.iinfo (#470) [inductor] Auto-download gcc12 from conda-forge (#471) Rename benchmarking files (#472) Add support for named_params and named_modules (#465) [inductor] Add a metric to count the number of generated kernels (#476) Summary: This can be used to prevent a regression on our fusion result. 
Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424) * Add support for custom class * lint * Fix unpack to reflect main * Feedback * Simplify, rebase * Lint, format Flag to skip printing of Dynamo internal exceptions (#480) Don't leak cache entry on skip Fixes pytorch/torchdynamo#477 Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab Pull Request resolved: pytorch/torchdynamo#478 Reduction cache (#487) * enable cse for reductions * self.cse use fake tensors when deep copying model to check mutation (#486) * use fake tensors when deepcopying model to check mutation * fix fake tensors not available * add tests Added cos lowering (#492) fix microbenchmarks import path (#474) Compute multilayer reductions in higher precision (#484) * Compute multilayer reductions in higher precision * Compute whole mean kernel in higher precision, downcast in end * update test * lint * skip test Fix correctness checking code and non-deterministic benchmarks (#493) Improve recompilation warning (#494) - default to printing only the most recent guard failure not one failure for each cache miss - reformat the text to be (hopefully) more readable and useful Motivation: While in some cases, knowing the individual failure reasons for each of (say, 64) cache misses could be useful, i practice it is probably good enough to know the most recent one since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard). Previously: torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']]to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md Now: torchdynamo hit config.cache_size_limit (64) function: 'toy_example' (example.py:5) reasons: ['___guarded_code.valid'] Add a new backend option for TVM's meta_schedule (#479) Add (experimental) support for exporting a graph and guards (#469) Makefile/packaging updates (#499) [inductor] Misc small improvements (#475) Adding logging config (#504) More Huggingface models (#500) * More Huggingface models (from simple_dl) * Comments Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326) Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509) This reverts commit c747580. Disable torchdynamo inside dispatch_trace (#508) [inductor] Support rand()/dropout() (#505) [inductor] Workaround triton bug with XBLOCK=1 (#510) Reduction deps (#502) * mark more reduction dependencies * cleanup * black * make sure canonicalization prefix is the same everywhere change tuning trial of meta schedule (#513) [inductor] Fix bug with invalidated reuse (#506) [inductor] Handle no-op slice_scatter (#507) [WIP] Adding AMP support in benchmark infra (#464) [WIP][Discussion] Write out a deeper documentation on how we go from … (#498) * [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards * Update GuardsOverviewPt1.md * Update GuardsOverviewPt1.md
shiyu22
pushed a commit
to towhee-io/towhee-compiler
that referenced
this pull request
Sep 9, 2022
Yang <ezyangfb.com> ghstack-source-id: a47f459ce2dd577b0133b9e59af0609e4123f60b Pull Request resolved: pytorch/torchdynamo#370 Skipping the namedtuple subclass constructor (#382) Adding missing guard for GET_ITER Bytecode (#386) fx2trt_oss is merged to https://github.com/pytorch/TensorRT (#385) * temp changes * fx2trt_oss is merged to https://github.com/pytorch/TensorRT * linter fix Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) (#380) * Added microbenchmarks to measure performance of torch.mm vs torch.mm + relu vs inductor (triton mm) * Reformatted Fix for `any` decomposition (#384) * fix lowering * restore abs * black Extend new to new_empty for tensor.shape (#387) Support for lazy modules (#391) * Support for lazy modules with test Added fixes to support lazy modules in torchdynamo. The main issue that needed to be addressed is that LazyModules register some hooks which are run when a module is called. Torchdynamo typically calls the forward method instead of __call__ so these hooks were never run. In the case of LazyModules we now run and trace the __call__ method, and allow the original module to be mutated. In the future, we could do this for all modules, but there were cases where torchdynamo does not yet support functionality used in all hooks. Enable tf32 in torchbench.py (#397) Support TensorType checking (#395) * Support TensorType checking * Update torchdynamo/variables/builtin.py Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Jason Ansel <jansel@jansel.net> [inductor] Canonicalize indexes for MemoryDep (#314) Summary: By canoicalizing indexes, we can build more accurate and flexible read-write dependency, which will allow more general kernel fusion. Suppress warnings during the pre-checks on a frame (#396) * Suppress warnings during the pre-checks on a frame * Encapsulate in PatchTorch [inductor] Do convolution bias in its own kernel (#403) [inductor] Fix convolution output stride order (#400) Added more shapes (from alexnet, BERT, hf_GPT2) to the inductor mm test (#398) Reducing overhead of has_tensor_in_frame (#406) [inductor] add manual_seed in test_round (#388) [inductor] Improve heuristic for realizing buffers (#402) [inductor] Fix typo preventing some fusions (#401) [inductor] Fix for constants on wrong device (#405) remove deepcopy in fx2trt (#407) * temp changes * remove deepcopy Fix broken link in README (#408) Adding nan (#383) * Adding nan * Jason's comments Fixing import issues for PyTorch 1.12 rc branch (#411) Raising tolerance after using tf32 (#415) [inductor] Improve handling of reductions (#404) Shape guard and isinstance fixes (#414) * Shape guard and isinstance fixes and tests * Add guard for any accessed Tensor attribute Ensure dtype instances are not mapped to TorchVariable() (#394) * Disallow dtype instances * Extract dtypes from torch automatically and add test Propagate guards from ConstDict variables (#421) Config driven support for torch.Tensor .item() (#417) [inductor] Refactor helpers into torchinductor.utils (#418) Removing generation field from the patched nn.Module (#423) * Removing generation field from the patched nn.Module * Lint * Rebseing [Inductor] Triton template for conv (#422) * TritonTemplateKernel, template_codegen, conv jinja2 template * pip install Jinja2 in setup_nightly Remove unnecessary call to clone which also caused a segfault (??) 
(#427) Disable TorchDynamo on frames created by fx symbolic tracing (#429) Fix for disabling triton templates (#430) Minor test fixes (#432) [inductor] Remove dead stores after fusion (#409) Summary: Use DeferredLine/DeferredIndentedBuffer to perform a lazy emit of buffer allocation/store after we determine which buffers are redundant. Add support for iterating over a dict (#436) Check if kwargs has key "fp16_mode" when determining the precision (#437) [inductor] Add heuristic to set num_warps (#433) update accuracy check for TRT fp32 (#438) * temp changes * fix an issue in fp32, change accuracy check to cosine similarity for fp32 since TRT fp32 could not meet 1e-4 Pin CI to June 20th Torch Nightly (#441) * Pin CI to June 20th Torch Nightly * Respond to commetns * duplicated set up.. [inductor] Add some prims (#431) Disabling the trace instead of symbolic trace (#443) Break graph on torch.Storage types (#428) * Break graph on torch.Storage types * Hmm, CI failing, trying Jason's suggestion * Debug CI Fix guard propagation for tuple iterators (#448) [inductor] Improve fusing of tiled + untiled (#446) Directly compute sum of a list of floats/ints (#449) * Directly compute sum of a list of floats/ints * Test Added option use_bmm to enable triton codegen for bmm (#393) * Added option use_bmm to enable triton codegen for bmm * Added more shapes to microbench Add support for crossentropy (#450) Rewrite symbolic_locals for torch.return_types (#442) * Rewrite symbolic_locals for torch.return_types * Special casing on the out kwargs * Replace == with is Remove traced op overloads before compiling (#455) * Remove op overloads * lint Raise errors when backends throw exceptions (#451) Huggingface model benchmarking (#459) [inductor] Support scatter operations (#434) Step 2 of supporting UnspecializedNumpyVariable & UnspecializedPythonVariable (#392) * Implement UnspecializedPrimitiveVariable codegen * Make UnspecilizedPrimitiveVariable as GraphArg * Update make_call_generated_code * Update min/max builtin func * Support random.random * Remove unnecessary change * Fix lint * Refactor to support multiple random.random * Refactor out unspecialized numpy and python variables * Fix RandomValueSource guard * Support multiple random functions * Rebase to updated main * Refactor out random_values_var * Fix lint * Fix lint * Move random_values_var to output graph * Add need_unwrap to distinguish unspec from x.item() * Make global rand func unique * Fix lint * Add raw value propagation for unspec variables * Fix lint * Directly load type(raw_value) & update random func example value * Fix lint Add Fake Tensor Propagation (#426) * Add Fake Tensor Propagation * extend test * lint * fix import * one more day.. * update functorch commit * bump one more day to get pytorch/pytorch#79741 * Skip test * use FakeTensorError * lint * Guard on fake tensor availability * test skips (fix en route in core) * lint * update nightly * lint * update * format * update recent [inductor] Cherry pick nll_loss_forward decomp (#456) [inductor] Register lowerings for operator overloads (#457) Add support for torch.finfo/torch.iinfo (#470) [inductor] Auto-download gcc12 from conda-forge (#471) Rename benchmarking files (#472) Add support for named_params and named_modules (#465) [inductor] Add a metric to count the number of generated kernels (#476) Summary: This can be used to prevent a regression on our fusion result. 
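Several commits in this stretch widen the plain-Python constructs dynamo can trace without a graph break — iterating over a dict (#436), summing a list of floats/ints (#449), and `torch.finfo`/`torch.iinfo` (#470). A small combined sketch under those assumptions (illustrative, not taken from the test suite):

```python
import torch
import torchdynamo

scales = {"first": 0.5, "second": 2.0}

def fn(x):
    # sum() over a list of floats, dict iteration, and torch.finfo are the
    # constructs the commits above teach dynamo to handle in a traced frame.
    total = sum([v for v in scales.values()])
    eps = torch.finfo(x.dtype).eps
    for _, s in scales.items():
        x = x * s
    return x / (total + eps)

with torchdynamo.optimize("eager"):
    out = fn(torch.randn(8))
```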
Extend python_key_normalize with support for PythonTensor class override and a post trace hook (#424) * Add support for custom class * lint * Fix unpack to reflect main * Feedback * Simplify, rebase * Lint, format
Flag to skip printing of Dynamo internal exceptions (#480)
Don't leak cache entry on skip Fixes pytorch/torchdynamo#477 Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 5b73c3470173b894d188c6e9290ebc3cbbef2bab Pull Request resolved: pytorch/torchdynamo#478
Reduction cache (#487) * enable cse for reductions * self.cse
use fake tensors when deep copying model to check mutation (#486) * use fake tensors when deepcopying model to check mutation * fix fake tensors not available * add tests
Added cos lowering (#492)
fix microbenchmarks import path (#474)
Compute multilayer reductions in higher precision (#484) * Compute multilayer reductions in higher precision * Compute whole mean kernel in higher precision, downcast at the end * update test * lint * skip test
Fix correctness checking code and non-deterministic benchmarks (#493)
Improve recompilation warning (#494) - default to printing only the most recent guard failure, not one failure for each cache miss - reformat the text to be (hopefully) more readable and useful. Motivation: While in some cases, knowing the individual failure reasons for each of (say, 64) cache misses could be useful, in practice it is probably good enough to know the most recent one since they tend to be similar reasons (such as incrementing counters or new object ids triggering the same type of guard). Previously: torchdynamo hit recompilation cache limit (64) for function 'toy_example' (example.py:5), due to the following guard failures: [['___guarded_code.valid'], {... 62 more times...}, ['___guarded_code.valid']] to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md Now: torchdynamo hit config.cache_size_limit (64) function: 'toy_example' (example.py:5) reasons: ['___guarded_code.valid'] (see the illustrative snippet after this list of commits)
Add a new backend option for TVM's meta_schedule (#479)
Add (experimental) support for exporting a graph and guards (#469)
Makefile/packaging updates (#499)
[inductor] Misc small improvements (#475)
Adding logging config (#504)
More Huggingface models (#500) * More Huggingface models (from simple_dl) * Comments
Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)
Revert "Enable guard_nn_modules - Observe module changes to drive invalidation instead of id_match + training guards (#326)" (#509) This reverts commit c747580.
Disable torchdynamo inside dispatch_trace (#508)
[inductor] Support rand()/dropout() (#505)
[inductor] Workaround triton bug with XBLOCK=1 (#510)
Reduction deps (#502) * mark more reduction dependencies * cleanup * black * make sure canonicalization prefix is the same everywhere
change tuning trial of meta schedule (#513)
[inductor] Fix bug with invalidated reuse (#506)
[inductor] Handle no-op slice_scatter (#507)
[WIP] Adding AMP support in benchmark infra (#464)
[WIP][Discussion] Write out a deeper documentation on how we go from … (#498) * [WIP][Discussion] Write out a deeper documentation on how we go from user code to producing guards * Update GuardsOverviewPt1.md * Update GuardsOverviewPt1.md
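A minimal way to reproduce the consolidated warning from #494, assuming the `config.cache_size_limit` knob named in the new message; the toy function below deliberately specializes on a Python int so each new value forces a recompile (illustrative sketch, not project code):

```python
import torch
import torchdynamo

# Lower the limit so the single summary warning appears quickly (default is 64).
torchdynamo.config.cache_size_limit = 2

def toy_example(x, n):
    # n is a plain Python int, so dynamo guards on its exact value; each new
    # value misses the cache until the limit is hit and the warning is printed.
    return x * n

with torchdynamo.optimize("eager"):
    for n in range(8):
        toy_example(torch.randn(4), n)
```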
davidmiller4185101 added a commit to davidmiller4185101/django-torch-dynamo-develop that referenced this pull request Sep 29, 2022
Summary: In pytorch/torchdynamo#167 we added `__torch_function__` support for tracing through `call_function`. This PR extends the support to also work on `call_method`. Note: the LOC is high because some code was refactored to be reusable. Note: implementing correct rewrapping logic for methods is saved for a future PR, hope that's OK. Test plan: ``` pytest -vsk test_simple_torch_function ```
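As a rough sketch of what the `call_method` extension covers — a method call on a tensor subclass with a no-op `__torch_function__` override — assuming the class and function names below, which are illustrative rather than the actual test code:

```python
import torch
import torchdynamo

class PassthroughTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # No-op override: defer to the default handler; dynamo inlines this
        # and emits the underlying op into the graph.
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

def fn(x):
    # Method calls (dynamo's call_method path) rather than torch.foo(x).
    return x.sigmoid().add(1)

with torchdynamo.optimize("eager"):
    out = fn(torch.randn(4).as_subclass(PassthroughTensor))
```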
Summary:
This adds a skeleton for `__torch_function__` support in torchdynamo.

What this is currently doing:
1. check the arguments for `__torch_function__` and wrap tensors in `TensorTFOverrideVariable` if found
2. in `TorchVariable.call_function`, inline the `__torch_function__` function of `TensorTFOverrideVariable` arguments
3. in `GetAttrVariable.call_function`, check for `super().__torch_function__`, which resolves to the original. If it's found, stop inlining and insert the function call into the graph.

The current test just creates a `__torch_function__` override which doesn't do anything but call `super().__torch_function__`.

Things left for future PRs:
- `call_method` support
- the full `__torch_function__` spec (currently things are hardcoded to the first argument only)

Test plan:
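(The test plan itself is not captured here.) For orientation, the scenario described in the summary corresponds roughly to the sketch below; the class and function names are illustrative, not the actual test code:

```python
import torch
import torchdynamo

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Override that does nothing except defer to the default behavior;
        # per the summary, dynamo inlines this and, at
        # super().__torch_function__, stops inlining and inserts the
        # underlying call into the graph.
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

def fn(x):
    return torch.sigmoid(x)

with torchdynamo.optimize("eager"):
    # The argument defines __torch_function__, so (per the summary) dynamo
    # wraps it in TensorTFOverrideVariable before tracing the call.
    out = fn(torch.randn(4).as_subclass(MyTensor))
```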