dynamo.optimizations.training.aot_autograd does not trace correct overload #90923

### 🐛 Describe the bug

In the following Python program, `x + 2` is traced as `torch.ops.aten.add.Tensor` rather than `torch.ops.aten.add.Scalar`. The latter would be more technically correct, since the second operand is a Python scalar, and it is the overload that `torch.jit.script` produces. I noticed this because Torch-MLIR verifies that operands match the op schema exactly. Is it possible to have `aot_autograd` trace the more technically correct overload? If not, can you provide guidance on how to emulate the promotion semantics correctly and in full generality?
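To make the promotion question concrete, here is a minimal sketch using `torch.result_type` to compare a Python-scalar operand with a 0-dim tensor operand. It assumes the default dtype is `float32` and that the literal `2` in the traced call keeps Python-scalar promotion behavior even though it sits in a `Tensor` slot; the variable names are illustrative only.

```python
import torch

x_int = torch.ones(3, dtype=torch.int32)
x_half = torch.ones(3, dtype=torch.float16)

# Python scalar of a higher category (float vs. int): the result uses the
# default floating-point dtype, not a dtype derived from the scalar itself.
scalar_case = torch.result_type(x_int, 2.5)

# 0-dim tensor of a higher category: the result takes the 0-dim tensor's dtype.
zero_dim_case = torch.result_type(x_int, torch.tensor(2.5, dtype=torch.float64))

# Python scalar of the same category: the dimensioned tensor's dtype is kept.
same_category_case = torch.result_type(x_half, 2.0)

print(scalar_case, zero_dim_case, same_category_case)
```

If that assumption holds, an importer that materializes the literal as an ordinary tensor constant would have to pick the constant's dtype carefully to avoid changing the result dtype.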

In addition, the traced `torch.ops.aten.add.Tensor` call is missing an argument: only two operands are passed, so my importer needs special-case code that queries the schema for the default value. One possibility would be for PyTorch to provide a utility that normalizes a graph of aten ops so that it "typechecks" correctly against the schemas of the constituent ops, including reifying all default values.
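As an illustration of the special-casing described above, the following sketch fills in omitted arguments from an overload's schema. `reify_defaults` is a hypothetical helper, not a PyTorch API; it relies on the `_schema` attribute of the overload, which is a private attribute and may change. For `add.Tensor`, the omitted argument is the keyword-only `alpha`.

```python
import torch


def reify_defaults(op, args, kwargs=None):
    """Return (args, kwargs) for `op` with every schema default made explicit.

    `op` is an overload such as torch.ops.aten.add.Tensor. Positional
    arguments are left as given; any schema argument the caller did not
    supply is filled from its declared default and passed by keyword.
    """
    kwargs = dict(kwargs or {})
    schema_args = op._schema.arguments
    # Schema arguments beyond the supplied positionals must come either
    # from kwargs or from the schema default.
    for arg in schema_args[len(args):]:
        if arg.name in kwargs:
            continue
        if not arg.has_default_value():
            raise TypeError(f"{op}: missing required argument '{arg.name}'")
        kwargs[arg.name] = arg.default_value
    return tuple(args), kwargs


x = torch.randn(3, 4)
# Mirrors the traced call torch.ops.aten.add.Tensor(arg0_1, 2).
args, kwargs = reify_defaults(torch.ops.aten.add.Tensor, (x, 2))
print(kwargs)
```

This only handles defaults; it does not check that the supplied operands have the types the schema declares.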

```python
from typing import List

import torch
import torch._dynamo as dynamo
from torch._dynamo.optimizations.training import aot_autograd


def my_backend(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    gm.print_readable()
    return gm


@dynamo.optimize(aot_autograd(fw_compiler=my_backend))
def f(x):
    return x + 2


example_inputs = (torch.randn(3, 4),)
f(*example_inputs)
```

Output:
```
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3, 4]):
        # File: /tmp/repro2.py:15, code: return x + 2
        add: f32[3, 4] = torch.ops.aten.add.Tensor(arg0_1, 2);  arg0_1 = None
        return (add,)
```
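The kind of schema check that flags this graph can be sketched as follows. This is not Torch-MLIR's actual verifier; `find_scalar_in_tensor_slots` is a hypothetical helper, and the graph is built by hand with `torch.fx.Graph` to mirror the traced output above rather than being produced by `aot_autograd`.

```python
import torch
import torch.fx


def find_scalar_in_tensor_slots(graph: torch.fx.Graph):
    """Yield (node, arg_name, value) where a Python number fills a Tensor slot."""
    tensor_type = torch._C.TensorType.get()
    for node in graph.nodes:
        if node.op != "call_function":
            continue
        if not isinstance(node.target, torch._ops.OpOverload):
            continue
        # Pair each positional operand with the schema argument it binds to.
        for schema_arg, actual in zip(node.target._schema.arguments, node.args):
            if schema_arg.type == tensor_type and isinstance(actual, (bool, int, float)):
                yield node, schema_arg.name, actual


# Hand-built equivalent of: add = torch.ops.aten.add.Tensor(arg0_1, 2)
graph = torch.fx.Graph()
x = graph.placeholder("arg0_1")
add = graph.call_function(torch.ops.aten.add.Tensor, (x, 2))
graph.output((add,))

mismatches = [(name, value) for _, name, value in find_scalar_in_tensor_slots(graph)]
print(mismatches)
```

Here the literal `2` is reported against the `other` argument, which the `add.Tensor` schema declares as a `Tensor`.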

### Versions

PyTorch version: 2.0.0.dev20221213+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux rodete (x86_64)
GCC version: (Debian 12.2.0-3) 12.2.0
Clang version: 14.0.6-2
CMake version: version 3.25.0
Libc version: glibc-2.35

Python version: 3.10.8 (main, Nov  3 2022, 15:17:13) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-5.19.11-1rodete1-amd64-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.0rc2
[pip3] torch==2.0.0.dev20221213+cpu
[pip3] torchvision==0.15.0.dev20221213+cpu
[conda] Could not collect

cc @chauhang @penguinwu @avikchaudhuri @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4 @bdhirsh @bobrenjc93 @aorenste @ezyang @anijain2305 @gmagogsfm @zou3519 @msaroufim @wconstab @soumith @ngimel
