
Difference between pytorch and onnx in inference #105661

@LaurentBerger

Description


🐛 Describe the bug

Hi,

I have converted AdaBins_kitti.pt to ONNX format using this code:

import onnxscript
import torch
from models import UnetAdaptiveBins
import model_io
import torch.onnx
import torchview

OPSET_VERSION=14
opset_str='opset'+str(OPSET_VERSION)
import_opset = 'from onnxscript.onnx_opset import ' + opset_str + ' as op'
print(import_opset)
exec(import_opset)
device = torch.device('cpu')
model = UnetAdaptiveBins.build(n_bins=256, min_val=1e-3, max_val=80)
dummy_input = torch.rand((1, 3, 480, 640), requires_grad=True)#.to(device)
model.eval()
model, _, _ = model_io.load_checkpoint(r"C:\Users\laurent\Downloads\AdaBins_kitti.pt", model)
print("EXPORT MODEL TO ONNX")
torch.onnx.export(model, 
                  dummy_input,
                  'AdaBins_kitti.onnx',
#                  custom_opsets = {"torch.onnx": OPSET_VERSION}, 
                  opset_version=OPSET_VERSION,
                  input_names=['image_in'],
                  output_names=['depth_out']
                  )

`from models import UnetAdaptiveBins` comes from https://github.com/shariqfarooq123/AdaBins/tree/main,
and the pretrained model can be downloaded via the links at
https://github.com/shariqfarooq123/AdaBins/tree/main#download-links

Now I want to compare the two inference paths:

import torch
import torch.nn
import numpy as np
from models import UnetAdaptiveBins
import model_io
import onnx
import onnxscript
import onnxruntime as rt
from PIL import Image
import cv2 as cv

MIN_DEPTH = 1e-3
MAX_DEPTH_NYU = 10
MAX_DEPTH_KITTI = 80

N_BINS = 256 
device = torch.device('cpu')
# KITTI
model = UnetAdaptiveBins.build(n_bins=N_BINS, min_val=MIN_DEPTH, max_val=MAX_DEPTH_KITTI)
min_depth = MIN_DEPTH
max_depth = MAX_DEPTH_KITTI

pretrained_path = "AdaBins_kitti.pt"
model, _, _ = model_io.load_checkpoint(pretrained_path, model)
img = cv.imread("classroom__rgb_00283.jpg")
image = torch.from_numpy(img.transpose((2, 0, 1))).unsqueeze(0).float().to(device)
bins, pred = model(image)

disparity_torch = pred.cpu().detach().numpy()
print("disparity_torch[0:10,0:10]")
print(disparity_torch[0:10,0:10])

blob = np.transpose(img, [2, 0, 1])  # still uint8 here: cv.imread returns uint8, and unlike the PyTorch path there is no .float() cast

onnx_name = "AdaBins_kitti.onnx"

sess = rt.InferenceSession(onnx_name) 
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name
pred = sess.run([sess.get_outputs()[0].name, 
                 sess.get_outputs()[1].name,
               ], 
                 {input_name: [blob]})
disparity_onnx = pred[1][0, 0,: , :]
print("disparity_onnx[0:10,0:10]")
print(disparity_onnx[0:10,0:10])
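One preprocessing difference between the two paths above may be worth ruling out: the PyTorch input is cast with `.float()`, while `blob` keeps the `uint8` dtype that `cv.imread` returns. A minimal NumPy-only sketch (the 4x4 image is a hypothetical stand-in for the real one):

```python
import numpy as np

# Hypothetical 4x4 BGR image standing in for cv.imread's output (uint8, HWC).
img = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)

# PyTorch path in the script above: HWC -> CHW, add batch dim, cast to float32.
torch_input = img.transpose((2, 0, 1))[None].astype(np.float32)

# ONNX path in the script above: transpose only -- np.transpose preserves dtype.
onnx_blob = np.transpose(img, [2, 0, 1])

assert torch_input.dtype == np.float32
assert torch_input.shape == (1, 3, 4, 4)
assert onnx_blob.dtype == np.uint8  # not float32: a mismatch worth ruling out
```

If ONNX Runtime receives (or silently converts) a differently typed or differently scaled input than the traced model saw, the outputs can diverge even when the export itself is correct.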

Results:

Using cache found in C:\Users\laurent/.cache\torch\hub\rwightman_gen-efficientnet-pytorch_master
Loading base model (tf_efficientnet_b5_ap)...Done.
Removing last two layers (global_pool & classifier).
C:\Users\laurent\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\transformer.py:218: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because  self.layers[0].self_attn.batch_first was not True
  warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Building Encoder-Decoder model..Done.
disparity_torch[0:10,0:10]
[[[[7.003864  6.6035943 6.5599413 ... 6.4345403 6.524028  6.894148 ]
   [6.175882  6.0317497 5.9946747 ... 5.961024  5.9829626 6.0528393]
   [6.014679  5.941796  5.9283185 ... 5.918662  5.9255667 5.9526787]
   ...
   [5.9058013 5.905388  5.905355  ... 5.9054017 5.9054503 5.9059396]
   [5.9061584 5.90544   5.9053693 ... 5.9054766 5.9055605 5.9066734]
   [5.9206076 5.906718  5.905578  ... 5.9055915 5.9060435 5.914852 ]]]]
disparity_onnx[0:10,0:10]
[[78.20976   78.20976   78.20976   78.20976   78.20976   78.20976
  78.20976   78.20976   78.20976   78.20976  ]
 [78.20976   78.20976    5.873928   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976    5.873928   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976    5.8739305  5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976    7.769227   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976   16.408312   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976   47.379463   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976   10.156654   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976    8.455482   5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]
 [78.20976   78.20976    7.4906383  5.873928   5.873928   5.873928
   5.873928   5.873928   5.873928   5.873928 ]]
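Beyond eyeballing the printed slices, the two predictions can be compared numerically once they are squeezed to the same HxW shape (the PyTorch output above is 4-D, Nx1xHxW). A minimal sketch with toy arrays; `compare_outputs` and its tolerances are illustrative, not part of the original script:

```python
import numpy as np

def compare_outputs(a, b, rtol=1e-3, atol=1e-4):
    """Return (agree_within_tolerance, max_abs_diff, mean_abs_diff) for two outputs."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    assert a.shape == b.shape, f"shape mismatch: {a.shape} vs {b.shape}"
    diff = np.abs(a - b)
    return np.allclose(a, b, rtol=rtol, atol=atol), diff.max(), diff.mean()

# Toy stand-ins for disparity_torch.squeeze() and disparity_onnx.
x = np.full((4, 4), 5.9, dtype=np.float32)
y = x + 1e-5  # tiny perturbation: should count as agreement
ok, max_err, mean_err = compare_outputs(x, y)
assert ok
```

In this case the differences are far too large for any tolerance, which points at a genuine pipeline or export problem rather than accumulated floating-point error.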

The ONNX inference results are wrong. Is the problem in my code or in the ONNX export?

Versions

Collecting environment information...
PyTorch version: 2.1.0.dev20230626+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Professionnel
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.26.1
Libc version: N/A

Python version: 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 531.14
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3000
DeviceID=CPU0
Family=207
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3000
Name=13th Gen Intel(R) Core(TM) i9-13900KF
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] msgpack-numpy==0.4.8
[pip3] numpy==1.23.5
[pip3] torch==2.1.0.dev20230626+cu118
[pip3] torchaudio==2.1.0.dev20230626+cu118
[pip3] torchview==0.2.6
[pip3] torchvision==0.16.0.dev20230627+cu118
[conda] Could not collect

Metadata

Assignees: no one assigned
Labels: module: onnx (related to torch.onnx), triaged (this issue has been looked at by a team member, triaged, and prioritized into an appropriate module)
Status: Done
Milestone: no milestone