detectron2_maskrcnn OOMs on eager with A100 40G. #120115

@ysiraichi

Description

🐛 Describe the bug

It looks odd that the eager run is asking to allocate 20209.02 GiB of memory.

python benchmarks/dynamo/torchbench.py \
    --accuracy --no-translation-validation --inference --bfloat16 \
    --backend inductor --disable-cudagraphs --device cuda --no-skip \
    -k '^detectron2_maskrcnn$'
cuda eval  detectron2_maskrcnn
Traceback (most recent call last):
  File "benchmarks/dynamo/common.py", line 2171, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "benchmarks/dynamo/torchbench.py", line 469, in forward_pass
    return mod(*inputs)
  File "torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 150, in forward
    return self.inference(batched_inputs)
  File "/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 213, in inference
    results, _ = self.roi_heads(images, features, proposals, None)
  File "torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 747, in forward
    pred_instances = self._forward_box(features, proposals)
  File "/lib/python3.8/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 798, in _forward_box
    box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
  File "torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/poolers.py", line 261, in forward
    output.index_put_((inds,), pooler(x[level], pooler_fmt_boxes_level))
  File "torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/layers/roi_align.py", line 58, in forward
    return roi_align(
  File "/lib/python3.8/site-packages/torchvision-0.18.0a0+a52607e-py3.8-linux-x86_64.egg/torchvision/ops/roi_align.py", line 236, in roi_align
    return _roi_align(input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned)
  File "/lib/python3.8/site-packages/torchvision-0.18.0a0+a52607e-py3.8-linux-x86_64.egg/torchvision/ops/roi_align.py", line 168, in _roi_align
    val = _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask)  # [K, C, PH, PW, IY, IX]
  File "/lib/python3.8/site-packages/torchvision-0.18.0a0+a52607e-py3.8-linux-x86_64.egg/torchvision/ops/roi_align.py", line 62, in _bilinear_interpolate
    v1 = masked_index(y_low, x_low)
  File "/lib/python3.8/site-packages/torchvision-0.18.0a0+a52607e-py3.8-linux-x86_64.egg/torchvision/ops/roi_align.py", line 55, in masked_index
    return input[
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20209.02 GiB. GPU 0 has a total capacity of 39.39 GiB of which 34.52 GiB is free. Process 7680 has 4.86 GiB memory in use. Of the allocated memory 4.22 GiB is allocated by PyTorch, and 119.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "benchmarks/dynamo/common.py", line 3826, in run
    ) = runner.load_model(
  File "benchmarks/dynamo/torchbench.py", line 405, in load_model
    self.validate_model(model, example_inputs)
  File "benchmarks/dynamo/common.py", line 2173, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed
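For scale, here is a rough sketch of how the intermediate gather tensor in torchvision's Python `_roi_align` path can blow up. The traceback annotates it as `[K, C, PH, PW, IY, IX]`, and with bfloat16 each element costs 2 bytes. The dimension values below are illustrative assumptions, not taken from the failing run; they are chosen only to show how a tens-of-TiB allocation can arise from a handful of plausible-looking sizes (e.g. a pathological interpolation grid from bad proposal boxes).

```python
def gather_bytes(K, C, PH, PW, IY, IX, bytes_per_elem=2):
    """Bytes needed for the intermediate gather tensor of shape
    [K, C, PH, PW, IY, IX] (bytes_per_elem=2 for bfloat16)."""
    return K * C * PH * PW * IY * IX * bytes_per_elem

# Hypothetical numbers: 1000 ROIs (K), 256 channels (C), a 7x7 pooled
# output (PH, PW), and a degenerate ~930x930 interpolation grid (IY, IX).
size_gib = gather_bytes(K=1000, C=256, PH=7, PW=7, IY=930, IX=930) / 2**30
print(f"{size_gib:.2f} GiB")  # lands in the ~20000 GiB range
```

The point is that the per-ROI interpolation grid multiplies into every other dimension, so a single oversized proposal box can push the gather tensor far past any real GPU's capacity.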

Versions

cc @ezyang @msaroufim @bdhirsh @anijain2305 @zou3519 @chauhang @miladm @lezcano

Metadata

    Labels

triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
