Inductor: don't reuse buffer if it would increase peak memory

### 🚀 The feature, motivation and pitch

Inductor has a config option, `allow_buffer_reuse`, which reuses a dead Tensor's memory for a new allocation if the dead Tensor's size matches the number of bytes being allocated. In some cases, reusing a buffer during peak memory can increase memory usage.

We should track currently allocated and peak memory during Inductor codegen, and only reuse a buffer if doing so does not increase peak memory. We already do similar memory tracking [here](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/memory.py). See the [buffer reuse logic](https://github.com/pytorch/pytorch/blob/af43b445a5b03ffbeab1d430d2232f48dec3053d/torch/_inductor/codegen/wrapper.py#L492-L496).
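
To make the proposed check concrete, here is a minimal sketch of what "only reuse if it does not increase peak memory" could look like. It is not Inductor's implementation: `Buf`, `peak_memory`, and `reuse_increases_peak` are hypothetical names, and the model assumes that a reused buffer stays allocated from its last use until the new allocation takes it over, whereas without reuse it is freed right after its last use.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    """Hypothetical buffer record; not an Inductor class."""
    name: str
    size: int   # bytes
    start: int  # step at which the buffer is allocated
    end: int    # last step at which the buffer is used


def peak_memory(bufs):
    """Return the maximum number of live bytes over all steps."""
    last_step = max(b.end for b in bufs)
    return max(
        sum(b.size for b in bufs if b.start <= t <= b.end)
        for t in range(last_step + 1)
    )


def reuse_increases_peak(bufs, dead, new):
    """Return True if reusing `dead` for `new` raises peak memory.

    Assumption: with reuse, the storage is held from `dead.start`
    through `new.end`, so both lifetimes are modeled as one buffer.
    """
    baseline = peak_memory(bufs)
    merged = Buf(f"{dead.name}+{new.name}", dead.size, dead.start, new.end)
    others = [b for b in bufs if b is not dead and b is not new]
    return peak_memory(others + [merged]) > baseline


# Case 1: a large temporary lives between the dead buffer and the new one,
# so holding the dead buffer across that gap raises the peak.
a = Buf("a", 100, 0, 1)
big = Buf("big", 300, 2, 3)
c = Buf("c", 100, 4, 5)
skip_reuse = reuse_increases_peak([a, big, c], dead=a, new=c)

# Case 2: the peak occurs while the dead buffer is still live anyway,
# so reusing it afterwards does not change the peak.
a2 = Buf("a2", 100, 0, 1)
big2 = Buf("big2", 300, 0, 1)
c2 = Buf("c2", 100, 2, 3)
safe_reuse = not reuse_increases_peak([a2, big2, c2], dead=a2, new=c2)
```

In this sketch the codegen would reuse the buffer only when `reuse_increases_peak` returns `False`, and otherwise free the dead buffer and allocate a fresh one. A real implementation would presumably update the tracked peak incrementally rather than rescanning every step.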

### Alternatives

master

### Additional context

_No response_

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

Inductor: don't reuse buffer if it would increase peak memory #145883

🚀 The feature, motivation and pitch

Alternatives

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inductor: don't reuse buffer if it would increase peak memory #145883

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions