A significant overhead when running fastrnns with autograd.profiler

## 🐛 Bug

Enabling `autograd.profiler` makes the benchmark take nearly **180%** of the baseline time (4.6 s vs. 8.17 s), i.e. roughly 78% overhead.
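
For reference, the two timings quoted above can be converted into a relative overhead as in the sketch below. It only shows the arithmetic; `overhead_percent` is a hypothetical helper name and is not part of the benchmark.

```python
def overhead_percent(baseline_s: float, profiled_s: float) -> float:
    """Extra time of the profiled run, as a percentage of the baseline."""
    return (profiled_s - baseline_s) / baseline_s * 100.0


# Timings reported in this issue (seconds).
baseline_s = 4.6
profiled_s = 8.17

print(f"ratio:    {profiled_s / baseline_s:.2f}x")
print(f"overhead: {overhead_percent(baseline_s, profiled_s):.1f}%")
```

With these numbers the profiled run takes about 1.78x the baseline time, which corresponds to an overhead of about 77.6%.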

## To Reproduce

1. From the `benchmarks` dir, run `numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling --group=rnns --rnns=jit_premul_bias` to get a baseline.
2. Apply the following patch to instrument the bench with `autograd.profiler`: https://github.com/pytorch/pytorch/commit/dd4b3268605c055a1a8653f8554ccffc9e874044.patch (or `git cherry-pick dd4b3268605c055a1a8653f8554ccffc9e874044`).
3. Re-run the bench (no recompilation is necessary, as the patch contains only Python changes).
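
The comparison in the steps above can also be approximated in-process. The sketch below is a stand-in under stated assumptions, not the contents of the patch: `timed` and `workload` are hypothetical names, and the workload is a pure-Python placeholder rather than the fastrnns model.

```python
import time
from contextlib import nullcontext


def timed(fn, ctx_factory=nullcontext, iters=10):
    """Return wall-clock seconds for `iters` calls of `fn` under `ctx_factory()`.

    The timer wraps the whole `with` block, so the cost of entering and
    exiting the context manager is included in the measurement.
    """
    start = time.perf_counter()
    with ctx_factory():
        for _ in range(iters):
            fn()
    return time.perf_counter() - start


def workload():
    # Placeholder workload (hypothetical); substitute the real training step.
    sum(i * i for i in range(10_000))


baseline = timed(workload)              # no profiler
wrapped = timed(workload, nullcontext)  # swap in a profiler context manager here
print(f"baseline={baseline:.4f}s wrapped={wrapped:.4f}s")
```

Assuming PyTorch is installed, passing `torch.autograd.profiler.profile` as `ctx_factory` should give the profiled timing, since it can be used as a context manager without arguments. Timing around the whole `with` block is a deliberate choice here, so that any work the profiler does on exit counts toward the overhead.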

## Expected behavior

An overhead that's within 5-10%.

@ngimel  @ilia-cher 


cc @VitalyFedyunin @ngimel

A significant overhead when running fastrnns with autograd.profiler #49900

🐛 Bug

To Reproduce

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

A significant overhead when running fastrnns with autograd.profiler #49900

Description

🐛 Bug

To Reproduce

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions