
A significant overhead when running fastrnns with autograd.profiler #49900

@Krovatkin

Description


🐛 Bug

Enabling autograd.profiler increases the benchmark's runtime by nearly 78% (4.6 s vs. 8.17 s, i.e. about 1.8x the baseline).
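To check the arithmetic, relative overhead is (profiled - baseline) / baseline. The minimal sketch below computes it from the two timings reported above and compares it against the 5-10% budget given under "Expected behavior". `profiler_overhead_pct` and `within_budget` are hypothetical helpers for illustration only. They are not part of the fastrnns bench.

```python
def profiler_overhead_pct(baseline_s: float, profiled_s: float) -> float:
    """Extra wall-clock time added by profiling, as a percentage of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (profiled_s - baseline_s) / baseline_s * 100.0


def within_budget(overhead_pct: float, budget_pct: float = 10.0) -> bool:
    """True if the measured overhead stays at or below the acceptable budget."""
    return overhead_pct <= budget_pct


overhead = profiler_overhead_pct(4.6, 8.17)
print(f"overhead: {overhead:.1f}%")                    # overhead: 77.6%
print("within 10% budget:", within_budget(overhead))   # within 10% budget: False
```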

To Reproduce

  1. From the benchmarks directory, run `numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling --group=rnns --rnns=jit_premul_bias` to get a baseline.
  2. Apply the following patch to instrument the bench with autograd.profiler: https://github.com/pytorch/pytorch/commit/dd4b3268605c055a1a8653f8554ccffc9e874044.patch (or `git cherry-pick dd4b3268605c055a1a8653f8554ccffc9e874044`).
  3. Re-run the bench. No recompilation is needed, since the patch only changes Python code.
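The following minimal sketch shows how the with/without comparison in the steps above could be scripted in isolation. It assumes a generic `step` callable that runs one iteration of the workload. `time_run` and `compare` are hypothetical helpers and do not reproduce the linked patch. Warm-up calls run outside the timed region so that one-time costs, such as the profiling executor's compilation, do not skew either measurement.

```python
import contextlib
import time
from typing import Callable, ContextManager, Tuple


def time_run(step: Callable[[], object],
             ctx_factory: Callable[[], ContextManager],
             iters: int = 100,
             warmup: int = 10) -> float:
    """Return wall-clock seconds for `iters` calls of `step` under one context."""
    # Warm-up runs are untimed and happen outside the context.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    # Context enter/exit are timed too, since profiler teardown is part of its cost.
    with ctx_factory():
        for _ in range(iters):
            step()
    return time.perf_counter() - start


def compare(step: Callable[[], object],
            profiled_ctx_factory: Callable[[], ContextManager],
            iters: int = 100,
            warmup: int = 10) -> Tuple[float, float, float]:
    """Time `step` without and with a context; return (baseline, profiled, overhead %)."""
    base = time_run(step, contextlib.nullcontext, iters, warmup)
    prof = time_run(step, profiled_ctx_factory, iters, warmup)
    return base, prof, (prof - base) / base * 100.0


def _stand_in_workload() -> int:
    # Placeholder only; replace with one forward/backward step of the model.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    base, prof, pct = compare(_stand_in_workload, contextlib.nullcontext)
    print(f"baseline {base:.3f}s, profiled {prof:.3f}s, overhead {pct:+.1f}%")
```

With PyTorch installed, passing `torch.autograd.profiler.profile` as `profiled_ctx_factory` wraps the timed loop in the autograd profiler. Pinning the process to one core, as `numactl -C 3` does in step 1, helps keep the two runs comparable.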

Expected behavior

The profiler's overhead should stay within 5-10%.

@ngimel @ilia-cher

cc @VitalyFedyunin @ngimel

Labels

module: performance, oncall: profiler, triaged