[profiler] Speed up postprocessing #58021

Closed
ilia-cher wants to merge 10 commits into gh/ilia-cher/106/base from gh/ilia-cher/106/head

@ilia-cher ilia-cher commented May 11, 2021

Stack from ghstack:

Summary:
Improve complexity of _remove_dup_nodes function
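
The diff itself is not shown on this page, so the following is only a sketch of one common way a cleanup pass like this drops from quadratic to roughly linear time: collecting the indices to delete in a `set` instead of a `list`, so that each membership check is a hash lookup rather than a scan. The helper names `drop_indices_list` and `drop_indices_set` are hypothetical and do not come from the profiler code.

```python
import time


def drop_indices_list(items, to_delete):
    # `to_delete` is a list here, so every `in` check scans it:
    # total cost is O(len(items) * len(to_delete)).
    return [x for i, x in enumerate(items) if i not in to_delete]


def drop_indices_set(items, to_delete):
    # Converting to a set makes each `in` check O(1) on average,
    # so the whole filter is O(len(items) + len(to_delete)).
    doomed = set(to_delete)
    return [x for i, x in enumerate(items) if i not in doomed]


# Hypothetical workload: delete every third element, mirroring the
# one-duplicate-per-group shape of the microbenchmark.
items = list(range(9_000))
to_delete = list(range(1, 9_000, 3))

for fn in (drop_indices_list, drop_indices_set):
    start = time.perf_counter()
    kept = fn(items, to_delete)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {elapsed:.4f}s, kept {len(kept)} items")
```

Both helpers return the same list; only the cost of the membership test differs, and the gap widens as the number of deleted entries grows.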

Test Plan:
using trivial microbenchmark:

```
import torch
from torch.autograd.profiler import *
import time

# Build 10,000 groups of three nested events on a single thread:
# an outer "parent", an inner "parent" with the same name, and a "child".
evts = EventList()
id_cnt = 0
for r in range(10*1000):
    st = r * 1000
    evts.append(FunctionEvent(id_cnt, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id_cnt+1, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id_cnt+2, thread=0, name="child", start_us=st+10, end_us=st+90))
    id_cnt += 3

# Time only the tree-building postprocessing step.
st = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - st))
```

```
After:
python test_prof.py
Elapsed: 0.203s

Before:
python test_prof.py
Elapsed: 3.653s
```
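
For a quick sense of scale, the two timings above can be turned into a speedup factor and a per-event cost. This is plain arithmetic on the reported numbers; the variable names are illustrative only.

```python
# Timings reported in the test plan, in seconds.
before_s = 3.653
after_s = 0.203

# The benchmark appends 3 events per iteration for 10 * 1000 iterations.
n_events = 3 * 10 * 1000

speedup = before_s / after_s
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 18.0x"
print(f"before: {before_s / n_events * 1e6:.1f} us per event")
print(f"after:  {after_s / n_events * 1e6:.1f} us per event")
```

These are single-run numbers from one machine, so the ratio is best read as an order-of-magnitude indication rather than a precise figure.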

Differential Revision: D28347217
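
The benchmark nests a "parent" event inside another "parent" event with a "child" inside both, which is presumably the shape that `_remove_dup_nodes` collapses. The sketch below is a pure-Python model of that idea, not the profiler's implementation: the `Node` class and both functions are hypothetical, and the merge rule (drop a node whose parent has the same name and no other children) is an assumption about the intended behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Node:
    """Hypothetical stand-in for a profiler event; not torch's FunctionEvent."""
    name: str
    start_us: int
    end_us: int
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def build_tree(nodes: List[Node]) -> None:
    # Link each node to the innermost open interval that contains it.
    stack: List[Node] = []
    for node in sorted(nodes, key=lambda n: (n.start_us, -n.end_us)):
        while stack and stack[-1].end_us <= node.start_us:
            stack.pop()
        if stack:
            node.parent = stack[-1]
            stack[-1].children.append(node)
        stack.append(node)


def remove_dup_nodes(nodes: List[Node]) -> List[Node]:
    # Assumed rule: a node is a duplicate if its parent has the same name
    # and this node is the parent's only child.
    while True:
        to_delete = set()  # set membership keeps the filter below linear
        for idx, node in enumerate(nodes):
            p = node.parent
            if p is not None and p.name == node.name and len(p.children) == 1:
                p.children = node.children  # splice grandchildren up
                for ch in node.children:
                    ch.parent = p
                to_delete.add(idx)
        if not to_delete:
            return nodes
        nodes = [n for i, n in enumerate(nodes) if i not in to_delete]


# One group from the benchmark: parent > parent > child.
events = [
    Node("parent", 0, 100),
    Node("parent", 1, 99),
    Node("child", 10, 90),
]
build_tree(events)
kept = remove_dup_nodes(events)
print([n.name for n in kept])
```

After the pass, the inner "parent" is gone and the "child" hangs directly off the outer "parent".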

Summary:
Improve complexity of _remove_dup_nodes function

Test Plan:
using trivial microbenchmark:
```
import torch
from torch.autograd.profiler import *
import time

evts = EventList()
for r in range(10*1000):
    st = r * 1000
    evts.append(FunctionEvent(id=0, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id=0, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id=0, thread=0, name="child", start_us=st+10, end_us=st+90))

st = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - st))
```

```
After:
python test_prof.py
Elapsed: 0.203s

Before:
python test_prof.py
Elapsed: 3.653s
```

[ghstack-poisoned]
facebook-github-bot commented May 11, 2021

💊 CI failures summary and remediations

As of commit cdc3f46 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ilia-cher
Contributor Author

(#56623)

@ilia-cher
Contributor Author

@ilia-cher has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Differential Revision: [D28347217](https://our.internmc.facebook.com/intern/diff/D28347217)

[ghstack-poisoned]
ilia-cher pushed a commit that referenced this pull request May 11, 2021
ghstack-source-id: 07f47c2
Pull Request resolved: #58021
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 11, 2021
ghstack-source-id: 1e2335d
Pull Request resolved: pytorch#58021
@facebook-github-bot
Contributor

@ilia-cher merged this pull request in cdf161c.

@facebook-github-bot facebook-github-bot deleted the gh/ilia-cher/106/head branch May 15, 2021 14:23
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#58021

Improve complexity of _remove_dup_nodes function

Test Plan:
using trivial microbenchmark:
```
import torch
from torch.autograd.profiler import *
import time

evts = EventList()
id_cnt = 0
for r in range(10*1000):
    st = r * 1000
    evts.append(FunctionEvent(id_cnt, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id_cnt+1, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id_cnt+2, thread=0, name="child", start_us=st+10, end_us=st+90))
    id_cnt+=3

st = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - st))
```

```
After:
python test_prof.py
Elapsed: 0.203s

Before:
python test_prof.py
Elapsed: 3.653s
```

Reviewed By: gdankel

Differential Revision: D28347217

Pulled By: ilia-cher

fbshipit-source-id: d62da3400009f1fa8cb41a11a828aa8307f190bf
ilia-cher changed the title from "[profiler][small] Speed up postprocessing" to "[profiler] Speed up postprocessing" on May 25, 2021
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026

2 participants