[profiler] Speed up postprocessing #58021
ilia-cher wants to merge 10 commits into gh/ilia-cher/106/base
Summary:
Improve the time complexity of the `_remove_dup_nodes` function.
Test Plan:
Using a trivial microbenchmark:
```
import time

import torch
from torch.autograd.profiler import EventList, FunctionEvent

evts = EventList()
# 10,000 groups, each with an outer "parent", a nested duplicate "parent", and a "child"
for r in range(10 * 1000):
    st = r * 1000  # start time of this group, in microseconds
    evts.append(FunctionEvent(id=0, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id=0, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id=0, thread=0, name="child", start_us=st+10, end_us=st+90))
# Time only the _build_tree() call
t0 = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - t0))
```
```
After:
python test_prof.py
Elapsed: 0.203s
Before:
python test_prof.py
Elapsed: 3.653s
```
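The summary above does not show the change itself. As a rough illustration only (this is not the actual PyTorch code), the sketch below collapses a node into its parent when both share a name and the parent has no other child, and it records the indices to drop in a set so that the final filtering pass stays linear. The `Node` class and the `remove_dup_nodes` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    # Hypothetical stand-in for a profiler event; not the real FunctionEvent.
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def remove_dup_nodes(nodes: List[Node]) -> List[Node]:
    """Collapse a node into its parent when both share a name and the
    parent has no other child. Repeats until a pass removes nothing."""
    while True:
        to_delete = set()  # set membership is O(1); a list lookup would be O(k)
        for idx, node in enumerate(nodes):
            p = node.parent
            if p is not None and p.name == node.name and len(p.children) == 1:
                # Lift the grandchildren up to the surviving parent.
                p.children = node.children
                for ch in node.children:
                    ch.parent = p
                to_delete.add(idx)
        if not to_delete:
            return nodes
        nodes = [n for i, n in enumerate(nodes) if i not in to_delete]


# Build parent -> parent -> child, mirroring one group of the benchmark.
outer = Node("parent")
inner = Node("parent", parent=outer)
child = Node("child", parent=inner)
outer.children.append(inner)
inner.children.append(child)

result = remove_dup_nodes([outer, inner, child])
print([n.name for n in result])
```

With a list instead of a set for `to_delete`, the `i not in to_delete` check inside the comprehension scans the list each time, which makes the filtering step quadratic when many nodes are removed.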
@ilia-cher has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D28347217](https://our.internmc.facebook.com/intern/diff/D28347217)
Summary:
Improve the time complexity of the `_remove_dup_nodes` function.
Test Plan:
Using a trivial microbenchmark:
```
import time

import torch
from torch.autograd.profiler import EventList, FunctionEvent

evts = EventList()
id_cnt = 0  # give every event a unique id
# 10,000 groups, each with an outer "parent", a nested duplicate "parent", and a "child"
for r in range(10 * 1000):
    st = r * 1000  # start time of this group, in microseconds
    evts.append(FunctionEvent(id_cnt, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id_cnt+1, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id_cnt+2, thread=0, name="child", start_us=st+10, end_us=st+90))
    id_cnt += 3
# Time only the _build_tree() call
t0 = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - t0))
```
```
After:
python test_prof.py
Elapsed: 0.203s
Before:
python test_prof.py
Elapsed: 3.653s
```
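For a sense of scale, the two timings reported above can be compared directly. This is plain arithmetic on the numbers in the test plan, which come from single runs of a synthetic benchmark, so the ratio should not be read as a general guarantee.

```python
before_s = 3.653  # "Before" timing from the test plan
after_s = 0.203   # "After" timing from the test plan

speedup = before_s / after_s
print(f"Speedup: {speedup:.1f}x")  # prints "Speedup: 18.0x"
```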
@ilia-cher merged this pull request in cdf161c.
Summary:
Pull Request resolved: pytorch#58021
Improve complexity of _remove_dup_nodes function
Test Plan:
using trivial microbenchmark:
```
import torch
from torch.autograd.profiler import *
import time
evts = EventList()
id_cnt = 0
for r in range(10*1000):
    st = r * 1000
    evts.append(FunctionEvent(id_cnt, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id_cnt+1, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id_cnt+2, thread=0, name="child", start_us=st+10, end_us=st+90))
    id_cnt+=3
st = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - st))
```
```
After:
python test_prof.py
Elapsed: 0.203s
Before:
python test_prof.py
Elapsed: 3.653s
```
Reviewed By: gdankel
Differential Revision: D28347217
Pulled By: ilia-cher
fbshipit-source-id: d62da3400009f1fa8cb41a11a828aa8307f190bf