perf(selector): more caching for faster selector creation #4611
Merged: WilcoFiers merged 4 commits into develop on Oct 16, 2024
Conversation
**straker** (Contributor) approved these changes on Oct 16, 2024 and left a comment:
Ran memory performance tests on the changes. I feared it would significantly increase the memory footprint of the cache, but testing shows that's not the case. Even on a very complex page the increase is only a few MB.
To get the results: I ran a heap snapshot on the page after axe-core was loaded on the page, then ran axe-core (with the teardown method disabled). After results came back I ran another heap snapshot to see the difference in memory size of the cache and virtual tree. I did this 3-5 times (depending on the size of the site) and got an average of the difference. Even for a complex website with 35k+ nodes, the new changes only added ~8MB.
| Site | Before: loaded (MB) | Before: after run (MB) | Before: diff (MB) | After: loaded (MB) | After: after run (MB) | After: diff (MB) |
|---|---|---|---|---|---|---|
| https://www.amazon.com/ | 17.56 | 36.28 | 18.72 | 22.92 | 41.76 | 18.84 |
| https://js.tensorflow.org/api/latest/ | 20.5 | 318.67 | 298.17 | 28.6 | 335.33 | 306.73 |
straker pushed a commit that referenced this pull request on Oct 16, 2024
While doing some perf testing I discovered axe hits a huge performance bottleneck when there are lots of similar elements on the page. There are essentially three problems:

1. Selectors aren't cached. An element tested more than once has its selector created more than once.
2. When an element has no unique features, axe creates the same selector and runs a QSA (`querySelectorAll`) with it. If there are lots of such elements, axe ends up running the same QSA over and over. That can be avoided by caching too.
3. When that QSA returns lots of similar elements, axe refines the selector using parents. Each time it does that, it runs `.matches` on every similar element with the updated selector. With many similar elements, running QSA again is much faster than matching each element individually, especially if the parents aren't distinct either, so that the QSA cache can be used.
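Points 1 and 2 could be sketched roughly like this (an illustrative sketch only; the names and shapes here are hypothetical, not axe-core's actual internals):

```javascript
// Point 1: cache the computed selector per element, so an element tested
// more than once only has its selector created once.
const selectorCache = new WeakMap(); // element -> selector string

function cachedGetSelector(element, computeSelector) {
  if (!selectorCache.has(element)) {
    selectorCache.set(element, computeSelector(element));
  }
  return selectorCache.get(element);
}

// Point 2: memoize querySelectorAll, so identical selectors produced by
// non-unique elements only query the DOM once.
function memoizeQsa(doc) {
  const qsaCache = new Map(); // selector string -> result list
  return selector => {
    if (!qsaCache.has(selector)) {
      qsaCache.set(selector, doc.querySelectorAll(selector));
    }
    return qsaCache.get(selector);
  };
}
```

A `WeakMap` lets per-element entries be garbage collected with their elements; a teardown step would still need to clear the `Map`-based QSA cache between runs.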
I've been using the following code to test this out (with different
numbers in rows and cols):
```html
<main>
<h1>Hello World</h1>
<table></table>
</main>
<script>
const table = document.querySelector('table');
for (let row = 0; row < 200; row++) {
const tr = document.createElement('tr');
table.appendChild(tr);
for (let col = 0; col < 40; col++) {
const td = document.createElement('td');
const btn = document.createElement('button');
td.appendChild(btn);
tr.appendChild(td);
}
}
</script>
```
## Perf test results
| btns | cur | memo 1 | memo 2 | 50 | 100 | 500 | 600 | 750 | 1000 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 2513 | 1511 | 469 | 519 | 192 | 190 | 195 | 190 | 188 | 464 |
| 2000 | 2585 | 1524 | 470 | 517 | 200 | 190 | 195 | 198 | 203 | 481 |
| 2000 | 2540 | 1501 | 475 | 525 | 194 | 203 | 190 | 191 | 188 | 468 |
| 4000 | 10748 | 6183 | 2500 | 2454 | 2452 | 1433 | 838 | 826 | 823 | 850 |
| 4000 | 10784 | 6199 | 2400 | 2516 | 2452 | 1433 | 839 | 854 | 845 | 819 |
| 4000 | 10701 | 6203 | 2400 | 2497 | 2442 | 1444 | 849 | 848 | 866 | 838 |
| 8000 | 43117 | 26398 | 8166 | 10954 | 10895 | 3680 | 2349 | 2220 | 3800 | 2176 |
| 8000 | 43365 | 26339 | 9747 | 10576 | 10742 | 3775 | 2313 | 2302 | 3826 | 2169 |
| 8000 | 44899 | 26167 | 9723 | 10615 | 10729 | 3772 | 2235 | 2210 | 3765 | 2238 |
| 12000 | 100254 | 61300 | 19291 | 20647 | 20623 | 6036 | 6137 | 6146 | 5974 | 6418 |
| 12000 | 100406 | 60615 | 19543 | 20924 | 20669 | 6122 | 6022 | 6287 | 6094 | 6108 |
| 12000 | 101401 | 61792 | 19522 | 20662 | 20834 | 5978 | 6170 | 6155 | 6102 | 6347 |
Numbers are taken from `Audit reporter` with `performanceTimer: true`.
- cur: the current situation
- memo 1: memoize `doc.querySelectorAll`
- memo 2: memoize `doc.querySelectorAll` + `getSelector`
- 50–2000: memo 2 + re-run QSA when the number of similar elements exceeds that threshold
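The threshold columns could correspond to a strategy along these lines (a hypothetical sketch; the function names and the threshold constant are illustrative, not axe-core's actual code):

```javascript
// Hypothetical sketch of point 3's threshold strategy: below the
// threshold, filter candidates individually with .matches(); above it,
// re-run the (memoized) querySelectorAll with the refined selector.
const SIMILAR_ELEMENT_THRESHOLD = 600; // illustrative value

function refineCandidates(candidates, refinedSelector, memoizedQsa) {
  if (candidates.length > SIMILAR_ELEMENT_THRESHOLD) {
    // With many candidates, one (possibly cached) querySelectorAll call
    // is cheaper than calling .matches() on every candidate.
    return Array.from(memoizedQsa(refinedSelector));
  }
  // With few candidates, filtering individually avoids a DOM-wide query.
  return Array.from(candidates).filter(el => el.matches(refinedSelector));
}
```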
## Memory usage
I've tested this approach on 10 different websites and saw no difference
in memory usage between this approach and our current approach.
## Performance on real-world websites
I tested this against 20 real-world websites and saw no change in
performance between them. The scenario where this matters is when there
is lots of repetition in a page such as in very large tables. The
important part is that improving performance for this edge case
doesn't seem to hurt performance elsewhere.