Skip to content

Use a SmallVec for ABIArgs#4584

Merged
fitzgen merged 1 commit intobytecodealliance:mainfrom
fitzgen:abi-arg-small-vec
Aug 2, 2022
Merged

Use a SmallVec for ABIArgs#4584
fitzgen merged 1 commit intobytecodealliance:mainfrom
fitzgen:abi-arg-small-vec

Conversation

@fitzgen
Copy link
Copy Markdown
Member

@fitzgen fitzgen commented Aug 2, 2022

Instead of a regular Vec.

These vectors are usually very small, for example here is the histogram of sizes
when running Sightglass's pulldown-cmark benchmark:

;; Number of samples = 10332
;; Min = 0
;; Max = 11
;;
;; Mean = 2.496128532713901
;; Standard deviation = 2.2859559855427243
;; Variance = 5.225594767838607
;;
;; Each ∎ is a count of 62
;;
 0 ..  1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  159 ]: ∎∎
 3 ..  4 [  838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
 4 ..  5 [  970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 5 ..  6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [  303 ]: ∎∎∎∎
 7 ..  8 [  272 ]: ∎∎∎∎
 8 ..  9 [   40 ]:
 9 .. 10 [   18 ]:

By using a SmallVec with capacity of 6 we avoid the vast majority of heap
allocations and get some nice benchmark wins of up to ~1.11x faster compilation.

Sightglass Benchmark Results

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 340361395.90 ± 63384608.15 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [3101467423 3425524333.41 4060621653] main.so
  [2820915877 3085162937.51 3375167352] smallvec.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 988446098.59 ± 184075718.89 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [9006994951 9948091070.66 11792481990] main.so
  [8192243090 8959644972.07 9801848982] smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 7854567.87 ± 2215491.16 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [80354527 93864666.76 119789198] main.so
  [77554917 86010098.89 94726994] smallvec.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 22810509.85 ± 6434024.63 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [233358190 272593088.57 347880715] main.so
  [225227821 249782578.72 275097380] smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 10849521.41 ± 4324757.85 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [133875427 156859544.47 222455440] main.so
  [126073854 146010023.06 181611647] smallvec.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 31508176.97 ± 12559561.91 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [388788638 455536988.31 646034523] main.so
  [366132033 424028811.34 527419755] smallvec.so

@fitzgen fitzgen requested a review from cfallin August 2, 2022 21:37
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! I'm really liking these casual 10% speedups from allocation profiling :-)

@cfallin
Copy link
Copy Markdown
Member

cfallin commented Aug 2, 2022

Incidentally I'd love to know which tool/crate you're using to do the histogram -- I've done ad-hoc things (println | sort -n | uniq -c level of hackery) but if there's a stats/sampling framework somewhere we should be using I'd love to know about it!

@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Aug 2, 2022

I wrote this tool a while back for command-line histograms: https://github.com/fitzgen/histo

@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Aug 2, 2022

I just hack in println!s, pipe through grep and cut, and then pipe into that tool.

@fitzgen fitzgen force-pushed the abi-arg-small-vec branch from d18b8a3 to 32ead2e Compare August 2, 2022 21:52
Instead of a regular `Vec`.

These vectors are usually very small, for example here is the histogram of sizes
when running Sightglass's `pulldown-cmark` benchmark:

```
;; Number of samples = 10332
;; Min = 0
;; Max = 11
;;
;; Mean = 2.496128532713901
;; Standard deviation = 2.2859559855427243
;; Variance = 5.225594767838607
;;
;; Each ∎ is a count of 62
;;
 0 ..  1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  159 ]: ∎∎
 3 ..  4 [  838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
 4 ..  5 [  970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 5 ..  6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [  303 ]: ∎∎∎∎
 7 ..  8 [  272 ]: ∎∎∎∎
 8 ..  9 [   40 ]:
 9 .. 10 [   18 ]:
```

By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap
allocations and get some nice benchmark wins of up to ~1.11x faster compilation.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 340361395.90 ± 63384608.15 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [3101467423 3425524333.41 4060621653] main.so
  [2820915877 3085162937.51 3375167352] smallvec.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 988446098.59 ± 184075718.89 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [9006994951 9948091070.66 11792481990] main.so
  [8192243090 8959644972.07 9801848982] smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 7854567.87 ± 2215491.16 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [80354527 93864666.76 119789198] main.so
  [77554917 86010098.89 94726994] smallvec.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 22810509.85 ± 6434024.63 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [233358190 272593088.57 347880715] main.so
  [225227821 249782578.72 275097380] smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 10849521.41 ± 4324757.85 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [133875427 156859544.47 222455440] main.so
  [126073854 146010023.06 181611647] smallvec.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 31508176.97 ± 12559561.91 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [388788638 455536988.31 646034523] main.so
  [366132033 424028811.34 527419755] smallvec.so
```
@fitzgen fitzgen force-pushed the abi-arg-small-vec branch from 32ead2e to 1a43150 Compare August 2, 2022 21:58
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Aug 2, 2022
@fitzgen fitzgen merged commit ab1cf3d into bytecodealliance:main Aug 2, 2022
@fitzgen fitzgen deleted the abi-arg-small-vec branch August 2, 2022 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants