PoC: precompute embeded string literals hash code#553
Closed
casperisfine wants to merge 1 commit intomasterfrom
Closed
PoC: precompute embeded string literals hash code#553casperisfine wants to merge 1 commit intomasterfrom
casperisfine wants to merge 1 commit intomasterfrom
Conversation
casperisfine
commented
Apr 8, 2024
compile.c
Outdated
| } | ||
| else { | ||
| lit = rb_fstring(lit); | ||
| lit = rb_fstring(rb_str_precompute_hash(rb_str_freeze(lit))); |
Author
There was a problem hiding this comment.
NB: the compiler should always feed pre-frozen strings to rb_fstring as to save a dup.
Author
There was a problem hiding this comment.
Or even better, it could use rb_enc_interned_str to save a bunch of allocs.
casperisfine
commented
Apr 8, 2024
e205c02 to
f22b383
Compare
Author
|
Note to self, we're precomputing in the compiler, we'd need to handle loading cached ISeq too. |
Author
|
@XrXr: shared https://bugs.ruby-lang.org/issues/15331, but seems like that patch was lazily computing the hashcode, here we eagerly do it during compilation, so I think we benefit more from it. |
f22b383 to
1a17cf3
Compare
Unclear how much worth it, it would be, but with embeded
strings we often have some space left in the slot, which
we can use to store the string Hash code.
It's probably only worth it for string literals, as they
are the ones likely to be used as hash keys.
We could even ony do this for string literal actually
used as hash keys, as seen by the compiler.
It is also unclear what the best layout would be. We
currently store it right after the string terminator
as to make it easy/fast to compute, but ideally it would
be at a fixed position, however this means creating another
`RString` union, which would complexify the code base significantly.
It's also unclear to us whether trying to respect alignment would
lead to a performance gain or not.
```ruby
hash = 10.times.to_h do |i|
[i, i]
end
dyn_sym = "dynamic_symbol".to_sym
hash[:some_symbol] = 1
hash[dyn_sym] = 1
hash["small"] = 2
hash["frozen_string_literal"] = 2
Benchmark.ips do |x|
x.report("symbol") { hash[:some_symbol] }
x.report("dyn_symbol") { hash[:some_symbol] }
x.report("small_lit") { hash["small"] }
x.report("frozen_lit") { hash["frozen_string_literal"] }
x.compare!(order: :baseline)
end
```
Before:
```
ruby 3.3.0 (2023-12-25 revision 5124f9a) [arm64-darwin23]
Warming up --------------------------------------
symbol 2.392M i/100ms
dyn_symbol 2.440M i/100ms
small_lit 2.155M i/100ms
frozen_lit 2.010M i/100ms
Calculating -------------------------------------
symbol 24.175M (± 1.7%) i/s - 122.002M in 5.048306s
dyn_symbol 24.345M (± 1.6%) i/s - 122.019M in 5.013400s
small_lit 21.252M (± 2.1%) i/s - 107.744M in 5.072042s
frozen_lit 20.095M (± 1.3%) i/s - 100.489M in 5.001681s
Comparison:
symbol: 24174848.1 i/s
dyn_symbol: 24345476.9 i/s - same-ish: difference falls within error
small_lit: 21252403.2 i/s - 1.14x slower
frozen_lit: 20094766.0 i/s - 1.20x slower
```
After:
```
ruby 3.4.0dev (2024-04-08T07:20:15Z interned-string-ha.. 76efed6) [arm64-darwin23]
Warming up --------------------------------------
symbol 2.400M i/100ms
dyn_symbol 2.405M i/100ms
small_lit 2.308M i/100ms
frozen_lit 2.268M i/100ms
Calculating -------------------------------------
symbol 23.528M (± 6.9%) i/s - 117.584M in 5.033231s
dyn_symbol 23.777M (± 4.7%) i/s - 120.231M in 5.071734s
small_lit 23.066M (± 2.9%) i/s - 115.376M in 5.006947s
frozen_lit 22.729M (± 1.1%) i/s - 115.693M in 5.090700s
Comparison:
symbol: 23527823.6 i/s
dyn_symbol: 23776757.8 i/s - same-ish: difference falls within error
small_lit: 23065535.3 i/s - same-ish: difference falls within error
frozen_lit: 22729351.6 i/s - same-ish: difference falls within error
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
1a17cf3 to
49c75af
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Unclear how much worth it, it would be, but with embeded strings we often have some space left in the slot, which we can use to store the string Hash code.
It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.
We could even ony do this for string literal actually used as hash keys, as seen by the compiler.
It is also unclear what the best layout would be. We currently store it right after the string terminator as to make it easy/fast to compute, but ideally it would be at a fixed position, however this means creating another
RStringunion, which would complexify the code base significantly.It's also unclear to us whether trying to respect alignment would lead to a performance gain or not.
Before:
After:
cc @etiennebarrie