Optimize String#gsub for single-char find/replace#15661
Closed
petekinnecom wants to merge 1 commit intoruby:masterfrom
Closed
Optimize String#gsub for single-char find/replace#15661petekinnecom wants to merge 1 commit intoruby:masterfrom
petekinnecom wants to merge 1 commit intoruby:masterfrom
Conversation
The `String#tr` method and `String#gsub` method have identical return
values when passed two single-character strings. For example,
`my_str.tr("a", "b") == my_str.gsub("a", "b")`. It turns out, however,
that the performance of `tr` is faster than gsub under these conditions.
We can check the parameters in `gsub` and invoke `tr` in order to
optimize that case. The check comes at a fixed cost for usages that
don't hit this condition.
Note: the scenario where gsub is passed a single-char search and an
empty string is also handled by `tr` which itself calls `delete`.
❌ 2/67072 Tests Failedspec/ruby/core/matchdata/regexp_spec.rb#MatchData#regexp returns a Regexp for the result of gsub(String)spec/ruby/core/matchdata/string_spec.rb#MatchData#string returns a frozen copy of the matched string for gsub!(String) |
Member
|
Thank you for working on this.
No, unfortunately. Simply replacing |
Author
|
Thanks I'll take a look at it and see how it goes. I also just realized I was only running the bootstrap test suite not the full test suite. 🤦 I'll close this for now and reopen this if it turns out reasonable. |
Author
|
I reopened it here because I force-pushed against the branch which makes the PR non-reopenable. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
String#trmethod andString#gsubmethod have identical return values when passed two single-character strings. For example,my_str.tr("a", "b") == my_str.gsub("a", "b"). It turns out, however, that the performance oftris significantly faster than gsub under these conditions.We can check the parameters in
gsuband invoketrin order to optimize that case. The check comes at a fixed cost for usages that don't hit this condition.Note: the scenario where gsub is passed a single-char search and an empty string is also handled by
trwhich itself callsdelete.I'm a ruby-dev poking around the code here, so if I'm off-base feel free to close or correct as needed. I came across this rubocop rule which suggests replacing
gsubwithtrunder these conditions due to this benchmark. I figured this could be optimized under the hood.Added some benchmarks to ensure that things sped up and didn't hurt performance of other cases too much. Here's a snippet of the results.
Comparison: escape ruby-master/bin/ruby: 4354.5 i/s ruby-gsub_special_case/bin/ruby: 4327.4 i/s - 1.01x slower escape_bin ruby-gsub_special_case/bin/ruby: 7637.9 i/s ruby-master/bin/ruby: 7441.2 i/s - 1.03x slower escape_utf8 ruby-master/bin/ruby: 4334.9 i/s ruby-gsub_special_case/bin/ruby: 4302.3 i/s - 1.01x slower escape_utf8_bin ruby-master/bin/ruby: 8337.7 i/s ruby-gsub_special_case/bin/ruby: 8278.4 i/s - 1.01x slower single_char_with_string ruby-gsub_special_case/bin/ruby: 70216.7 i/s ruby-master/bin/ruby: 22187.8 i/s - 3.16x slower single_char_with_string_bang ruby-gsub_special_case/bin/ruby: 70370.8 i/s ruby-master/bin/ruby: 22034.5 i/s - 3.19x slower single_char_with_regex ruby-master/bin/ruby: 10695.8 i/s ruby-gsub_special_case/bin/ruby: 10438.6 i/s - 1.02x slower multi_char_with_string ruby-master/bin/ruby: 11069.7 i/s ruby-gsub_special_case/bin/ruby: 10870.9 i/s - 1.02x slowerThe benchmark included substitutes a relatively rare character. When the replaced character is super common, the speed difference is much greater:
Comparison: single_char_with_string ruby-gsub_special_case/bin/ruby: 52916.7 i/s ruby-master/bin/ruby: 1441.2 i/s - 36.72x slowerThe new results of the benchmark referenced by rubocop: