Skip to content

Optimize String#gsub for single-char find/replace#15661

Closed
petekinnecom wants to merge 1 commit intoruby:masterfrom
petekinnecom:gsub_special_case
Closed

Optimize String#gsub for single-char find/replace#15661
petekinnecom wants to merge 1 commit intoruby:masterfrom
petekinnecom:gsub_special_case

Conversation

@petekinnecom
Copy link
Copy Markdown

The String#tr method and String#gsub method have identical return values when passed two single-character strings. For example, my_str.tr("a", "b") == my_str.gsub("a", "b"). It turns out, however, that the performance of tr is significantly faster than gsub under these conditions.

We can check the parameters in gsub and invoke tr in order to optimize that case. The check comes at a fixed cost for usages that don't hit this condition.

Note: the scenario where gsub is passed a single-char search and an empty string is also handled by tr which itself calls delete.


I'm a ruby-dev poking around the code here, so if I'm off-base feel free to close or correct as needed. I came across this rubocop rule which suggests replacing gsub with tr under these conditions due to this benchmark. I figured this could be optimized under the hood.

Added some benchmarks to ensure that things sped up and didn't hurt performance of other cases too much. Here's a snippet of the results.

Comparison:
                                  escape
ruby-master/bin/ruby:             4354.5 i/s
ruby-gsub_special_case/bin/ruby:  4327.4 i/s - 1.01x  slower

                                  escape_bin
ruby-gsub_special_case/bin/ruby:  7637.9 i/s
ruby-master/bin/ruby:             7441.2 i/s - 1.03x  slower

                                  escape_utf8
ruby-master/bin/ruby:             4334.9 i/s
ruby-gsub_special_case/bin/ruby:  4302.3 i/s - 1.01x  slower

                                  escape_utf8_bin
ruby-master/bin/ruby:             8337.7 i/s
ruby-gsub_special_case/bin/ruby:  8278.4 i/s - 1.01x  slower

                                  single_char_with_string
ruby-gsub_special_case/bin/ruby:  70216.7 i/s
ruby-master/bin/ruby:             22187.8 i/s - 3.16x  slower

                                  single_char_with_string_bang
ruby-gsub_special_case/bin/ruby:  70370.8 i/s
ruby-master/bin/ruby:             22034.5 i/s - 3.19x  slower

                                  single_char_with_regex
ruby-master/bin/ruby:             10695.8 i/s
ruby-gsub_special_case/bin/ruby:  10438.6 i/s - 1.02x  slower

                                  multi_char_with_string
ruby-master/bin/ruby:             11069.7 i/s
ruby-gsub_special_case/bin/ruby:  10870.9 i/s - 1.02x  slower

The benchmark included substitutes a relatively rare character. When the replaced character is super common, the speed difference is much greater:

prelude: |
  STR = ((("a" * 31) + "<") * 1000).freeze

  single_char_with_string: |
    STR.gsub("a", "2")
Comparison:
                                  single_char_with_string
ruby-gsub_special_case/bin/ruby:  52916.7 i/s
ruby-master/bin/ruby:             1441.2 i/s - 36.72x  slower

The new results of the benchmark referenced by rubocop:

Comparison:
           String#tr:  6178111.8 i/s
         String#gsub:  6146252.2 i/s - same-ish: difference falls within error

The `String#tr` method and `String#gsub` method have identical return
values when passed two single-character strings. For example,
`my_str.tr("a", "b") == my_str.gsub("a", "b")`. It turns out, however,
that the performance of `tr` is faster than gsub under these conditions.

We can check the parameters in `gsub` and invoke `tr` in order to
optimize that case. The check comes at a fixed cost for usages that
don't hit this condition.

Note: the scenario where gsub is passed a single-char search and an
empty string is also handled by `tr` which itself calls `delete`.
@launchable-app
Copy link
Copy Markdown

launchable-app bot commented Dec 20, 2025

2/67072 Tests Failed

spec/ruby/core/matchdata/regexp_spec.rb#MatchData#regexp returns a Regexp for the result of gsub(String)
Expected /hay/ == /\[/
to be truthy but was false
/Users/runner/work/ruby/ruby/src/spec/ruby/core/matchdata/regexp_spec.rb:22:in 'block (2 levels) in <top (required)>'
/Users/runner/work/ruby/ruby/src/spec/ruby/core/matchdata/regexp_spec.rb:3:in '<top (required)>'
spec/ruby/core/matchdata/string_spec.rb#MatchData#string returns a frozen copy of the matched string for gsub!(String)
Expected "THX1138." == "he[[o"
to be truthy but was false
/home/runner/work/ruby/ruby/src/spec/ruby/core/matchdata/string_spec.rb:23:in 'block (2 levels) in <top (required)>'
/home/runner/work/ruby/ruby/src/spec/ruby/core/matchdata/string_spec.rb:3:in '<top (required)>'

[-> View Test suite health in main branch]

@mame
Copy link
Copy Markdown
Member

mame commented Dec 20, 2025

Thank you for working on this.

For example, my_str.tr("a", "b") == my_str.gsub("a", "b").

No, unfortunately. String#gsub has a side effect that creates and sets a MatchData to $~, while String#tr does not.

$ ruby -e '"aaa".tr("a", "b"); p $~'
nil

$ ruby -e '"aaa".gsub("a", "b"); p $~'
#<MatchData: a>

Simply replacing gsub with tr will cause incompatibility. Can you fix the issue and re-evaluate it?

@petekinnecom
Copy link
Copy Markdown
Author

Thanks I'll take a look at it and see how it goes. I also just realized I was only running the bootstrap test suite not the full test suite. 🤦 I'll close this for now and reopen this if it turns out reasonable.

@petekinnecom
Copy link
Copy Markdown
Author

I reopened it here because I force-pushed against the branch which makes the PR non-reopenable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants