Skip to content

Fix incorrect Position character value in LSP#14704

Merged
bbatsov merged 1 commit intorubocop:masterfrom
tmtm:fix-utf16-position
Dec 16, 2025
Merged

Fix incorrect Position character value in LSP#14704
bbatsov merged 1 commit intorubocop:masterfrom
tmtm:fix-utf16-position

Conversation

@tmtm
Copy link
Copy Markdown
Contributor

@tmtm tmtm commented Dec 11, 2025

For utf-16 positionEncoding, characters above U+FFFF must be counted as 2, but they were incorrectly counted as 1.


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

For utf-16 positionEncoding, characters above U+FFFF must be counted as 2, but they were incorrectly counted as 1.
@tmtm tmtm force-pushed the fix-utf16-position branch from 37e08c1 to 9149c56 Compare December 11, 2025 14:41
@tmtm tmtm marked this pull request as ready for review December 11, 2025 14:53
@tmtm
Copy link
Copy Markdown
Contributor Author

tmtm commented Dec 14, 2025

Performance

Previously reverted implementation:

str.size + str.count("\u{10000}-\u{10FFFF}")

Current Implementation:

line_length = 0
line.codepoints.each do |codepoint|
line_length += 1
line_length += 1 if codepoint > RubyLsp::Document::Scanner::SURROGATE_PAIR_START
end
line_length

Implementation of this pull request:

str.length + str.b.count("\xf0-\xff".b)

Benchmark:

require 'benchmark'

str = "abcdefg"*100
n = 100

Benchmark.benchmark do |r|
  r.report 'reverted' do
    n.times do
      str.size + str.count("\u{10000}-\u{10FFFF}")
    end
  end
  r.report 'current' do
    n.times do
      line_length = 0
      str.codepoints.each do |codepoint|
        line_length += 1
        line_length += 1 if codepoint > 0xFFFF
      end
    end
  end
  r.report 'this'  do
    n.times do
      str.length + str.b.count("\xf0-\xff".b)
    end
  end
end

Results

              user     system      total        real
reverted  6.277922   0.234874   6.512796 (  6.515005)
current   0.001901   0.000000   0.001901 (  0.001901)
this      0.000042   0.000000   0.000042 (  0.000043)

@koic
Copy link
Copy Markdown
Member

koic commented Dec 15, 2025

I revised it to use benchmark-ips:

# bench.rb
require 'benchmark/ips'

str = "abcdefg" * 100

Benchmark.ips do |r|
  r.report 'reverted' do
    str.size + str.count("\u{10000}-\u{10FFFF}")
  end
  r.report 'current' do
    line_length = 0
    str.codepoints.each do |codepoint|
      line_length += 1
      line_length += 1 if codepoint > 0xFFFF
    end
  end
  r.report 'this' do
    str.length + str.b.count("\xf0-\xff".b)
  end
  r.compare!
end

I revised it to use benchmark-ips. It doesn’t become about 100,000× slower, but it still seems to be about 60× slower. That said, this may be an acceptable trade-off if the performance cost is necessary to compute the correct result.

$ ruby bench.rb
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin24]
Warming up --------------------------------------
            reverted     3.000 i/100ms
             current     5.460k i/100ms
                this   328.754k i/100ms
Calculating -------------------------------------
            reverted     31.845 (± 0.0%) i/s   (31.40 ms/i) -    162.000 in   5.088269s
             current     54.600k (± 0.7%) i/s   (18.31 μs/i) -    273.000k in   5.000219s
                this      3.263M (± 0.5%) i/s  (306.43 ns/i) -     16.438M in   5.037162s

Comparison:
                this:  3263364.7 i/s
             current:    54600.5 i/s - 59.77x  slower
            reverted:       31.8 i/s - 102474.95x  slower

cc @jonleighton @bjeanes

@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented Dec 16, 2025

I think that ideally we should optimize this a bit more before going forward with it, although probably the current implementation won't cause issues in practice.

@tmtm
Copy link
Copy Markdown
Contributor Author

tmtm commented Dec 16, 2025

Since this is implemented in this pull request, it demonstrates that this pull request provides the best result.

@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented Dec 16, 2025

Ah, my bad. Seem me and @koic misread the results. 🤦

@bbatsov bbatsov merged commit 9cc142e into rubocop:master Dec 16, 2025
22 checks passed
@koic
Copy link
Copy Markdown
Member

koic commented Dec 16, 2025

Yeah, I misread it 🙇 Thanks!

@jonleighton
Copy link
Copy Markdown

Great outcome! Thanks folks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants