Fix incorrect Position character value in LSP#14704
Conversation
For utf-16 positionEncoding, characters above U+FFFF must be counted as 2, but they were incorrectly counted as 1.
37e08c1 to
9149c56
Compare
PerformancePreviously reverted implementation: rubocop/lib/rubocop/lsp/diagnostic.rb Line 185 in cd3c71a Current Implementation: rubocop/lib/rubocop/lsp/diagnostic.rb Lines 167 to 172 in 2c9071d Implementation of this pull request: str.length + str.b.count("\xf0-\xff".b)Benchmark: require 'benchmark'
str = "abcdefg"*100
n = 100
Benchmark.benchmark do |r|
r.report 'reverted' do
n.times do
str.size + str.count("\u{10000}-\u{10FFFF}")
end
end
r.report 'current' do
n.times do
line_length = 0
str.codepoints.each do |codepoint|
line_length += 1
line_length += 1 if codepoint > 0xFFFF
end
end
end
r.report 'this' do
n.times do
str.length + str.b.count("\xf0-\xff".b)
end
end
endResults |
|
I revised it to use benchmark-ips: # bench.rb
require 'benchmark/ips'
str = "abcdefg" * 100
Benchmark.ips do |r|
r.report 'reverted' do
str.size + str.count("\u{10000}-\u{10FFFF}")
end
r.report 'current' do
line_length = 0
str.codepoints.each do |codepoint|
line_length += 1
line_length += 1 if codepoint > 0xFFFF
end
end
r.report 'this' do
str.length + str.b.count("\xf0-\xff".b)
end
r.compare!
endI revised it to use benchmark-ips. It doesn’t become about 100,000× slower, but it still seems to be about 60× slower. That said, this may be an acceptable trade-off if the performance cost is necessary to compute the correct result. $ ruby bench.rb
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin24]
Warming up --------------------------------------
reverted 3.000 i/100ms
current 5.460k i/100ms
this 328.754k i/100ms
Calculating -------------------------------------
reverted 31.845 (± 0.0%) i/s (31.40 ms/i) - 162.000 in 5.088269s
current 54.600k (± 0.7%) i/s (18.31 μs/i) - 273.000k in 5.000219s
this 3.263M (± 0.5%) i/s (306.43 ns/i) - 16.438M in 5.037162s
Comparison:
this: 3263364.7 i/s
current: 54600.5 i/s - 59.77x slower
reverted: 31.8 i/s - 102474.95x slower |
|
I think that ideally we should optimize this a bit more before going forward with it, although probably the current implementation won't cause issues in practice. |
|
Since |
|
Ah, my bad. Seem me and @koic misread the results. 🤦 |
|
Yeah, I misread it 🙇 Thanks! |
|
Great outcome! Thanks folks |
For utf-16 positionEncoding, characters above U+FFFF must be counted as 2, but they were incorrectly counted as 1.
Before submitting the PR make sure the following are checked:
[Fix #issue-number](if the related issue exists).master(if not - rebase it).bundle exec rake default. It executes all tests and runs RuboCop on its own code.{change_type}_{change_description}.mdif the new code introduces user-observable changes. See changelog entry format for details.