
[BigString] Fix character indexing operations #485

Merged
lorentey merged 2 commits into apple:release/1.2 from lorentey:fix-character-distance-1.2 on Jun 23, 2025

@lorentey
Member

BigString sometimes miscounts distances in its character view due to alignment issues.

let big = BigString("a" + String(repeating: "\u{301}", count: 300))

print(big.count) // 1 ✔️
print(big.distance(from: big.startIndex, to: big.endIndex)) // 2 ❌

// If `big` had two characters, then the index after the start should not be at the end
let i = big.index(after: big.startIndex)
print(i == big.endIndex) // true ✔️

print(Array(big[...])) // Trap: invalid collection 💥
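Each symptom above violates an invariant that any `BidirectionalCollection` is expected to uphold: `count`, `distance(from:to:)`, and stepping with `index(after:)`/`index(before:)` must all agree. The following is a minimal sketch of a regression check for those invariants. `characterViewIsConsistent` is a hypothetical helper, not part of swift-collections. Because it uses only standard `Collection` APIs, it runs as-is on `String`; the same check should apply to `BigString`.

```swift
/// Hypothetical helper (not part of swift-collections): returns true if the
/// counting and index-walking operations of a collection agree.
func characterViewIsConsistent<C: BidirectionalCollection>(_ c: C) -> Bool {
  // Walk forward from the start, counting steps until endIndex.
  var forwardSteps = 0
  var i = c.startIndex
  while i != c.endIndex {
    c.formIndex(after: &i)
    forwardSteps += 1
  }
  // Walk backward from the end, counting steps until startIndex.
  var backwardSteps = 0
  var j = c.endIndex
  while j != c.startIndex {
    c.formIndex(before: &j)
    backwardSteps += 1
  }
  let count = c.count
  return count == forwardSteps
    && count == backwardSteps
    && c.distance(from: c.startIndex, to: c.endIndex) == count
    && c.distance(from: c.endIndex, to: c.startIndex) == -count
    && Array(c).count == count
}

// One base letter followed by 300 combining acute accents is one Character.
let s = "a" + String(repeating: "\u{301}", count: 300)
print(characterViewIsConsistent(s)) // true
```

Per the description above, running this check on the unpatched `BigString` would either return false (a distance of 2 against a count of 1) or trap while building the array.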

Resolve this by overhauling distance calculations in the character metric:

  • Introduce a `prefixSize` function in `_StringMetric`, distinct from `distance`.
  • In the metric-agnostic index distance algorithm, use `prefixCount` to avoid double-counting characters that cross (or end on) chunk boundaries.
  • Simplify `_StringMetric.distance` by requiring that `start <= end`.
  • Crucially, change character-index operations to start by rounding each input index down to the nearest character break.
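The following toy model illustrates the ideas in the list above. It is a hypothetical sketch, not BigString's actual code. `ToyChunkedText`, `roundDown`, `charactersBefore`, and `characterDistance` are invented names. The model assumes fixed-size chunks of Unicode scalars and character boundaries precomputed over the whole string. `charactersBefore` plays the role of the prefix count described above. Each character is attributed only to the chunk where it begins, so a character that spans several chunks is counted once. A distance is then the difference of two prefix counts, taken after rounding both offsets down to a character break. The function also requires `start <= end`.

```swift
// Toy model only: NOT BigString's implementation. Text is split into
// fixed-size chunks of Unicode scalars; each character boundary belongs
// to exactly one chunk, so no character is counted twice.
struct ToyChunkedText {
  let scalarCount: Int
  let chunkSize: Int
  /// Scalar offsets at which each Character begins, in increasing order.
  let characterStarts: [Int]
  /// Number of character starts that fall inside each chunk.
  let startsPerChunk: [Int]

  init(_ string: String, chunkSize: Int) {
    precondition(chunkSize > 0)
    self.chunkSize = chunkSize
    var starts: [Int] = []
    var offset = 0
    for character in string {
      starts.append(offset)
      offset += character.unicodeScalars.count
    }
    self.scalarCount = offset
    self.characterStarts = starts
    let chunkCount = (offset + chunkSize - 1) / chunkSize
    var perChunk = [Int](repeating: 0, count: chunkCount)
    for start in starts { perChunk[start / chunkSize] += 1 }
    self.startsPerChunk = perChunk
  }

  /// Rounds a scalar offset down to the nearest character boundary.
  /// The end offset is itself treated as a boundary.
  func roundDown(_ offset: Int) -> Int {
    precondition(offset >= 0 && offset <= scalarCount)
    if offset == scalarCount { return offset }
    return characterStarts.last(where: { $0 <= offset }) ?? 0
  }

  /// Number of characters that begin strictly before `offset`: the sum of
  /// whole-chunk counts plus a partial scan of the chunk holding `offset`.
  func charactersBefore(_ offset: Int) -> Int {
    let fullChunks = offset / chunkSize
    let chunkStart = fullChunks * chunkSize
    let wholeChunks = startsPerChunk.prefix(fullChunks).reduce(0, +)
    let partial = characterStarts.filter { $0 >= chunkStart && $0 < offset }.count
    return wholeChunks + partial
  }

  /// Character distance between two scalar offsets; requires start <= end.
  /// Both offsets are rounded down to a character boundary first.
  func characterDistance(from start: Int, to end: Int) -> Int {
    precondition(start <= end)
    return charactersBefore(roundDown(end)) - charactersBefore(roundDown(start))
  }
}

// 301 scalars spread over 4 chunks of 100, forming a single Character.
let text = ToyChunkedText("a" + String(repeating: "\u{301}", count: 300), chunkSize: 100)
print(text.characterDistance(from: 0, to: text.scalarCount))   // 1
print(text.characterDistance(from: 150, to: text.scalarCount)) // 1
```

Rounding first is what makes the mid-character offset 150 count the whole character it falls inside, so both calls report 1 even though the character spans every chunk.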

rdar://153412693

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

lorentey added 2 commits June 17, 2025 17:27
BigString sometimes miscounted distances in its character view due to alignment issues. Resolve this by overhauling distance calculations in the character metric:

- Introduce a `prefixSize` function in `_StringMetric`, distinct from `distance`.
- In the metric-agnostic index distance algorithm, use `prefixCount` to avoid double-counting characters that cross (or end on) chunk boundaries.
- Simplify `_StringMetric.distance` by requiring that `start <= end`.
- Crucially, change character-index operations to start by rounding each input index down to the nearest character break.
lorentey added this to the 1.2.1 milestone on Jun 18, 2025
lorentey requested a review from Azoy on June 18, 2025 at 00:33
lorentey added the RopeModule Positional B-trees label on Jun 18, 2025
lorentey merged commit f02550b into apple:release/1.2 on Jun 23, 2025
20 checks passed
lorentey deleted the fix-character-distance-1.2 branch on June 23, 2025 at 22:19