Fix byte/character offset confusion in FlAccessibleTextField #188138
get_line_at_offset and get_paragraph_at_offset used PangoLayoutLine start_index and length (byte offsets) directly as character offsets. They were compared against the ATK character offset, reported back as character offsets, and passed to get_substring (which uses g_utf8_substring, expecting character offsets).

For ASCII text, byte and character offsets coincide, but for multi-byte UTF-8 text they diverge, producing incorrect substrings and offsets.

Convert byte indices to character offsets with g_utf8_pointer_to_offset before comparing, reporting, or substringing.

Add a TextBoundaryMultiByte test covering line and paragraph boundaries on text containing multi-byte UTF-8 characters.
Code Review
This pull request updates FlAccessibleTextField to compute line and paragraph boundaries using character offsets instead of byte offsets to correctly handle multi-byte UTF-8 characters, and adds a test case for this behavior. The review feedback suggests optimizing the performance of these computations by converting the input character offset to a byte offset once at the start of the functions, avoiding repeated
mattkae left a comment
Looks to be an improvement overall! Just have a suggestion.
    gint line_start = g_utf8_pointer_to_offset(text, text + line->start_index);
    gint line_end =
        g_utf8_pointer_to_offset(text, text + line->start_index + line->length);
    if (offset >= line_start && offset <= line_end) {
Can we add a comment explaining how we'd get here?
Added some comments on the functions. The functions are still pretty complex but hopefully this helps.
Iterate the Pango layout lines with a for loop instead of a while loop, keeping the list iterator scoped to the loop. No functional change.
Add comments describing what each function returns, including the start and end character offsets and how lines and paragraphs are delimited.
Included in the flutter/packages roll of flutter/flutter@e228771...87224e0: 2026-06-23 robert.ancell@canonical.com Fix byte/character offset confusion in FlAccessibleTextField (flutter/flutter#188138)