Skip to content

fix(core): calculate offset correctly when replacing marker in transform (fixes crash)🙀#11071

Merged
srl295 merged 4 commits intobetafrom
fix/core/11057-string-too-long
Mar 28, 2024
Merged

fix(core): calculate offset correctly when replacing marker in transform (fixes crash)🙀#11071
srl295 merged 4 commits intobetafrom
fix/core/11057-string-too-long

Conversation

@srl295
Copy link
Copy Markdown
Member

@srl295 srl295 commented Mar 25, 2024

  • handle the case where old and new context strings are the same (i.e. no work to do).
  • fix pointer arithmetic error in feat(core): unescape u 🙀 #10356 - the special case where one marker is being replaced by another. The named regression test didn't actually hit this case.

Fixes: bug(core): 'string too long' #11057

@keymanapp-test-bot skip

srl295 added 2 commits March 25, 2024 15:57
- ldml_test_source had a trim() implementation that was causing asserts in std::isspace() which turned out to be due to negative integers being passed
- The actual line that caused trouble was, quote, '«" />'
- handle the case where old and new context strings are the same (i.e. no work to do).
- fix pointer arithmetic error in #10356 - the special case where one marker is being replaced by another. The named regression test didn't actually hit this case.

Fixes: bug(core): 'string too long' #11057
@srl295 srl295 self-assigned this Mar 25, 2024
@srl295 srl295 added this to the B17S4 milestone Mar 25, 2024
@keymanapp-test-bot
Copy link
Copy Markdown

keymanapp-test-bot bot commented Mar 25, 2024

@github-actions github-actions bot added core/ Keyman Core fix labels Mar 25, 2024
@srl295 srl295 linked an issue Mar 25, 2024 that may be closed by this pull request
// marker change.
// We can detect this because the unchanged_prefix will end with u+FFFF U+0008
//
// Oh, and yes, test case 'regex-test-8a-0' hits this.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what an embarrasingly proud comment :(

That test case used to hit this. But, I added 8a as a prefix in the output string to distinguish it from the other cases in k_009.

For this fix I took a different tack:

  • 1 completely separate isolated file (k_212) with the regression test.
  • 1 original-plus-embedded-markup file (k_213) as submitted by the Affected User

Yes, it makes meson take longer to compile. But now there's a more stable regression test.

Comment on lines +129 to +131
// std::isspace chokes on negative (!) ints here, so spare it the trouble if ch is negative
// (possibly unsigned char to integer cast?)
s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](int ch) { return (ch < 0) || !std::isspace(ch); }));
Copy link
Copy Markdown
Member Author

@srl295 srl295 Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k_213 contains some random chars. It was a 'scratch' area I was using while editing. This fixes a crash in std::isspace which, surprisingly, is to spec.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned that we are missing something more important. How can ch be < 0?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codepoint U+00AB in UTF-8 is C2 AB, and 0xC2 as a signed char cast to integer is -62

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarified, PTAL.

(Won't be hit if we drop K213)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::ifstream and std::string by default seem to read in utf-8 as an array of chars.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see parent comment- i have a commit ready that drops K213 and also removes the lstrip() code, then we can revist that in 18

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood. sounds good to me

ctxt_prefix.first -= 2;
ctxt_prefix.second += 2;
ctxt_prefix.first -= 2; // backup the 'mismatch' point to before the FFFF
ctxt_prefix.second -= 2; // backup the 'mismatch' point to before the FFFF
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the bug here. I think I switched the sign sometime between the initial fix and commit. Coverage would have helped here.

@srl295 srl295 marked this pull request as ready for review March 26, 2024 00:35
@srl295 srl295 requested review from mcdurdin and rc-swag as code owners March 26, 2024 00:35
@mcdurdin mcdurdin changed the title fix(core): fix 2 marker cases in ldml_event_state::emit_difference 🙀 fix(core): calculate offset correctly when replacing marker in transform (fixes crash) + chore: optimization 🙀 Mar 26, 2024
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future maintainers, it would help if this was a more minimal fixture

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could drop K213 since we have K212.

Comment on lines +129 to +131
// std::isspace chokes on negative (!) ints here, so spare it the trouble if ch is negative
// (possibly unsigned char to integer cast?)
s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](int ch) { return (ch < 0) || !std::isspace(ch); }));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned that we are missing something more important. How can ch be < 0?

- ldml_test_source: clarify UTF-8 processing in ltrim()
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Mar 27, 2024

@mcdurdin I'll drop K213 (non-minimal reproducer), then can remove the ltrim() stuff and deal with that separately from the crasher.

@mcdurdin
Copy link
Copy Markdown
Member

@mcdurdin I'll drop K213 (non-minimal reproducer), then can remove the ltrim() stuff and deal with that separately from the crasher.

Sounds good. Perhaps split the optimization into a separate PR too? I think it's better to keep each PR without riders.

- ldml_test_source fix moved off to issue #11083
- an optimization in ldml_processor also rolled back
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Mar 27, 2024

@mcdurdin done. (Originally I thought the optimization was the fix, hence the history).

I'll put this into a separate PR:

  if(ctxt_prefix.first == old_ctxt.end() && ctxt_prefix.second == new_ctxt.end()) {
    return; // No mismatch. We can just exit, there's nothing to do.
  }

@srl295 srl295 changed the title fix(core): calculate offset correctly when replacing marker in transform (fixes crash) + chore: optimization 🙀 fix(core): calculate offset correctly when replacing marker in transform (fixes crash)🙀 Mar 27, 2024
@srl295 srl295 requested a review from mcdurdin March 27, 2024 13:26
Copy link
Copy Markdown
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Base automatically changed from fix/core/11062-error-no-match to beta March 28, 2024 05:11
@srl295 srl295 merged commit 1ac5bdf into beta Mar 28, 2024
@srl295 srl295 deleted the fix/core/11057-string-too-long branch March 28, 2024 05:12
@keyman-server
Copy link
Copy Markdown
Collaborator

Changes in this pull request will be available for download in Keyman version 17.0.297-beta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core/ Keyman Core fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(core): crash due to bad pointer (was: 'string too long')

3 participants