fix(core): calculate offset correctly when replacing marker in transform (fixes crash)🙀#11071
fix(core): calculate offset correctly when replacing marker in transform (fixes crash)🙀#11071
Conversation
- ldml_test_source had a trim() implementation that was causing asserts in std::isspace() which turned out to be due to negative integers being passed - The actual line that caused trouble was, quote, '«" />'
User Test ResultsTest specification and instructions User tests are not required Test Artifacts
|
| // marker change. | ||
| // We can detect this because the unchanged_prefix will end with u+FFFF U+0008 | ||
| // | ||
| // Oh, and yes, test case 'regex-test-8a-0' hits this. |
There was a problem hiding this comment.
what an embarrasingly proud comment :(
That test case used to hit this. But, I added 8a as a prefix in the output string to distinguish it from the other cases in k_009.
For this fix I took a different tack:
- 1 completely separate isolated file (k_212) with the regression test.
- 1 original-plus-embedded-markup file (k_213) as submitted by the Affected User
Yes, it makes meson take longer to compile. But now there's a more stable regression test.
| // std::isspace chokes on negative (!) ints here, so spare it the trouble if ch is negative | ||
| // (possibly unsigned char to integer cast?) | ||
| s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](int ch) { return (ch < 0) || !std::isspace(ch); })); |
There was a problem hiding this comment.
k_213 contains some random chars. It was a 'scratch' area I was using while editing. This fixes a crash in std::isspace which, surprisingly, is to spec.
There was a problem hiding this comment.
I'm a little concerned that we are missing something more important. How can ch be < 0?
There was a problem hiding this comment.
Codepoint U+00AB in UTF-8 is C2 AB, and 0xC2 as a signed char cast to integer is -62
There was a problem hiding this comment.
clarified, PTAL.
(Won't be hit if we drop K213)
There was a problem hiding this comment.
std::ifstream and std::string by default seem to read in utf-8 as an array of chars.
There was a problem hiding this comment.
see parent comment- i have a commit ready that drops K213 and also removes the lstrip() code, then we can revist that in 18
| ctxt_prefix.first -= 2; | ||
| ctxt_prefix.second += 2; | ||
| ctxt_prefix.first -= 2; // backup the 'mismatch' point to before the FFFF | ||
| ctxt_prefix.second -= 2; // backup the 'mismatch' point to before the FFFF |
There was a problem hiding this comment.
this was the bug here. I think I switched the sign sometime between the initial fix and commit. Coverage would have helped here.
There was a problem hiding this comment.
For future maintainers, it would help if this was a more minimal fixture
There was a problem hiding this comment.
I could drop K213 since we have K212.
| // std::isspace chokes on negative (!) ints here, so spare it the trouble if ch is negative | ||
| // (possibly unsigned char to integer cast?) | ||
| s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](int ch) { return (ch < 0) || !std::isspace(ch); })); |
There was a problem hiding this comment.
I'm a little concerned that we are missing something more important. How can ch be < 0?
- ldml_test_source: clarify UTF-8 processing in ltrim()
|
@mcdurdin I'll drop K213 (non-minimal reproducer), then can remove the ltrim() stuff and deal with that separately from the crasher. |
Sounds good. Perhaps split the optimization into a separate PR too? I think it's better to keep each PR without riders. |
- ldml_test_source fix moved off to issue #11083 - an optimization in ldml_processor also rolled back
|
@mcdurdin done. (Originally I thought the optimization was the fix, hence the history). I'll put this into a separate PR: if(ctxt_prefix.first == old_ctxt.end() && ctxt_prefix.second == new_ctxt.end()) {
return; // No mismatch. We can just exit, there's nothing to do.
} |
|
Changes in this pull request will be available for download in Keyman version 17.0.297-beta |

Fixes: bug(core): 'string too long' #11057
@keymanapp-test-bot skip