-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[WTF] Fix behavioral issues in searching in spans #54312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
EWS run on previous version of this PR (hash e92b220) Details |
aperezdc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added logic LGTM; but please check the comments about checkedSum().
|
EWS run on previous version of this PR (hash 53cbb1d) Details |
|
EWS run on previous version of this PR (hash 8b80d51) Details |
Source/WTF/wtf/text/StringCommon.h
Outdated
| if (matchCharacters.empty()) | ||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think this is the best value to return if matchCharacters is empty. I think we want to return std::min(startOffset, source.size()) rather than 0. Please add test cases for this, and if you think 0 is better, please explain why.
| if (matchCharacters.empty()) | |
| return 0; | |
| if (matchCharacters.empty()) | |
| return std::min(startOffset, source.size()); |
But I might suggest that if startOffset is > source.size() we should return notFound, even if we are looking for the empty string. Easiest way to do that is probably to remove the special check for empty string entirely (see my suggestion below). But another way would be to write something like this:
| if (matchCharacters.empty()) | |
| return 0; | |
| if (matchCharacters.empty()) | |
| return startOffset > source.size() ? notFound : startOffset; |
Source/WTF/wtf/text/StringCommon.h
Outdated
| auto difference = checkedDifference<size_t>(source.size(), startOffset); | ||
| if (difference.hasOverflowed()) { | ||
| ASSERT_NOT_REACHED_WITH_MESSAGE("source size %zu and starting offset %zu difference overflows", source.size(), startOffset); | ||
| return notFound; | ||
| } | ||
|
|
||
| if (difference.value() < matchCharacters.size()) | ||
| return notFound; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we should do this checking by preflighting. It’s too easy to get it wrong when the check is different from the code below that relies on the check. Instead I think we should incorporate the check into the code below, and take advantage of the clamping that is already done by the subspan function rather than redoing it.
We could either use checkedDifference to compute delta, or just write:
if (startSearchedCharacters.size() < matchCharacters.size())
return notFound;
But an even better way to write this that I think would be clearer is to eliminate startSearchedCharacters and delta entirely, and not add an explicit check of the length:
for (size_t offset = startOffset; offset < source.size(); ++offset) {
if (equalIgnoringASCIICaseWithLength(source.subspan(offset), matchCharacters, matchCharacters.size())
return offset;
}
return notFound;
I don’t see any downside to writing it this more straightforward way, and if we write enough test cases we can make sure this new version is correct. I think we could also consider eliminating the explicit check for empty matchCharacters and just write this:
for (size_t offset = startOffset; offset <= source.size(); ++offset) {
if (equalIgnoringASCIICaseWithLength(source.subspan(offset), matchCharacters, matchCharacters.size())
return offset;
}
return notFound;
I’m pretty sure this handles edge cases like startOffset > source.size() correctly without adding any preflights or checked arithmetic, and I would prefer this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I managed to change it from "pre-flight" to "in-flight". All tests pass.
Source/WTF/wtf/text/StringCommon.h
Outdated
| if (matchCharacters.empty()) | ||
| return searchCharacters.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add this? What test fails if we don’t add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed that part.
|
EWS run on previous version of this PR (hash 680d690) Details |
|
EWS run on previous version of this PR (hash e22a574) Details |
|
I think we would be ready for review. |
| if (haystack.size() < needle.size()) | ||
| return notFound; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check needed? This is already what memem does when the needle size is greater than the haystack size. Did some test indicate that was not the case on some platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like I can remove this part and leave only the check above.
Source/WTF/wtf/text/StringCommon.h
Outdated
| for (size_t i = 0; i <= delta; ++i) { | ||
| if (equalIgnoringASCIICaseWithLength(startSearchedCharacters.subspan(i), matchCharacters, matchCharacters.size())) | ||
| return startOffset + i; | ||
| for (size_t offset = startOffset; offset <= source.size() && (source.size() - offset) >= matchCharacters.size(); ++offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally like it better without the parentheses:
| for (size_t offset = startOffset; offset <= source.size() && (source.size() - offset) >= matchCharacters.size(); ++offset) { | |
| for (size_t offset = startOffset; offset <= source.size() && source.size() - offset >= matchCharacters.size(); ++offset) { |
|
EWS run on previous version of this PR (hash 6dd739e) Details |
|
I'll land this tomorrow in the morning (in Europe) |
|
EWS run on current version of this PR (hash c8834ec) Details |
https://bugs.webkit.org/show_bug.cgi?id=302925 Reviewed by Darin Adler. There were several cases where we were not handling properly certain cases when inspecting spans for data and some codepaths that had different behaviors. Fixed them and improved test coverage. reverseFind lost the starting offset that was not used anywhere and for coherence with the find counterpart. Test: Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp * Source/WTF/wtf/StdLibExtras.h: (WTF::find): * Source/WTF/wtf/text/StringCommon.h: (WTF::findIgnoringASCIICase): (WTF::reverseFindInner): (WTF::reverseFind): * Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp: (TestWebKitAPI::TEST(WTF_StringCommon, Equal)): (TestWebKitAPI::TEST(WTF_StringCommon, EqualIgnoringASCIICase)): (TestWebKitAPI::TEST(WTF_StringCommon, StartsWith)): (TestWebKitAPI::TEST(WTF_StringCommon, EndsWith)): (TestWebKitAPI::TEST(WTF_StringCommon, Find)): (TestWebKitAPI::TEST(WTF_StringCommon, ReverseFind)): (TestWebKitAPI::TEST(WTF_StringCommon, Contains)): (TestWebKitAPI::TEST(WTF_StringCommon, StartsWithLettersIgnoringASCIICase)): (TestWebKitAPI::TEST(WTF_StringCommon, EndsWithLettersIgnoringASCIICase)): (TestWebKitAPI::TEST(WTF_StringCommon, FindIgnoringASCIICase)): (TestWebKitAPI::TEST(WTF_StringCommon, ContainsIgnoringASCIICase)): (TestWebKitAPI::TEST(WTF_StringCommon, CharactersAreAllASCII)): Canonical link: https://commits.webkit.org/303965@main
c8834ec to
647785e
Compare
|
Committed 303965@main (647785e): https://commits.webkit.org/303965@main Reviewed commits have been landed. Closing PR #54312 and removing active labels. |
🧪 services
647785e
c8834ec