Skip to content

Conversation

@calvaris
Copy link
Contributor

@calvaris calvaris commented Nov 21, 2025

647785e

[WTF] Fix behavioral issues in searching in spans
https://bugs.webkit.org/show_bug.cgi?id=302925

Reviewed by Darin Adler.

There were several cases where we were not handling properly certain cases when inspecting spans for data and some
codepaths that had different behaviors. Fixed them and improved test coverage. reverseFind lost the starting offset that
was not used anywhere and for coherence with the find counterpart.

Test: Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp
* Source/WTF/wtf/StdLibExtras.h:
(WTF::find):
* Source/WTF/wtf/text/StringCommon.h:
(WTF::findIgnoringASCIICase):
(WTF::reverseFindInner):
(WTF::reverseFind):
* Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp:
(TestWebKitAPI::TEST(WTF_StringCommon, Equal)):
(TestWebKitAPI::TEST(WTF_StringCommon, EqualIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, StartsWith)):
(TestWebKitAPI::TEST(WTF_StringCommon, EndsWith)):
(TestWebKitAPI::TEST(WTF_StringCommon, Find)):
(TestWebKitAPI::TEST(WTF_StringCommon, ReverseFind)):
(TestWebKitAPI::TEST(WTF_StringCommon, Contains)):
(TestWebKitAPI::TEST(WTF_StringCommon, StartsWithLettersIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, EndsWithLettersIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, FindIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, ContainsIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, CharactersAreAllASCII)):

Canonical link: https://commits.webkit.org/303965@main

c8834ec

Misc iOS, visionOS, tvOS & watchOS macOS Linux Windows
✅ 🧪 style ✅ 🛠 ios ✅ 🛠 mac ✅ 🛠 wpe ✅ 🛠 win
✅ 🧪 bindings ✅ 🛠 ios-sim ✅ 🛠 mac-AS-debug ✅ 🧪 wpe-wk2 ❌ 🧪 win-tests
✅ 🧪 webkitperl ✅ 🧪 ios-wk2 ✅ 🧪 api-mac ✅ 🧪 api-wpe
✅ 🧪 ios-wk2-wpt ✅ 🧪 api-mac-debug ⏳ 🛠 wpe-cairo
✅ 🛠 🧪 jsc ✅ 🧪 api-ios ✅ 🧪 mac-wk1 ✅ 🛠 gtk
✅ 🛠 🧪 jsc-arm64 ✅ 🛠 vision ✅ 🧪 mac-wk2 ✅ 🧪 gtk-wk2
✅ 🛠 vision-sim ✅ 🧪 mac-AS-debug-wk2 ✅ 🧪 api-gtk
✅ 🛠 🧪 merge ✅ 🧪 vision-wk2 ✅ 🧪 mac-wk2-stress ✅ 🛠 playstation
✅ 🛠 tv ✅ 🧪 mac-intel-wk2 ✅ 🛠 jsc-armv7
✅ 🛠 tv-sim ✅ 🛠 mac-safer-cpp ✅ 🧪 jsc-armv7-tests
✅ 🛠 watch
✅ 🛠 watch-sim

@calvaris calvaris self-assigned this Nov 21, 2025
@calvaris calvaris added the New Bugs Unclassified bugs are placed in this component until the correct component can be determined. label Nov 21, 2025
@calvaris calvaris requested a review from darinadler November 21, 2025 12:25
Copy link
Contributor

@aperezdc aperezdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added logic LGTM; but please check the comments about checkedSum().

@calvaris calvaris marked this pull request as draft November 21, 2025 14:55
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Nov 21, 2025
@calvaris calvaris removed the merging-blocked Applied to prevent a change from being merged label Nov 21, 2025
@calvaris calvaris marked this pull request as ready for review November 21, 2025 22:43
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Nov 21, 2025
@calvaris calvaris removed the merging-blocked Applied to prevent a change from being merged label Nov 22, 2025
Comment on lines 525 to 526
if (matchCharacters.empty())
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this is the best value to return if matchCharacters is empty. I think we want to return std::min(startOffset, source.size()) rather than 0. Please add test cases for this, and if you think 0 is better, please explain why.

Suggested change
if (matchCharacters.empty())
return 0;
if (matchCharacters.empty())
return std::min(startOffset, source.size());

But I might suggest that if startOffset is > source.size() we should return notFound, even if we are looking for the empty string. Easiest way to do that is probably to remove the special check for empty string entirely (see my suggestion below). But another way would be to write something like this:

Suggested change
if (matchCharacters.empty())
return 0;
if (matchCharacters.empty())
return startOffset > source.size() ? notFound : startOffset;

Comment on lines 528 to 535
auto difference = checkedDifference<size_t>(source.size(), startOffset);
if (difference.hasOverflowed()) {
ASSERT_NOT_REACHED_WITH_MESSAGE("source size %zu and starting offset %zu difference overflows", source.size(), startOffset);
return notFound;
}

if (difference.value() < matchCharacters.size())
return notFound;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we should do this checking by preflighting. It’s too easy to get it wrong when the check is different from the code below that relies on the check. Instead I think we should incorporate the check into the code below, and take advantage of the clamping that is already done by the subspan function rather than redoing it.

We could either use checkedDifference to compute delta, or just write:

if (startSearchedCharacters.size() < matchCharacters.size())
    return notFound;

But an even better way to write this that I think would be clearer is to eliminate startSearchedCharacters and delta entirely, and not add an explicit check of the length:

for (size_t offset = startOffset; offset < source.size(); ++offset) {
    if (equalIgnoringASCIICaseWithLength(source.subspan(offset), matchCharacters, matchCharacters.size())
        return offset;
}
return notFound;

I don’t see any downside to writing it this more straightforward way, and if we write enough test cases we can make sure this new version is correct. I think we could also consider eliminating the explicit check for empty matchCharacters and just write this:

for (size_t offset = startOffset; offset <= source.size(); ++offset) {
    if (equalIgnoringASCIICaseWithLength(source.subspan(offset), matchCharacters, matchCharacters.size())
        return offset;
}
return notFound;

I’m pretty sure this handles edge cases like startOffset > source.size() correctly without adding any preflights or checked arithmetic, and I would prefer this version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I managed to change it from "pre-flight" to "in-flight". All tests pass.

Comment on lines 845 to 846
if (matchCharacters.empty())
return searchCharacters.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add this? What test fails if we don’t add it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed that part.

@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Nov 24, 2025
@calvaris calvaris removed the merging-blocked Applied to prevent a change from being merged label Nov 25, 2025
@calvaris
Copy link
Contributor Author

I think we would be ready for review.

Comment on lines 1076 to 1081
if (haystack.size() < needle.size())
return notFound;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check needed? This is already what memem does when the needle size is greater than the haystack size. Did some test indicate that was not the case on some platform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I can remove this part and leave only the check above.

for (size_t i = 0; i <= delta; ++i) {
if (equalIgnoringASCIICaseWithLength(startSearchedCharacters.subspan(i), matchCharacters, matchCharacters.size()))
return startOffset + i;
for (size_t offset = startOffset; offset <= source.size() && (source.size() - offset) >= matchCharacters.size(); ++offset) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like it better without the parentheses:

Suggested change
for (size_t offset = startOffset; offset <= source.size() && (source.size() - offset) >= matchCharacters.size(); ++offset) {
for (size_t offset = startOffset; offset <= source.size() && source.size() - offset >= matchCharacters.size(); ++offset) {

@calvaris
Copy link
Contributor Author

calvaris commented Dec 2, 2025

I'll land this tomorrow in the morning (in Europe)

@calvaris calvaris added the safe-merge-queue Applied to automatically send a pull-request to merge-queue after passing EWS checks label Dec 4, 2025
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Dec 5, 2025
@calvaris calvaris added merge-queue Applied to send a pull request to merge-queue and removed merging-blocked Applied to prevent a change from being merged safe-merge-queue Applied to automatically send a pull-request to merge-queue after passing EWS checks labels Dec 5, 2025
https://bugs.webkit.org/show_bug.cgi?id=302925

Reviewed by Darin Adler.

There were several cases where we were not handling properly certain cases when inspecting spans for data and some
codepaths that had different behaviors. Fixed them and improved test coverage. reverseFind lost the starting offset that
was not used anywhere and for coherence with the find counterpart.

Test: Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp
* Source/WTF/wtf/StdLibExtras.h:
(WTF::find):
* Source/WTF/wtf/text/StringCommon.h:
(WTF::findIgnoringASCIICase):
(WTF::reverseFindInner):
(WTF::reverseFind):
* Tools/TestWebKitAPI/Tests/WTF/StringCommon.cpp:
(TestWebKitAPI::TEST(WTF_StringCommon, Equal)):
(TestWebKitAPI::TEST(WTF_StringCommon, EqualIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, StartsWith)):
(TestWebKitAPI::TEST(WTF_StringCommon, EndsWith)):
(TestWebKitAPI::TEST(WTF_StringCommon, Find)):
(TestWebKitAPI::TEST(WTF_StringCommon, ReverseFind)):
(TestWebKitAPI::TEST(WTF_StringCommon, Contains)):
(TestWebKitAPI::TEST(WTF_StringCommon, StartsWithLettersIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, EndsWithLettersIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, FindIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, ContainsIgnoringASCIICase)):
(TestWebKitAPI::TEST(WTF_StringCommon, CharactersAreAllASCII)):

Canonical link: https://commits.webkit.org/303965@main
@webkit-commit-queue
Copy link
Collaborator

Committed 303965@main (647785e): https://commits.webkit.org/303965@main

Reviewed commits have been landed. Closing PR #54312 and removing active labels.

@webkit-commit-queue webkit-commit-queue merged commit 647785e into WebKit:main Dec 5, 2025
@webkit-commit-queue webkit-commit-queue removed the merge-queue Applied to send a pull request to merge-queue label Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

New Bugs Unclassified bugs are placed in this component until the correct component can be determined.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants