Skip to content

New API, BufferedSource.indexOf(ByteString, fromIndex, toIndex)#1626

Merged
swankjesse merged 3 commits intomasterfrom
jwilson.0526.indexOf_toIndex
May 26, 2025
Merged

New API, BufferedSource.indexOf(ByteString, fromIndex, toIndex)#1626
swankjesse merged 3 commits intomasterfrom
jwilson.0526.indexOf_toIndex

Conversation

@swankjesse
Copy link
Copy Markdown
Collaborator

This is surprisingly interesting. To minimize unnecessary reads for toIndex it is necessary to check whether a prefix of the query matches a suffix of the currently-loaded data.

This read-avoidance is useful in practice. When doing HTTP multipart decoding the caller may scan for a boundary separator with a bounded range, and we don't want to block reading when doing so won't impact the result of the call.

This is surprisingly interesting. To minimize unnecessary reads for
toIndex it is necessary to check whether a prefix of the query matches
a suffix of the currently-loaded data.

This read-avoidance is useful in practice. When doing HTTP multipart
decoding the caller may scan for a boundary separator with a bounded
range, and we don't want to block reading when doing so won't impact
the result of the call.
@swankjesse swankjesse requested a review from mpawliszyn May 26, 2025 15:49
val b0 = targetByteArray[0]
val bytesSize = bytes.size
val resultLimit = size - bytesSize + 1L
val resultLimit = minOf(toIndex, size - bytesSize + 1L)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this toIndex -1?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think so. Here toIndex is the exclusive upper bound on the search, and size is the data size. They’re both independent, and you can pass toIndex that is out-of-bounds on the buffer.

source.skip(5L)
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the lookback case?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called it indexOfByteStringLoadsOnlyWhatIsRequiredWhenNotFoundWithFromIndex()


// Load new data if a prefix of 'bytes' matches a suffix of 'buffer'.
val limit = minOf(bytes.size, buffer.size - fromIndex + 1).toInt()
for (i in limit - 1 downTo 1) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth documenting that the complexity relates to the length of bytes now?

Copy link
Copy Markdown
Collaborator

@yschimke yschimke May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess while there are two O(i) loops here (for and rangeEquals), in practice, it's really unlikely to have pathological behaviour on each iteration, so maybe it's on average fine with long strings.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. I think I’d need to do Boyer-Moore to do faster than N*M.

In practice, indexOf() is allowed to do N*M, whether or not we’re doing load avoidance. The simplest such example is searching for xxxxx in a string like xxxxOxxxxOxxxxOxxxxOxxxx.

Copy link
Copy Markdown
Collaborator

@yschimke yschimke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good fit for square/okhttp#8665

But minimal okio review from me.

@swankjesse swankjesse merged commit b4b5fd2 into master May 26, 2025
11 checks passed
@swankjesse swankjesse deleted the jwilson.0526.indexOf_toIndex branch May 26, 2025 20:05
override fun indexOf(bytes: ByteString, fromIndex: Long): Long
override fun indexOf(bytes: ByteString, fromIndex: Long, toIndex: Long): Long
override fun indexOfElement(targetBytes: ByteString): Long
override fun indexOfElement(targetBytes: ByteString, fromIndex: Long): Long
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variant should probably also have an overload with a toIndex for symmetry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants