New API, BufferedSource.indexOf(ByteString, fromIndex, toIndex) by swankjesse · Pull Request #1626 · square/okio

swankjesse · 2025-05-26T15:35:12Z

This is surprisingly interesting. To avoid unnecessary reads when toIndex is set, we need to check whether a prefix of the query matches a suffix of the currently-loaded data.

This read-avoidance is useful in practice. When doing HTTP multipart decoding the caller may scan for a boundary separator with a bounded range, and we don't want to block reading when doing so won't impact the result of the call.
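A minimal sketch of that prefix/suffix check on plain byte arrays, for illustration only. It is not the code in this PR: the function names `suffixMatchesPrefix` and `couldMatchAfterMoreData` are hypothetical, and it assumes the loaded data has already been searched for a complete match.

```kotlin
/** Returns true if the last [length] bytes of [loaded] equal the first [length] bytes of [query]. */
fun suffixMatchesPrefix(loaded: ByteArray, query: ByteArray, length: Int): Boolean {
  val start = loaded.size - length
  for (i in 0 until length) {
    if (loaded[start + i] != query[i]) return false
  }
  return true
}

/**
 * Returns true if reading more data could still produce a match that begins inside [loaded].
 * A match must start at or after [fromIndex] and strictly before [toIndex].
 */
fun couldMatchAfterMoreData(
  loaded: ByteArray,
  query: ByteArray,
  fromIndex: Int = 0,
  toIndex: Int = Int.MAX_VALUE,
): Boolean {
  // Only proper prefixes matter here: a full match would already have been found.
  val maxOverlap = minOf(query.size - 1, loaded.size - fromIndex)
  for (length in maxOverlap downTo 1) {
    val start = loaded.size - length
    // Shorter overlaps start even later, so none of them can be in range either.
    if (start >= toIndex) return false
    if (suffixMatchesPrefix(loaded, query, length)) return true
  }
  return false
}
```

With `"abc--bo"` loaded and `"--boundary"` as the query, the trailing `"--bo"` is a partial match, so another read is worthwhile unless `toIndex` excludes offset 3. With `"abcdef"` loaded, no suffix matches any prefix, so the call can return without reading.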

mpawliszyn · 2025-05-26T15:42:27Z

okio/src/commonMain/kotlin/okio/internal/Buffer.kt

    val b0 = targetByteArray[0]
    val bytesSize = bytes.size
-    val resultLimit = size - bytesSize + 1L
+    val resultLimit = minOf(toIndex, size - bytesSize + 1L)


Is this toIndex - 1?

I don’t think so. Here toIndex is the exclusive upper bound on the search, and size is the data size. They’re both independent, and you can pass toIndex that is out-of-bounds on the buffer.
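To make the exclusive bound concrete, here is a small sketch of a bounded search over a plain byte array. It mirrors the `minOf(toIndex, size - bytesSize + 1)` expression from the diff, but the function `indexOfBounded` is a hypothetical stand-in and not okio's implementation.

```kotlin
/**
 * Returns the first index in [fromIndex, toIndex) at which [target] starts in [data], or -1.
 * [toIndex] is exclusive and may exceed data.size.
 */
fun indexOfBounded(
  data: ByteArray,
  target: ByteArray,
  fromIndex: Int = 0,
  toIndex: Int = Int.MAX_VALUE,
): Int {
  require(target.isNotEmpty()) { "target is empty" }
  require(fromIndex >= 0) { "fromIndex < 0" }

  // A match can't start at or after toIndex, nor so late that target would run off the end.
  val resultLimit = minOf(toIndex, data.size - target.size + 1)

  for (start in fromIndex until resultLimit) {
    var matched = 0
    while (matched < target.size && data[start + matched] == target[matched]) matched++
    if (matched == target.size) return start
  }
  return -1
}
```

Because the bound is exclusive, a target that starts at offset 4 is found with `toIndex = 5` but not with `toIndex = 4`. A very large `toIndex` is harmless since `resultLimit` is also capped by the data size.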

mpawliszyn · 2025-05-26T15:54:56Z

okio/src/jvmTest/kotlin/okio/BufferedSourceTest.kt

+      source.skip(5L)
+    }
+  }
+


What about the lookback case?

I called it indexOfByteStringLoadsOnlyWhatIsRequiredWhenNotFoundWithFromIndex()

yschimke · 2025-05-26T16:15:58Z

okio/src/commonMain/kotlin/okio/internal/RealBufferedSource.kt

+
+  // Load new data if a prefix of 'bytes' matches a suffix of 'buffer'.
+  val limit = minOf(bytes.size, buffer.size - fromIndex + 1).toInt()
+  for (i in limit - 1 downTo 1) {


Is it worth documenting that the complexity relates to the length of bytes now?

I guess that while there are two O(i) loops here (for and rangeEquals), in practice it's really unlikely to hit pathological behaviour on every iteration, so on average it's probably fine with long strings.

Yeah, good point. I think I’d need something like Boyer-Moore to do better than N*M.

In practice, indexOf() is allowed to do N*M, whether or not we’re doing load avoidance. The simplest such example is searching for xxxxx in a string like xxxxOxxxxOxxxxOxxxxOxxxx.
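The example above can be made measurable with a naive search that counts byte comparisons. This is an illustrative sketch on strings, not okio code: `naiveIndexOf` and `SearchResult` are hypothetical names.

```kotlin
/** Result of a search plus the number of character comparisons it took. */
data class SearchResult(val index: Int, val comparisons: Int)

/** Naive substring search that restarts the comparison at every candidate offset. */
fun naiveIndexOf(haystack: String, needle: String): SearchResult {
  var comparisons = 0
  for (start in 0..haystack.length - needle.length) {
    var matched = 0
    while (matched < needle.length) {
      comparisons++
      if (haystack[start + matched] != needle[matched]) break
      matched++
    }
    if (matched == needle.length) return SearchResult(start, comparisons)
  }
  return SearchResult(-1, comparisons)
}

fun main() {
  val result = naiveIndexOf("xxxxOxxxxOxxxxOxxxxOxxxx", "xxxxx")
  println("index=${result.index} comparisons=${result.comparisons}")
}
```

Each candidate offset re-compares a run of `x` characters before hitting an `O`, so the comparison count grows with both the haystack length and the needle length rather than with the haystack length alone.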

yschimke

Looks like a good fit for square/okhttp#8665

But minimal okio review from me.

JakeWharton · 2025-05-27T14:19:19Z

okio/src/commonMain/kotlin/okio/RealBufferedSource.kt

  override fun indexOf(bytes: ByteString, fromIndex: Long): Long
+  override fun indexOf(bytes: ByteString, fromIndex: Long, toIndex: Long): Long
  override fun indexOfElement(targetBytes: ByteString): Long
  override fun indexOfElement(targetBytes: ByteString, fromIndex: Long): Long


This variant should probably also have an overload with a toIndex for symmetry.

swankjesse requested review from JakeWharton and yschimke May 26, 2025 15:35

apiDump

ad3fd73

swankjesse requested a review from mpawliszyn May 26, 2025 15:49

mpawliszyn reviewed May 26, 2025

yschimke reviewed May 26, 2025

yschimke reviewed May 26, 2025

yschimke approved these changes May 26, 2025

Check both maximum and minimum prefix sizes

62852b9

mpawliszyn approved these changes May 26, 2025

swankjesse merged commit b4b5fd2 into master May 26, 2025
11 checks passed

swankjesse deleted the jwilson.0526.indexOf_toIndex branch May 26, 2025 20:05

JakeWharton reviewed May 27, 2025

swankjesse merged 3 commits into master from jwilson.0526.indexOf_toIndex
