Skip to content

Fix poor buffering case for MultipartReader#8665

Merged
yschimke merged 10 commits intosquare:masterfrom
yschimke:add_test
Feb 17, 2025
Merged

Fix poor buffering case for MultipartReader#8665
yschimke merged 10 commits intosquare:masterfrom
yschimke:add_test

Conversation

@yschimke
Copy link
Copy Markdown
Collaborator

100mb Body is fully written, reader wants 1MB chunks, but get 8kb at a time. Unfortunately internally we do indexOf over the full 100mb.

outer reading 1048576
indexOf over 104857611 with maxResult 8192
inner 8192
outer read 8192
outer reading 1048576
indexOf over 104849419 with maxResult 8192
inner 8192
outer read 8192

@yschimke yschimke requested a review from squarejesse January 30, 2025 08:37
@yschimke
Copy link
Copy Markdown
Collaborator Author

@swankjesse any suggestions for the right way to test and fix this?

@yschimke
Copy link
Copy Markdown
Collaborator Author

yschimke commented Jan 30, 2025

I'm not sure if this is actually a bug in Okio RealBufferedSource that it assumes that checking segment size is optimal?

internal inline fun RealBufferedSource.commonRead(sink: Buffer, byteCount: Long): Long {
  require(byteCount >= 0L) { "byteCount < 0: $byteCount" }
  check(!closed) { "closed" }

  if (buffer.size == 0L) {
    if (byteCount == 0L) return 0L
    val read = source.read(buffer, Segment.SIZE.toLong())
    if (read == -1L) return -1L
  }

  val toRead = minOf(byteCount, buffer.size)
  return buffer.read(sink, toRead)
}

Also if we do want to scan 8kb at a time, how can we limit the indexOf?

    private fun currentPartBytesRemaining(maxResult: Long): Long {
      source.require(crlfDashDashBoundary.size.toLong())

      return when (val delimiterIndex = source.buffer.indexOf(crlfDashDashBoundary)) {
        -1L -> minOf(maxResult, source.buffer.size - crlfDashDashBoundary.size + 1)
        else -> minOf(maxResult, delimiterIndex)
      }
    }

Do we need this added to Buffer?

  override fun indexOf(bytes: ByteString, fromIndex: Long, toIndex: Long): Long

@yschimke yschimke requested a review from JakeWharton February 1, 2025 13:05
@yschimke yschimke marked this pull request as draft February 1, 2025 13:05
@yschimke yschimke changed the title Demonstrate poor buffering case for MultipartReader Fix poor buffering case for MultipartReader Feb 15, 2025
@yschimke yschimke requested review from swankjesse and removed request for squarejesse February 15, 2025 17:15
@yschimke
Copy link
Copy Markdown
Collaborator Author

500ms with the fix
I stopped after 2 minutes without the fix

@yschimke yschimke marked this pull request as ready for review February 15, 2025 17:17
@yschimke
Copy link
Copy Markdown
Collaborator Author

@swankjesse requesting post review (as agreed)

@yschimke yschimke merged commit 3cc87c3 into square:master Feb 17, 2025
20 checks passed
yschimke added a commit to yschimke/okhttp that referenced this pull request Feb 17, 2025
* Demonstrate poor buffering case
* Fix for repeated reads of small byteCount from large part

(cherry picked from commit 3cc87c3)
yschimke added a commit that referenced this pull request Feb 17, 2025
* Demonstrate poor buffering case
* Fix for repeated reads of small byteCount from large part

(cherry picked from commit 3cc87c3)
yschimke added a commit that referenced this pull request Feb 17, 2025
* Demonstrate poor buffering case
* Fix for repeated reads of small byteCount from large part

(cherry picked from commit 3cc87c3)
-1L -> minOf(maxResult, source.buffer.size - crlfDashDashBoundary.size + 1)
// Avoid indexOf scanning repeatedly over the entire source by using limit
// Since maxResult could be midway through the boundary, read further to be safe.
val limitSource = source.peek().limit(maxResult + crlfDashDashBoundary.size).buffer()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I regret that BufferedSource doesn’t have an API that does what you want without needing to peek() and limit()

@JvmOverloads
internal fun Source.limit(
byteCount: Long,
onReadExhausted: (eof: Boolean) -> Unit = {},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s this for?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Donation of the impl from @JakeWharton

square/okio@2e9df10

With the idea that we could switch to a future Okio one at some point in the future.

assertThat(part.headers["header-name"]).isEqualTo("header-value")
while (true) {
val readBuff = Buffer()
val read = part.body.read(readBuff, (1024).toLong())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a weird way to write 1024L

}

@Test
fun `reading a large part with small byteCount`() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants