LineReader: Reuse temporary buffer to reduce per-line allocation#27782
Merged
kvch merged 4 commits intoelastic:masterfrom Sep 9, 2021
Merged
LineReader: Reuse temporary buffer to reduce per-line allocation#27782kvch merged 4 commits intoelastic:masterfrom
kvch merged 4 commits intoelastic:masterfrom
Conversation
bmoylan
commented
Sep 7, 2021
| @@ -0,0 +1,66 @@ | |||
| package readfile | |||
Contributor
Author
There was a problem hiding this comment.
Not sure if you all have preferences on where/how to include benchmarks. Happy to move/remove this as desired.
Contributor
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪💚 Flaky test reportTests succeeded. Expand to view the summary
Test stats 🧪
|
Contributor
|
jenkins run tests |
Contributor
Author
|
Hey @kvch thanks for taking a look! I think I fixed the lint error (missing license header) if we could test again 😄 |
Contributor
|
/test |
Contributor
|
Pinging @elastic/agent (Team:Agent) |
kvch
reviewed
Sep 8, 2021
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
kvch
approved these changes
Sep 8, 2021
Contributor
|
Thank you! |
mergify bot
pushed a commit
that referenced
this pull request
Sep 9, 2021
) ## What does this PR do? Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter. With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes. ## Why is it important? In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths: ``` goos: darwin goarch: amd64 pkg: github.com/elastic/beats/v7/libbeat/reader/readfile cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz name old time/op new time/op delta EncoderReader/buffer-sized_lines-16 125µs ± 3% 94µs ± 9% -24.55% (p=0.008 n=5+5) EncoderReader/short_lines-16 52.6µs ± 4% 36.3µs ±10% -30.88% (p=0.008 n=5+5) EncoderReader/long_lines-16 1.82ms ± 2% 1.70ms ±10% ~ (p=0.151 n=5+5) EncoderReader/skip_lines-16 133µs ± 3% 140µs ± 8% ~ (p=0.151 n=5+5) name old alloc/op new alloc/op delta EncoderReader/buffer-sized_lines-16 442kB ± 0% 239kB ± 0% -46.07% (p=0.000 n=4+5) EncoderReader/short_lines-16 118kB ± 0% 15kB ± 0% -87.27% (p=0.008 n=5+5) EncoderReader/long_lines-16 8.73MB ± 0% 7.63MB ± 0% -12.62% (p=0.000 n=4+5) EncoderReader/skip_lines-16 270kB ± 0% 220kB ± 0% -18.58% (p=0.008 n=5+5) name old allocs/op new allocs/op delta EncoderReader/buffer-sized_lines-16 718 ± 0% 519 ± 0% -27.72% (p=0.008 n=5+5) EncoderReader/short_lines-16 522 ± 0% 421 ± 0% -19.35% (p=0.008 n=5+5) EncoderReader/long_lines-16 2.65k ± 0% 1.58k ± 0% -40.54% (p=0.008 n=5+5) EncoderReader/skip_lines-16 420 ± 0% 419 ± 0% -0.24% (p=0.008 n=5+5) ``` (cherry picked from commit 0e3788b)
mdelapenya
added a commit
to mdelapenya/beats
that referenced
this pull request
Sep 9, 2021
* master: [Auditbeat] scanner honor include_files (elastic#27722) chore(ci): remove not used param when triggering e2e tests (elastic#27823) LineReader: Reuse temporary buffer to reduce per-line allocation (elastic#27782)
kvch
pushed a commit
that referenced
this pull request
Sep 14, 2021
) (#27824) ## What does this PR do? Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter. With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes. ## Why is it important? In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths: ``` goos: darwin goarch: amd64 pkg: github.com/elastic/beats/v7/libbeat/reader/readfile cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz name old time/op new time/op delta EncoderReader/buffer-sized_lines-16 125µs ± 3% 94µs ± 9% -24.55% (p=0.008 n=5+5) EncoderReader/short_lines-16 52.6µs ± 4% 36.3µs ±10% -30.88% (p=0.008 n=5+5) EncoderReader/long_lines-16 1.82ms ± 2% 1.70ms ±10% ~ (p=0.151 n=5+5) EncoderReader/skip_lines-16 133µs ± 3% 140µs ± 8% ~ (p=0.151 n=5+5) name old alloc/op new alloc/op delta EncoderReader/buffer-sized_lines-16 442kB ± 0% 239kB ± 0% -46.07% (p=0.000 n=4+5) EncoderReader/short_lines-16 118kB ± 0% 15kB ± 0% -87.27% (p=0.008 n=5+5) EncoderReader/long_lines-16 8.73MB ± 0% 7.63MB ± 0% -12.62% (p=0.000 n=4+5) EncoderReader/skip_lines-16 270kB ± 0% 220kB ± 0% -18.58% (p=0.008 n=5+5) name old allocs/op new allocs/op delta EncoderReader/buffer-sized_lines-16 718 ± 0% 519 ± 0% -27.72% (p=0.008 n=5+5) EncoderReader/short_lines-16 522 ± 0% 421 ± 0% -19.35% (p=0.008 n=5+5) EncoderReader/long_lines-16 2.65k ± 0% 1.58k ± 0% -40.54% (p=0.008 n=5+5) EncoderReader/skip_lines-16 420 ± 0% 419 ± 0% -0.24% (p=0.008 n=5+5) ``` (cherry picked from commit 0e3788b) Co-authored-by: Brad Moylan <moylan.brad@gmail.com>
Icedroid
pushed a commit
to Icedroid/beats
that referenced
this pull request
Nov 1, 2021
…stic#27782) ## What does this PR do? Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter. With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes. ## Why is it important? In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths: ``` goos: darwin goarch: amd64 pkg: github.com/elastic/beats/v7/libbeat/reader/readfile cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz name old time/op new time/op delta EncoderReader/buffer-sized_lines-16 125µs ± 3% 94µs ± 9% -24.55% (p=0.008 n=5+5) EncoderReader/short_lines-16 52.6µs ± 4% 36.3µs ±10% -30.88% (p=0.008 n=5+5) EncoderReader/long_lines-16 1.82ms ± 2% 1.70ms ±10% ~ (p=0.151 n=5+5) EncoderReader/skip_lines-16 133µs ± 3% 140µs ± 8% ~ (p=0.151 n=5+5) name old alloc/op new alloc/op delta EncoderReader/buffer-sized_lines-16 442kB ± 0% 239kB ± 0% -46.07% (p=0.000 n=4+5) EncoderReader/short_lines-16 118kB ± 0% 15kB ± 0% -87.27% (p=0.008 n=5+5) EncoderReader/long_lines-16 8.73MB ± 0% 7.63MB ± 0% -12.62% (p=0.000 n=4+5) EncoderReader/skip_lines-16 270kB ± 0% 220kB ± 0% -18.58% (p=0.008 n=5+5) name old allocs/op new allocs/op delta EncoderReader/buffer-sized_lines-16 718 ± 0% 519 ± 0% -27.72% (p=0.008 n=5+5) EncoderReader/short_lines-16 522 ± 0% 421 ± 0% -19.35% (p=0.008 n=5+5) EncoderReader/long_lines-16 2.65k ± 0% 1.58k ± 0% -40.54% (p=0.008 n=5+5) EncoderReader/skip_lines-16 420 ± 0% 419 ± 0% -0.24% (p=0.008 n=5+5) ```
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Previously, the
LineReaderwould allocate a []byte of sizeconfig.BufferSizebefore decoding each line. The underlying array's size allocation is fixed, sooutBuffer.Appendretains all of it even when the appended bytes are much shorter.With this change, we store a single
tempBuffer []bytewhich is reused across lines anywhere we need temporary storage. Converting tooutBuffer.Writeforces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes.Why is it important?
In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths:
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.Related issues