lineprotocol: fix Decoder bug when decoding from an io.Reader#51
Merged
Conversation
When we needed more input at the end of the buffer, we were sliding the existing data to the front of the buffer, but ignoring the fact that existing tokens could be pointing to that data and hence overwrite it. Avoid that possibility by sliding data only when we reset the decoder after each entry has been entirely decoded. It's likely that this was the cause of #50, although I haven't managed to recreate the panic. Specifically, in [`Decoder.NextField`](https://github.com/influxdata/line-protocol/blob/e5cb25aaf8da4a0058b516f769b1804713e2de58/lineprotocol/decoder.go#L399-L412), this scenario could have happened: - we decode the field tag - we decode the field value, triggering a read request which corrupts the field tag by overwriting it. - there's an error decoding the value - we calculate the original encoded length of the tag by looking at its data, but the corrupted data now contains a character that needs escaping, which it didn't before, so the calculated length is longer than the original. - this results in us passing a `startIndex` value that's too large to `d.syntaxErrorf` - this results in the index out-of-bounds panic The reason this issue wasn't caught by fuzzing was that we were only fuzzing with `NewDecoderWithBytes`, not with `NewDecoder` itself. Performance seems largely unaffected: ``` name old time/op new time/op delta DecodeEntriesSkipping/long-lines-8 25.4ms ± 1% 25.4ms ± 1% ~ (p=0.421 n=5+5) DecodeEntriesSkipping/long-lines-with-escapes-8 29.6ms ± 0% 29.4ms ± 0% -0.60% (p=0.008 n=5+5) DecodeEntriesSkipping/single-short-line-8 408ns ± 1% 407ns ± 1% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/single-short-line-with-escapes-8 415ns ± 2% 418ns ± 2% ~ (p=0.548 n=5+5) DecodeEntriesSkipping/many-short-lines-8 178ms ± 1% 175ms ± 1% -1.49% (p=0.008 n=5+5) DecodeEntriesSkipping/field-key-escape-not-escapable-8 369ns ± 2% 367ns ± 0% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/tag-value-triple-escape-space-8 447ns ± 2% 442ns ± 0% ~ (p=0.151 n=5+5) DecodeEntriesSkipping/procstat-8 3.43µs ± 2% 3.35µs ± 1% -2.29% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/long-lines-8 25.4ms ± 1% 25.2ms ± 0% -0.68% (p=0.016 n=5+5) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 101ms ± 1% 101ms ± 0% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-8 442ns ± 2% 438ns ± 1% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 467ns ± 2% 465ns ± 2% ~ (p=1.000 n=5+5) DecodeEntriesWithoutSkipping/many-short-lines-8 205ms ± 2% 207ms ± 0% ~ (p=0.222 n=5+5) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 516ns ± 6% 420ns ± 2% -18.60% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 586ns ± 1% 510ns ± 1% -13.05% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 5.76µs ± 2% 6.23µs ± 0% +8.11% (p=0.016 n=5+4) name old speed new speed delta DecodeEntriesSkipping/long-lines-8 1.03GB/s ± 1% 1.03GB/s ± 1% ~ (p=0.421 n=5+5) DecodeEntriesSkipping/long-lines-with-escapes-8 886MB/s ± 0% 891MB/s ± 0% +0.61% (p=0.008 n=5+5) DecodeEntriesSkipping/single-short-line-8 71.0MB/s ± 1% 71.2MB/s ± 1% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/single-short-line-with-escapes-8 77.2MB/s ± 2% 76.6MB/s ± 2% ~ (p=0.548 n=5+5) DecodeEntriesSkipping/many-short-lines-8 147MB/s ± 1% 150MB/s ± 1% +1.52% (p=0.008 n=5+5) DecodeEntriesSkipping/field-key-escape-not-escapable-8 89.4MB/s ± 2% 90.0MB/s ± 0% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/tag-value-triple-escape-space-8 112MB/s ± 2% 113MB/s ± 0% ~ (p=0.151 n=5+5) DecodeEntriesSkipping/procstat-8 387MB/s ± 1% 396MB/s ± 1% +2.34% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/long-lines-8 1.03GB/s ± 1% 1.04GB/s ± 0% +0.68% (p=0.016 n=5+5) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 259MB/s ± 1% 261MB/s ± 0% ~ (p=0.286 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-8 65.6MB/s ± 2% 66.2MB/s ± 1% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 68.5MB/s ± 2% 68.9MB/s ± 2% ~ (p=1.000 n=5+5) DecodeEntriesWithoutSkipping/many-short-lines-8 128MB/s ± 2% 126MB/s ± 0% ~ (p=0.222 n=5+5) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 64.1MB/s ± 6% 78.6MB/s ± 2% +22.60% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 85.3MB/s ± 1% 98.1MB/s ± 1% +15.02% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 230MB/s ± 2% 213MB/s ± 0% -7.51% (p=0.016 n=5+4) name old alloc/op new alloc/op delta DecodeEntriesSkipping/long-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/long-lines-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/many-short-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/field-key-escape-not-escapable-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/tag-value-triple-escape-space-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/procstat-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 17.4kB ± 0% 17.4kB ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/many-short-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 512B ± 0% 514B ± 0% +0.39% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 512B ± 0% 514B ± 0% +0.39% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 512B ± 0% 784B ± 0% +53.12% (p=0.008 n=5+5) name old allocs/op new allocs/op delta DecodeEntriesSkipping/long-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/long-lines-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/many-short-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/field-key-escape-not-escapable-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/tag-value-triple-escape-space-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/procstat-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 7.00 ± 0% 7.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/many-short-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 1.00 ± 0% 2.00 ± 0% +100.00% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 1.00 ± 0% 2.00 ± 0% +100.00% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 1.00 ± 0% 30.00 ± 0% +2900.00% (p=0.008 n=5+5) ```
Contributor
|
nice find! how did you locate it in the end? |
Contributor
Author
One of the end-to-end Flux tests was failing. I isolated the data that caused the problem by running every write through both decoders and checking when the results differed. Then I realised that the issue didn't happen when using |
andrewcharlton
approved these changes
Nov 24, 2021
This was referenced Nov 29, 2021
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When we needed more input at the end of the buffer, we were sliding
the existing data to the front of the buffer, but ignoring the fact that
existing tokens could be pointing to that data and hence overwrite it.
Avoid that possibility by sliding data only when we reset the decoder
after each entry has been entirely decoded.
It's likely that this was the cause of #50,
although I haven't managed to recreate the panic. Specifically, in
Decoder.NextField,this scenario could have happened:
startIndexvalue that's too large tod.syntaxErrorfThe reason this issue wasn't caught by fuzzing was that we were only fuzzing with
NewDecoderWithBytes, not withNewDecoderitself.Performance seems largely unaffected: