lineprotocol: fix Decoder bug when decoding from an io.Reader by rogpeppe · Pull Request #51 · influxdata/line-protocol

rogpeppe · 2021-11-23T18:21:58Z

When we needed more input at the end of the buffer, we were sliding
the existing data to the front of the buffer, but ignoring the fact that
existing tokens could be pointing to that data and hence overwrite it.

Avoid that possibility by sliding data only when we reset the decoder
after each entry has been entirely decoded.

It's likely that this was the cause of #50,
although I haven't managed to recreate the panic. Specifically, in Decoder.NextField,
this scenario could have happened:

we decode the field tag
we decode the field value, triggering a read request which corrupts the field tag by overwriting it.
there's an error decoding the value
we calculate the original encoded length of the tag by looking at its data, but the corrupted data now contains a character that needs escaping, which it didn't before, so the calculated length is longer than the original.
this results in us passing a startIndex value that's too large to d.syntaxErrorf
this results in the index out-of-bounds panic

The reason this issue wasn't caught by fuzzing was that we were only fuzzing with
NewDecoderWithBytes, not with NewDecoder itself.

Performance seems largely unaffected:

name                                                           old time/op    new time/op    delta
DecodeEntriesSkipping/long-lines-8                               25.4ms ± 1%    25.4ms ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                  29.6ms ± 0%    29.4ms ± 0%     -0.60%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                         408ns ± 1%     407ns ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8            415ns ± 2%     418ns ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                          178ms ± 1%     175ms ± 1%     -1.49%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8            369ns ± 2%     367ns ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8             447ns ± 2%     442ns ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                 3.43µs ± 2%    3.35µs ± 1%     -2.29%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                        25.4ms ± 1%    25.2ms ± 0%     -0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8            101ms ± 1%     101ms ± 0%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8                  442ns ± 2%     438ns ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8     467ns ± 2%     465ns ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                   205ms ± 2%     207ms ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8     516ns ± 6%     420ns ± 2%    -18.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8      586ns ± 1%     510ns ± 1%    -13.05%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                          5.76µs ± 2%    6.23µs ± 0%     +8.11%  (p=0.016 n=5+4)

name                                                           old speed      new speed      delta
DecodeEntriesSkipping/long-lines-8                             1.03GB/s ± 1%  1.03GB/s ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                 886MB/s ± 0%   891MB/s ± 0%     +0.61%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                      71.0MB/s ± 1%  71.2MB/s ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8         77.2MB/s ± 2%  76.6MB/s ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                        147MB/s ± 1%   150MB/s ± 1%     +1.52%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8         89.4MB/s ± 2%  90.0MB/s ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8           112MB/s ± 2%   113MB/s ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                387MB/s ± 1%   396MB/s ± 1%     +2.34%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                      1.03GB/s ± 1%  1.04GB/s ± 0%     +0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8          259MB/s ± 1%   261MB/s ± 0%       ~     (p=0.286 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8               65.6MB/s ± 2%  66.2MB/s ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8  68.5MB/s ± 2%  68.9MB/s ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                 128MB/s ± 2%   126MB/s ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8  64.1MB/s ± 6%  78.6MB/s ± 2%    +22.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8   85.3MB/s ± 1%  98.1MB/s ± 1%    +15.02%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                         230MB/s ± 2%   213MB/s ± 0%     -7.51%  (p=0.016 n=5+4)

name                                                           old alloc/op   new alloc/op   delta
DecodeEntriesSkipping/long-lines-8                                 512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8           17.4kB ± 0%    17.4kB ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            512B ± 0%      784B ± 0%    +53.12%  (p=0.008 n=5+5)

name                                                           old allocs/op  new allocs/op  delta
DecodeEntriesSkipping/long-lines-8                                 1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8             7.00 ± 0%      7.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            1.00 ± 0%     30.00 ± 0%  +2900.00%  (p=0.008 n=5+5)

When we needed more input at the end of the buffer, we were sliding the existing data to the front of the buffer, but ignoring the fact that existing tokens could be pointing to that data and hence overwrite it. Avoid that possibility by sliding data only when we reset the decoder after each entry has been entirely decoded. It's likely that this was the cause of #50, although I haven't managed to recreate the panic. Specifically, in [`Decoder.NextField`](https://github.com/influxdata/line-protocol/blob/e5cb25aaf8da4a0058b516f769b1804713e2de58/lineprotocol/decoder.go#L399-L412), this scenario could have happened: - we decode the field tag - we decode the field value, triggering a read request which corrupts the field tag by overwriting it. - there's an error decoding the value - we calculate the original encoded length of the tag by looking at its data, but the corrupted data now contains a character that needs escaping, which it didn't before, so the calculated length is longer than the original. - this results in us passing a `startIndex` value that's too large to `d.syntaxErrorf` - this results in the index out-of-bounds panic The reason this issue wasn't caught by fuzzing was that we were only fuzzing with `NewDecoderWithBytes`, not with `NewDecoder` itself. Performance seems largely unaffected: ``` name old time/op new time/op delta DecodeEntriesSkipping/long-lines-8 25.4ms ± 1% 25.4ms ± 1% ~ (p=0.421 n=5+5) DecodeEntriesSkipping/long-lines-with-escapes-8 29.6ms ± 0% 29.4ms ± 0% -0.60% (p=0.008 n=5+5) DecodeEntriesSkipping/single-short-line-8 408ns ± 1% 407ns ± 1% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/single-short-line-with-escapes-8 415ns ± 2% 418ns ± 2% ~ (p=0.548 n=5+5) DecodeEntriesSkipping/many-short-lines-8 178ms ± 1% 175ms ± 1% -1.49% (p=0.008 n=5+5) DecodeEntriesSkipping/field-key-escape-not-escapable-8 369ns ± 2% 367ns ± 0% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/tag-value-triple-escape-space-8 447ns ± 2% 442ns ± 0% ~ (p=0.151 n=5+5) DecodeEntriesSkipping/procstat-8 3.43µs ± 2% 3.35µs ± 1% -2.29% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/long-lines-8 25.4ms ± 1% 25.2ms ± 0% -0.68% (p=0.016 n=5+5) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 101ms ± 1% 101ms ± 0% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-8 442ns ± 2% 438ns ± 1% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 467ns ± 2% 465ns ± 2% ~ (p=1.000 n=5+5) DecodeEntriesWithoutSkipping/many-short-lines-8 205ms ± 2% 207ms ± 0% ~ (p=0.222 n=5+5) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 516ns ± 6% 420ns ± 2% -18.60% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 586ns ± 1% 510ns ± 1% -13.05% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 5.76µs ± 2% 6.23µs ± 0% +8.11% (p=0.016 n=5+4) name old speed new speed delta DecodeEntriesSkipping/long-lines-8 1.03GB/s ± 1% 1.03GB/s ± 1% ~ (p=0.421 n=5+5) DecodeEntriesSkipping/long-lines-with-escapes-8 886MB/s ± 0% 891MB/s ± 0% +0.61% (p=0.008 n=5+5) DecodeEntriesSkipping/single-short-line-8 71.0MB/s ± 1% 71.2MB/s ± 1% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/single-short-line-with-escapes-8 77.2MB/s ± 2% 76.6MB/s ± 2% ~ (p=0.548 n=5+5) DecodeEntriesSkipping/many-short-lines-8 147MB/s ± 1% 150MB/s ± 1% +1.52% (p=0.008 n=5+5) DecodeEntriesSkipping/field-key-escape-not-escapable-8 89.4MB/s ± 2% 90.0MB/s ± 0% ~ (p=0.690 n=5+5) DecodeEntriesSkipping/tag-value-triple-escape-space-8 112MB/s ± 2% 113MB/s ± 0% ~ (p=0.151 n=5+5) DecodeEntriesSkipping/procstat-8 387MB/s ± 1% 396MB/s ± 1% +2.34% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/long-lines-8 1.03GB/s ± 1% 1.04GB/s ± 0% +0.68% (p=0.016 n=5+5) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 259MB/s ± 1% 261MB/s ± 0% ~ (p=0.286 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-8 65.6MB/s ± 2% 66.2MB/s ± 1% ~ (p=0.310 n=5+5) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 68.5MB/s ± 2% 68.9MB/s ± 2% ~ (p=1.000 n=5+5) DecodeEntriesWithoutSkipping/many-short-lines-8 128MB/s ± 2% 126MB/s ± 0% ~ (p=0.222 n=5+5) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 64.1MB/s ± 6% 78.6MB/s ± 2% +22.60% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 85.3MB/s ± 1% 98.1MB/s ± 1% +15.02% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 230MB/s ± 2% 213MB/s ± 0% -7.51% (p=0.016 n=5+4) name old alloc/op new alloc/op delta DecodeEntriesSkipping/long-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/long-lines-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/many-short-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/field-key-escape-not-escapable-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/tag-value-triple-escape-space-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesSkipping/procstat-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 17.4kB ± 0% 17.4kB ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/many-short-lines-8 512B ± 0% 512B ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 512B ± 0% 514B ± 0% +0.39% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 512B ± 0% 514B ± 0% +0.39% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 512B ± 0% 784B ± 0% +53.12% (p=0.008 n=5+5) name old allocs/op new allocs/op delta DecodeEntriesSkipping/long-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/long-lines-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/single-short-line-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/many-short-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/field-key-escape-not-escapable-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/tag-value-triple-escape-space-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesSkipping/procstat-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/long-lines-with-escapes-8 7.00 ± 0% 7.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/many-short-lines-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8 1.00 ± 0% 2.00 ± 0% +100.00% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8 1.00 ± 0% 2.00 ± 0% +100.00% (p=0.008 n=5+5) DecodeEntriesWithoutSkipping/procstat-8 1.00 ± 0% 30.00 ± 0% +2900.00% (p=0.008 n=5+5) ```

philjb · 2021-11-23T18:35:06Z

nice find! how did you locate it in the end?

rogpeppe · 2021-11-23T18:44:13Z

@philjb

nice find! how did you locate it in the end?

One of the end-to-end Flux tests was failing. I isolated the data that caused the problem by running every write through both decoders and checking when the results differed. Then I realised that the issue didn't happen when using NewDecoderWithBytes which led swiftly to the problem area.

rogpeppe requested a review from a team as a code owner November 23, 2021 18:21

rogpeppe requested review from andrewcharlton and removed request for a team November 23, 2021 18:21

andrewcharlton approved these changes Nov 24, 2021

View reviewed changes

rogpeppe merged commit 8d23ab8 into v2 Nov 24, 2021

This was referenced Nov 29, 2021

Panic in syntax error handling #50

Closed

feat: use new Influx Line Protocol Parser influxdata/telegraf#9871

Closed

jacobmarble deleted the rog-037-fix-read-bug branch April 19, 2022 18:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lineprotocol: fix Decoder bug when decoding from an io.Reader#51

lineprotocol: fix Decoder bug when decoding from an io.Reader#51
rogpeppe merged 1 commit intov2from
rog-037-fix-read-bug

rogpeppe commented Nov 23, 2021

Uh oh!

philjb commented Nov 23, 2021

Uh oh!

rogpeppe commented Nov 23, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rogpeppe commented Nov 23, 2021

Uh oh!

philjb commented Nov 23, 2021

Uh oh!

rogpeppe commented Nov 23, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants