Skip to content

lineprotocol: fix Decoder bug when decoding from an io.Reader#51

Merged
rogpeppe merged 1 commit intov2from
rog-037-fix-read-bug
Nov 24, 2021
Merged

lineprotocol: fix Decoder bug when decoding from an io.Reader#51
rogpeppe merged 1 commit intov2from
rog-037-fix-read-bug

Conversation

@rogpeppe
Copy link
Copy Markdown
Contributor

When we needed more input at the end of the buffer, we were sliding
the existing data to the front of the buffer, but ignoring the fact that
existing tokens could be pointing to that data and hence overwrite it.

Avoid that possibility by sliding data only when we reset the decoder
after each entry has been entirely decoded.

It's likely that this was the cause of #50,
although I haven't managed to recreate the panic. Specifically, in Decoder.NextField,
this scenario could have happened:

  • we decode the field tag
  • we decode the field value, triggering a read request which corrupts the field tag by overwriting it.
  • there's an error decoding the value
  • we calculate the original encoded length of the tag by looking at its data, but the corrupted data now contains a character that needs escaping, which it didn't before, so the calculated length is longer than the original.
  • this results in us passing a startIndex value that's too large to d.syntaxErrorf
  • this results in the index out-of-bounds panic

The reason this issue wasn't caught by fuzzing was that we were only fuzzing with
NewDecoderWithBytes, not with NewDecoder itself.

Performance seems largely unaffected:

name                                                           old time/op    new time/op    delta
DecodeEntriesSkipping/long-lines-8                               25.4ms ± 1%    25.4ms ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                  29.6ms ± 0%    29.4ms ± 0%     -0.60%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                         408ns ± 1%     407ns ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8            415ns ± 2%     418ns ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                          178ms ± 1%     175ms ± 1%     -1.49%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8            369ns ± 2%     367ns ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8             447ns ± 2%     442ns ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                 3.43µs ± 2%    3.35µs ± 1%     -2.29%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                        25.4ms ± 1%    25.2ms ± 0%     -0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8            101ms ± 1%     101ms ± 0%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8                  442ns ± 2%     438ns ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8     467ns ± 2%     465ns ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                   205ms ± 2%     207ms ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8     516ns ± 6%     420ns ± 2%    -18.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8      586ns ± 1%     510ns ± 1%    -13.05%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                          5.76µs ± 2%    6.23µs ± 0%     +8.11%  (p=0.016 n=5+4)

name                                                           old speed      new speed      delta
DecodeEntriesSkipping/long-lines-8                             1.03GB/s ± 1%  1.03GB/s ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                 886MB/s ± 0%   891MB/s ± 0%     +0.61%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                      71.0MB/s ± 1%  71.2MB/s ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8         77.2MB/s ± 2%  76.6MB/s ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                        147MB/s ± 1%   150MB/s ± 1%     +1.52%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8         89.4MB/s ± 2%  90.0MB/s ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8           112MB/s ± 2%   113MB/s ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                387MB/s ± 1%   396MB/s ± 1%     +2.34%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                      1.03GB/s ± 1%  1.04GB/s ± 0%     +0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8          259MB/s ± 1%   261MB/s ± 0%       ~     (p=0.286 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8               65.6MB/s ± 2%  66.2MB/s ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8  68.5MB/s ± 2%  68.9MB/s ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                 128MB/s ± 2%   126MB/s ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8  64.1MB/s ± 6%  78.6MB/s ± 2%    +22.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8   85.3MB/s ± 1%  98.1MB/s ± 1%    +15.02%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                         230MB/s ± 2%   213MB/s ± 0%     -7.51%  (p=0.016 n=5+4)

name                                                           old alloc/op   new alloc/op   delta
DecodeEntriesSkipping/long-lines-8                                 512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8           17.4kB ± 0%    17.4kB ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            512B ± 0%      784B ± 0%    +53.12%  (p=0.008 n=5+5)

name                                                           old allocs/op  new allocs/op  delta
DecodeEntriesSkipping/long-lines-8                                 1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8             7.00 ± 0%      7.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            1.00 ± 0%     30.00 ± 0%  +2900.00%  (p=0.008 n=5+5)

When we needed more input at the end of the buffer, we were sliding
the existing data to the front of the buffer, but ignoring the fact that
existing tokens could be pointing to that data and hence overwrite it.

Avoid that possibility by sliding data only when we reset the decoder
after each entry has been entirely decoded.

It's likely that this was the cause of #50,
although I haven't managed to recreate the panic. Specifically, in [`Decoder.NextField`](https://github.com/influxdata/line-protocol/blob/e5cb25aaf8da4a0058b516f769b1804713e2de58/lineprotocol/decoder.go#L399-L412),
this scenario could have happened:

- we decode the field tag
- we decode the field value, triggering a read request which corrupts the field tag by overwriting it.
- there's an error decoding the value
- we calculate the original encoded length of the tag by looking at its data, but the corrupted data now contains a character that needs escaping, which it didn't before, so the calculated length is longer than the original.
- this results in us passing a `startIndex` value that's too large to `d.syntaxErrorf`
- this results in the index out-of-bounds panic

The reason this issue wasn't caught by fuzzing was that we were only fuzzing with
`NewDecoderWithBytes`, not with `NewDecoder` itself.

Performance seems largely unaffected:

```
name                                                           old time/op    new time/op    delta
DecodeEntriesSkipping/long-lines-8                               25.4ms ± 1%    25.4ms ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                  29.6ms ± 0%    29.4ms ± 0%     -0.60%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                         408ns ± 1%     407ns ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8            415ns ± 2%     418ns ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                          178ms ± 1%     175ms ± 1%     -1.49%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8            369ns ± 2%     367ns ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8             447ns ± 2%     442ns ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                 3.43µs ± 2%    3.35µs ± 1%     -2.29%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                        25.4ms ± 1%    25.2ms ± 0%     -0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8            101ms ± 1%     101ms ± 0%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8                  442ns ± 2%     438ns ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8     467ns ± 2%     465ns ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                   205ms ± 2%     207ms ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8     516ns ± 6%     420ns ± 2%    -18.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8      586ns ± 1%     510ns ± 1%    -13.05%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                          5.76µs ± 2%    6.23µs ± 0%     +8.11%  (p=0.016 n=5+4)

name                                                           old speed      new speed      delta
DecodeEntriesSkipping/long-lines-8                             1.03GB/s ± 1%  1.03GB/s ± 1%       ~     (p=0.421 n=5+5)
DecodeEntriesSkipping/long-lines-with-escapes-8                 886MB/s ± 0%   891MB/s ± 0%     +0.61%  (p=0.008 n=5+5)
DecodeEntriesSkipping/single-short-line-8                      71.0MB/s ± 1%  71.2MB/s ± 1%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/single-short-line-with-escapes-8         77.2MB/s ± 2%  76.6MB/s ± 2%       ~     (p=0.548 n=5+5)
DecodeEntriesSkipping/many-short-lines-8                        147MB/s ± 1%   150MB/s ± 1%     +1.52%  (p=0.008 n=5+5)
DecodeEntriesSkipping/field-key-escape-not-escapable-8         89.4MB/s ± 2%  90.0MB/s ± 0%       ~     (p=0.690 n=5+5)
DecodeEntriesSkipping/tag-value-triple-escape-space-8           112MB/s ± 2%   113MB/s ± 0%       ~     (p=0.151 n=5+5)
DecodeEntriesSkipping/procstat-8                                387MB/s ± 1%   396MB/s ± 1%     +2.34%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-8                      1.03GB/s ± 1%  1.04GB/s ± 0%     +0.68%  (p=0.016 n=5+5)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8          259MB/s ± 1%   261MB/s ± 0%       ~     (p=0.286 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-8               65.6MB/s ± 2%  66.2MB/s ± 1%       ~     (p=0.310 n=5+5)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8  68.5MB/s ± 2%  68.9MB/s ± 2%       ~     (p=1.000 n=5+5)
DecodeEntriesWithoutSkipping/many-short-lines-8                 128MB/s ± 2%   126MB/s ± 0%       ~     (p=0.222 n=5+5)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8  64.1MB/s ± 6%  78.6MB/s ± 2%    +22.60%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8   85.3MB/s ± 1%  98.1MB/s ± 1%    +15.02%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                         230MB/s ± 2%   213MB/s ± 0%     -7.51%  (p=0.016 n=5+4)

name                                                           old alloc/op   new alloc/op   delta
DecodeEntriesSkipping/long-lines-8                                 512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8           17.4kB ± 0%    17.4kB ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    512B ± 0%      512B ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       512B ± 0%      514B ± 0%     +0.39%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            512B ± 0%      784B ± 0%    +53.12%  (p=0.008 n=5+5)

name                                                           old allocs/op  new allocs/op  delta
DecodeEntriesSkipping/long-lines-8                                 1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/long-lines-with-escapes-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/single-short-line-with-escapes-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/many-short-lines-8                           1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/field-key-escape-not-escapable-8             1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/tag-value-triple-escape-space-8              1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesSkipping/procstat-8                                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-8                          1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/long-lines-with-escapes-8             7.00 ± 0%      7.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-8                   1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/single-short-line-with-escapes-8      1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/many-short-lines-8                    1.00 ± 0%      1.00 ± 0%       ~     (all equal)
DecodeEntriesWithoutSkipping/field-key-escape-not-escapable-8      1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/tag-value-triple-escape-space-8       1.00 ± 0%      2.00 ± 0%   +100.00%  (p=0.008 n=5+5)
DecodeEntriesWithoutSkipping/procstat-8                            1.00 ± 0%     30.00 ± 0%  +2900.00%  (p=0.008 n=5+5)
```
@rogpeppe rogpeppe requested a review from a team as a code owner November 23, 2021 18:21
@rogpeppe rogpeppe requested review from andrewcharlton and removed request for a team November 23, 2021 18:21
@philjb
Copy link
Copy Markdown
Contributor

philjb commented Nov 23, 2021

nice find! how did you locate it in the end?

@rogpeppe
Copy link
Copy Markdown
Contributor Author

@philjb

nice find! how did you locate it in the end?

One of the end-to-end Flux tests was failing. I isolated the data that caused the problem by running every write through both decoders and checking when the results differed. Then I realised that the issue didn't happen when using NewDecoderWithBytes which led swiftly to the problem area.

@rogpeppe rogpeppe merged commit 8d23ab8 into v2 Nov 24, 2021
@jacobmarble jacobmarble deleted the rog-037-fix-read-bug branch April 19, 2022 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants