Skip to content

feat: add new Influx Line Protocol Parser#9685

Closed
akrantz01 wants to merge 15 commits intoinfluxdata:masterfrom
akrantz01:9474-new-influx-parser
Closed

feat: add new Influx Line Protocol Parser#9685
akrantz01 wants to merge 15 commits intoinfluxdata:masterfrom
akrantz01:9474-new-influx-parser

Conversation

@akrantz01
Copy link
Copy Markdown
Contributor

@akrantz01 akrantz01 commented Aug 26, 2021

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.
  • Pull request title or commits are in conventional commit format (e.g. feat: or fix:)

resolves #9474

Currently blocked by influxdata/line-protocol#43

Replaces the Influx Line Protocol parser with the new, faster, and zero-allocation parser found here. The only downside to the new parser is that we cannot get the current buffer when using the stream parser variant, so the messages will be slightly less descriptive when an error is encountered.

@telegraf-tiger telegraf-tiger bot added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins labels Aug 26, 2021
@wz2b
Copy link
Copy Markdown

wz2b commented Sep 1, 2021

I have an open feature request #9487 that requires the line protocol parser in telegraf to process and handle comments. After some discussion with @reimda I have been talked out of this idea but I am still looking at this new line parser and I don't see where it gracefully handles comments (even by just throwing them away). Did I miss it?

@akrantz01
Copy link
Copy Markdown
Contributor Author

@wz2b I'm not sure where exactly it handles comments, but it does handle them by just discarding them.

@akrantz01 akrantz01 marked this pull request as ready for review September 1, 2021 16:41
@akrantz01 akrantz01 force-pushed the 9474-new-influx-parser branch from 7d99a8e to bf40aff Compare September 1, 2021 16:50
Copy link
Copy Markdown
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @akrantz01 thanks for looking into this. I have some comments, the most prominent being

  1. My suggestion to use errors.Is() when checking for a certain error type to handle wrapped errors.
  2. A question regarding the removal of the bytes_received stat.

Would be nice if you can take a look.

@srebhan srebhan self-assigned this Sep 7, 2021
@akrantz01 akrantz01 force-pushed the 9474-new-influx-parser branch from 642db8c to 070d298 Compare September 17, 2021 01:37
Copy link
Copy Markdown
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, do you know if anyone has tested this? @akrantz01 were you able to test all of these changes?

@akrantz01
Copy link
Copy Markdown
Contributor Author

I don't know if anyone else has tested it. I also haven't done any extensive testing.

@oplehto
Copy link
Copy Markdown
Contributor

oplehto commented Sep 22, 2021

I ran this against a set of diverse real-world metrics data also with a variety of badly formatted lines. It worked quite well but ended up panicing. Unfortunately I do not have the exact line which caused this but maybe the trace will already be helpful. Working on identifying the culprit line as well.

Trace:

panic: runtime error: slice bounds out of range [:-1007604]

goroutine 10 [running]:
github.com/influxdata/telegraf/plugins/parsers/influx.(*ParseError).Error(0xc001d20018)
        /go/src/github.com/influxdata/telegraf/plugins/parsers/influx/parser.go:77 +0x3b0
github.com/influxdata/telegraf/plugins/inputs/nats_consumer.(*natsConsumer).receiver(0xc000001d40, {0x53bd4a8, 0xc000c88640})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/nats_consumer/nats_consumer.go:237 +0x442
created by github.com/influxdata/telegraf/plugins/inputs/nats_consumer.(*natsConsumer).Start.func2
        /go/src/github.com/influxdata/telegraf/plugins/inputs/nats_consumer/nats_consumer.go:202 +0xd5

@reimda reimda mentioned this pull request Sep 24, 2021
3 tasks
@oplehto
Copy link
Copy Markdown
Contributor

oplehto commented Sep 29, 2021

I haven't been able to pinpoint what could have caused the panic in the metrics stream. One simple and fast workaround would be to have better bounds checking on this array to ensure that the slice is valid and provide the necessary context in the error message to make identifying the problematic line easier.

@oplehto
Copy link
Copy Markdown
Contributor

oplehto commented Sep 29, 2021

Found another panic, this one is within the line protocol parser itself so reported it here: influxdata/line-protocol#50

@powersj
Copy link
Copy Markdown
Contributor

powersj commented Sep 30, 2021

@oplehto, would you be willing to share any of the data you are using to cause these issues?

@oplehto
Copy link
Copy Markdown
Contributor

oplehto commented Sep 30, 2021

@oplehto, would you be willing to share any of the data you are using to cause these issues?

It's a pretty high-volume real-time data stream with a variety of sources. Thus pinpointing the needle in the haystack is a bit challenging, which I why I was hoping for some more robust exception handling to help narrow it down. In general it would be nice to have bounds checking when referencing slices like this to better catch these type of "unknown unknowns" in live data.

if len(buffer) > maxErrorBufferSize {
startEllipsis := true
offset := e.Offset - e.LineOffset
offset := e.Column - 1 - lineStart
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as pointed out in the comments, we need to verify that offset is not > len(buffer) and if it is return len(buffer)

@powersj
Copy link
Copy Markdown
Contributor

powersj commented Oct 6, 2021

Hi!

I am going to take this work over from Alex. I have opened a new PR #9871 where I have pushed his existing work plus a fix for the buffer overflow and have Telegraf set to use the debug version of the Influx Line Protocol parser for help with testing.

I am going to close this PR in favor of my own and hope @oplehto can use that to help debug the current panic.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support the new Influx Line Protocol Parser

6 participants