
Influxdb v2 listener #7828

Merged
ssoroka merged 13 commits into influxdata:master from magichair:influxdb_v2_listener
Sep 14, 2020

Conversation

@magichair
Contributor

@magichair magichair commented Jul 13, 2020

closes #6626

Most code is ported directly from the influxdb_listener input plugin. The influxdb_v2* naming convention matches the paradigm of the existing output plugin.

I've manually tested the functionality of this plugin with two of the official InfluxDB 2.x client libraries.

The only open question: I ported the handlePing method from influxdb_listener to instead support the /api/v2/ready endpoint. But it could easily be removed if the client libraries have no requirement to test readiness before writing.
Update: removed handleReady

Also of note, the handleQuery method was removed; the comments indicated it was there to support some client libraries that would attempt a basic query to test connectivity. I didn't see any evidence of that in the above two client libraries, so I removed the stubbed query response.

One more thing was updated to support this change: a new auth handler is added to manage InfluxDB's idiosyncratic Authorization: Token <token> authentication strategy.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.
  • Awaiting legal review from my company regarding the applicability of the CCLA. Update: do not need any additional approvals.

@magichair
Contributor Author

I'll have a fix for the time-related test failure.

Contributor

@ssoroka ssoroka left a comment

Looks amazing! Very thorough.
Please remove the .DS_Store files for merging.
I have a couple small changes in feedback.

}

func (h *InfluxDBV2Listener) Description() string {
	return "Accept metrics over InfluxDB 1.x HTTP API"
Contributor

update description.

ReadTimeout internal.Duration `toml:"read_timeout"`
WriteTimeout internal.Duration `toml:"write_timeout"`
MaxBodySize internal.Size `toml:"max_body_size"`
MaxLineSize internal.Size `toml:"max_line_size"` // deprecated in 1.14; ignored
Contributor

I'd remove this. If it's already deprecated, there's no reason to add it to a new plugin.

Contributor Author

Perfect, missed this, will remove.


ReadTimeout internal.Duration `toml:"read_timeout"`
WriteTimeout internal.Duration `toml:"write_timeout"`
MaxBodySize internal.Size `toml:"max_body_size"`
Contributor

Not sure we really need a max body size. If we do, 32mb is possibly too small. We should be streaming the request through, so this won't affect memory used.

Contributor Author

Ah, yeah - good catch. InfluxDB 2.x no longer has a max-body-size config parameter - https://v2.docs.influxdata.com/v2.0/reference/config-options/

Contributor

I would keep this option, 32mb seems like plenty. The typical batch size is 1000-5000 metrics and is gzip compressed.

Contributor

The limit seems pretty arbitrary if we're taking results of any size in a stream.

Contributor

This API from InfluxDBv2 is built around accepting batches of data; it's not a streaming API. As everything is loaded into memory, I think this is a useful limit to avoid OOM.

Contributor

@ssoroka ssoroka left a comment

ok so is it magi chair, magic chair, or magic hair? :D Either way, bravo, fellow mage. 🧙

@magichair
Contributor Author

ok so is it magi chair, magic chair, or magic hair? :D Either way, bravo, fellow mage. 🧙

magic hair, an ol' high school nickname that became my de facto online alias. Thanks for the review and approval!

Contributor

@danielnelson danielnelson left a comment

I don't have a very strong opinion on the /ready endpoint, but if none of the client libraries are using it then I suggest we remove it here just to keep things neat.

Don't forget to remove .DS_Store.

}

func (h *InfluxDBV2Listener) Description() string {
	return "Accept metrics over InfluxDB 1.8+ / 2.x HTTP API"
Contributor

Let's document, and support, 2.x only.

Comment on lines +107 to +109
tags := map[string]string{
"address": h.ServiceAddress,
}
Contributor

We should make this optional, or just remove it, so that it is possible to pass through data unmodified. If we want to make it optional, it should be done similar to the bucket_tag option.

Contributor Author

This tags map is only used to tag the internal selfstat below; it does not munge the actual telegraf metric that is collected and passed on from the plugin.

Contributor

👍

Comment on lines +58 to +61
## maximum duration before timing out read of the request
read_timeout = "10s"
## maximum duration before timing out write of the response
write_timeout = "10s"
Contributor

Consider combining these into a single timeout.

Contributor Author

@magichair magichair Jul 13, 2020

They do surface two different configurations on the http.Server (

h.server = http.Server{
	Addr:         h.ServiceAddress,
	Handler:      h,
	ReadTimeout:  h.ReadTimeout.Duration,
	WriteTimeout: h.WriteTimeout.Duration,
	TLSConfig:    tlsConf,
}
) and are carried over from the influxdb_listener implementation. I'm not sure how they might be used, but I'd like to keep them separate.

Contributor

@danielnelson danielnelson Jul 14, 2020

My thought was that the response is typically very small, so there isn't a lot of value in having separate options vs a simpler combined timeout. If it is difficult to set this up on the http.Server, then it is okay.

Contributor Author

Given that these should be small responses, I decided to remove the timeouts entirely for a simpler config. Should someone need to add that functionality in the future, it should be straightforward to forward-port from the influxdb_listener or other examples in the code base.

defer body.Close()
}

parser := influx.NewStreamParser(body)
Contributor

One spot where the InfluxDB 2.x API differs from the 1.x API is that writes are either fully accepted or fully rejected. I would switch this over to use the regular influx.Parser.

From the /write docs:

400: line protocol poorly formed and no points were written. Response can be used to determine the first malformed line in the body line-protocol. All data in body was rejected and not written.

We should try to mirror the API from InfluxDBv2 as closely as we reasonably can to avoid issues when switching between the APIs.

Contributor Author

🤔 So maybe I should still enforce some form of max_body_size if I have to read the entire HTTP request body into memory and pass it into influx.Parse(bodyBytes) #7828 (comment)

Contributor Author

I definitely agree we want to match; not pushing back on that, just trying to brainstorm the best way to approach this, since I presume the accumulator isn't transactional, and I'd have to scrap the streaming code (or again store the telegraf metrics in memory and throw them away if we encounter an error).

Contributor

I do think we should keep max_body_size, commented over there. You are right that the accumulator isn't transactional, so we aren't safe in the case of SIGKILL, power loss, or the like. Anyway, the output plugin is unsynchronized with the inputs, so the data could go out in separate writes.

Still we can do a close approximation that will work under normal operation (outputs up/clean shutdown): after switching to the Parse() function just check and return the error or add all the metrics if no error.
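The approach described above can be sketched roughly as follows (the parser here is a hypothetical stand-in; the real plugin uses telegraf's influx.Parser and accumulator): parse everything first, and only hand metrics on if no line failed.

```go
package main

import (
	"fmt"
	"strings"
)

// metric is a stand-in for telegraf.Metric in this sketch.
type metric struct{ name string }

// parseLine is a hypothetical stand-in for a line-protocol parser: it
// accepts any line containing a space (measurement/tags then fields) and
// rejects the rest.
func parseLine(line string) (metric, error) {
	if !strings.Contains(line, " ") {
		return metric{}, fmt.Errorf("unable to parse %q", line)
	}
	return metric{name: strings.SplitN(line, ",", 2)[0]}, nil
}

// parseAllOrNothing parses every line before anything reaches the
// accumulator: one malformed line rejects the whole batch (a 400-style
// error, no points written), mirroring the InfluxDB v2 /write contract.
func parseAllOrNothing(body string) ([]metric, error) {
	var metrics []metric
	for _, line := range strings.Split(strings.TrimSpace(body), "\n") {
		m, err := parseLine(line)
		if err != nil {
			return nil, err // reject the entire batch
		}
		metrics = append(metrics, m)
	}
	return metrics, nil
}

func main() {
	ms, err := parseAllOrNothing("cpu,host=a usage=1\nmem used=2")
	fmt.Println(len(ms), err) // prints "2 <nil>"
}
```

As noted above, this is only a close approximation of transactional behavior: once the metrics are added, downstream delivery can still split them across writes.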

Contributor Author

Added max_body_size back, implemented all or nothing writes.

}
}

func badRequest(res http.ResponseWriter, errString string) {
Contributor

In order to more closely match the InfluxDBv2 API, we should try to sync up some common error messages. We should check example error responses from InfluxDBv2 listed at https://v2.docs.influxdata.com/v2.0/api/#tag/Write and anywhere we produce the equivalent error try to produce something similar.
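For reference, InfluxDB v2 write errors use a JSON body with code and message fields; a 400 for malformed line protocol looks roughly like this (the message text below is illustrative, not copied from the API docs):

```json
{
  "code": "invalid",
  "message": "unable to parse 'cpu,host=a usage=': missing field value"
}
```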

func init() {
	inputs.Add("influxdb_v2_listener", func() telegraf.Input {
		return &InfluxDBV2Listener{
			ServiceAddress: ":8186",
Contributor

Default to the 9999 port to match InfluxDBv2 OSS.

readysServed selfstat.Stat
requestsRecv selfstat.Stat
notFoundsServed selfstat.Stat
buffersCreated selfstat.Stat
Contributor

I don't see buffersCreated being used, can we remove it?

Comment on lines +12 to +15
When chaining Telegraf instances using this plugin, CREATE DATABASE requests
receive a 200 OK response with message body `{"results":[]}` but they are not
relayed. The output configuration of the Telegraf instance which ultimately
submits data to InfluxDB determines the destination database.
Contributor

Let's remove this paragraph since it isn't relevant for InfluxDBv2.

internal/http.go Outdated
Comment on lines +54 to +57
// TokenAuthHandler returns a http handler that requires `Authorization: Token <token>`
// Introduced to support InfluxDB 2.x style authentication
// https://v2.docs.influxdata.com/v2.0/reference/api/#authentication
func TokenAuthHandler(token string, onError TokenAuthErrorFunc) func(h http.Handler) http.Handler {
Contributor

Set this up to handle a direct comparison with any Authorization header value, not treating the scheme and credentials separately, just as a single big string. I think this will be more reusable and less tied to the InfluxDBv2 plugin.

internal/http.go Outdated
Comment on lines +80 to +85
authParts := strings.SplitN(authHeader, " ", 2)
if len(authParts) != 2 ||
	subtle.ConstantTimeCompare(
		[]byte(strings.ToLower(strings.TrimSpace(authParts[0]))),
		[]byte(strings.ToLower(h.scheme))) != 1 ||
	subtle.ConstantTimeCompare([]byte(strings.TrimSpace(authParts[1])), []byte(h.credentials)) != 1 {
Contributor

Let's remove the normalization steps here, just do a subtle.ConstantTimeCompare against the header value.
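A minimal sketch of the suggested approach (helper name hypothetical): compare the whole Authorization header value, scheme and credentials together, against a single expected string in constant time, with no splitting or case normalization.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// authorized compares the full Authorization header value (e.g.
// "Token my-secret") against the expected value in constant time.
// The comparison is exact: scheme casing and whitespace must match.
func authorized(header, expected string) bool {
	return subtle.ConstantTimeCompare([]byte(header), []byte(expected)) == 1
}

func main() {
	fmt.Println(authorized("Token my-secret", "Token my-secret")) // prints "true"
	fmt.Println(authorized("token my-secret", "Token my-secret")) // prints "false"
}
```

Note that ConstantTimeCompare returns 0 immediately for inputs of different lengths, which is acceptable here since only the token contents are secret.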

@magichair
Contributor Author

Excellent points @danielnelson, I'll get right on these 👍 Thank you!

@magichair
Contributor Author

Comments from Daniel still remaining:

  • Handle all or nothing writes
  • Normalized error handling to match API spec

@magichair
Contributor Author

I think all of @danielnelson 's concerns have been addressed at this point. A fair number of changes since the last review - each split into its own commit (hopefully for easier review).

@magichair magichair requested a review from danielnelson July 16, 2020 16:58
@magichair
Contributor Author

Just poking in to see if there is a planned release timeline for 1.16.0. I've been running this as a custom-built telegraf in production fairly successfully; there isn't a lot of load on it for proving it out, but it has kept up with what I'm using it for.

@sjwang90
Contributor

@magichair We're looking to get a 1.16.0 release soon. Ideally to get this plugin in to align with the InfluxDB OSS 2 GA.

@ssoroka ssoroka merged commit d764f86 into influxdata:master Sep 14, 2020
@ssoroka
Contributor

ssoroka commented Sep 14, 2020

Merged. Thanks again for the great effort here!

@eraac
Contributor

eraac commented Sep 15, 2020

Can this listener for v2 also support a /ping route? When you deploy this to Kubernetes and/or behind a load balancer, a health check route (without auth) is required to make this work.

EDIT: Which should return 200 instead of 204, because of this #4935

@magichair
Contributor Author

magichair commented Sep 15, 2020

Can this listener for v2 also support a /ping route? When you deploy this to kubernetes or/and behind a load balancer a health check route (without auth) is required to make this work

EDIT: Which should return 200 instead of 204, because of this #4935

@eraac I debated putting the /ping endpoint equivalent in (the /api/v2/ready endpoint), however it didn't seem necessary. The original Influx 1.x listener had it in place for compatibility with the existing InfluxDB client libraries, which tested /ping before allowing a write. In this case, the 2.x client libraries have no such check.

Is there a reason you can't add https://github.com/influxdata/telegraf/tree/master/plugins/outputs/health to setup a generic health endpoint for your telegraf node overall? This is what I use for my AWS ALB health checks.
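For reference, a minimal config for the health output mentioned above might look like this (the address and port are illustrative; check the plugin's README for its actual options):

```toml
[[outputs.health]]
  ## Address to listen on; the endpoint returns 200 while telegraf is healthy.
  service_address = "http://:8080"
```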

@eraac
Contributor

eraac commented Sep 15, 2020

Is there a reason you can't add https://github.com/influxdata/telegraf/tree/master/plugins/outputs/health to setup a generic health endpoint for your telegraf node overall? This is what I use for my AWS ALB health checks.

@magichair I just tried, but I can't start outputs.health and inputs.influxdb_v2_listener on the same port (which is perfectly logical) :/

Our telegraf agent is deployed on Kubernetes, and we can't set a different port for the health check on the load balancer. That feature is only available in 1.17 and is still "beta".

@magichair
Contributor Author

That makes sense, if we wanted to resurrect it in a separate PR, it was roughly stubbed out and then removed in this commit: e1571b9

idohalevi pushed a commit to idohalevi/telegraf that referenced this pull request Sep 29, 2020
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023

Development

Successfully merging this pull request may close these issues.

InfluxDB v2.0 support for inputs.influxdb_listener plugin

5 participants