Skip to content

vendor: github.com/fluent/fluent-logger-golang v1.9.0#43100

Merged
cpuguy83 merged 2 commits intomoby:masterfrom
conorevans:conorevans/update-fluent
Jan 5, 2022
Merged

vendor: github.com/fluent/fluent-logger-golang v1.9.0#43100
cpuguy83 merged 2 commits intomoby:masterfrom
conorevans:conorevans/update-fluent

Conversation

@conorevans
Copy link
Copy Markdown
Contributor

- What I did

Updated fluent-logger-golang dependency to v1.9.0 and added support in the logger for the new AsyncReconnectInterval parameter.

- How to verify it

Run a Docker container with the fluentd logging driver here and set the fluentd-address log-opt to an address with a host that could resolve to multiple IPs, e.g. fluentd.service.consul:24224. Set the fluentd-async-reconnect-interval to a value greater than 0 (value in ms). Observe that the container will periodically reconnect to the fluentd-address. You can force a reconnection by removing one of the fluent hosts from the service pool (or equivalent).

- Description for the changelog
Support AsyncReconnectInterval parameter in fluentd logging driver. This option is useful if the address
may resolve to one or more IP addresses, e.g. a Consul service address.

- A picture of a cute animal (not mandatory but encouraged)

image

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing! I left some comments inline. I was also "coding along" while I was writing my comments; let me open a pull request against your fork with my suggestions

Comment thread daemon/logger/fluentd/fluentd.go Outdated

asyncReconnectInterval := 0
if cfg[asyncReconnectIntervalKey] != "" {
ari64, err := strconv.ParseUint(cfg[asyncReconnectIntervalKey], 10, strconv.IntSize)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should consider to parse the value as a time.Duration (time.ParseDuration()) to make it easier (and less error-prone) for the user to set the right interval. Having to specify the value in ms works "ok(ish)" if you want to, say, specify a second (1000 ms), but can become a bit awkward after that. It's also error-prone, for example, if a user wants to specify 1 second, but specifies 1 and ends up with 1 ms.

And looking at the code, I see we already use duration strings for fluentd-retry-wait, so it's a bit more consistent.

So I would suggest to;

  • use time.ParseDuration()
  • validate the no negative values are specified
  • perhaps also a minimum / maximum (not sure what reasonable values are? between 100ms and 10s ? (open to suggestions)

Would be good to have a minimal unit test (I see there's currently no tests in this package, which is a bit unfortunate, but let's not continue that "tradition" 😅)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion -- that sounds good to me. I'll have a look at your suggested changes on my fork and merge them in.

Would be good to have a minimal unit test (I see there's currently no tests in this package, which is a bit unfortunate, but let's not continue that "tradition" 😅)

Yeah, I saw that and I guess lazily thought "ah, it's Christmas, time of tradition" 😉 -- I'll add some tests for this MR and probably cover the other test-cases in parseConfig while I'm at it 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MR and probably cover the other test-cases in parseConfig while I'm at it

Thanks! Probably good to do that in a follow-up / separate PR to keep this PR "on topic"

}

if f.AsyncReconnectInterval > 0 {
if time.Since(f.latestReconnectTime) > time.Duration(f.AsyncReconnectInterval)*time.Millisecond {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look too closely if it could be a problem, but wondering if the usage of f.latestReconnectTime also needs a lock here

(This is in vendored code, so if it needs fixing, needs to be done in the upstream project)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs one, at least in the current code. The only place that calls connect (which assigns latestReconnectTime is this if loop (if AsyncReconnectInterval is enabled, otherwise it's the writeWithRetry below) -- concurrent calls to connect are not made). If you want to take a second look, I'm all ears. I'm definitely a less experienced Gopher than you, so perhaps I am missing something!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrent calls to connect are not made

Ah, right, yes, in that case it won't be needed. I saw a mention of the lock in one of the functions, and wondered if things could be called concurrently.

I'm definitely a less experienced Gopher than you, so perhaps I am missing something!

I still make mistakes and miss things as anyone! 😂 . What I did learn over the years is that it usually doesn't hurt to leave a comment (even if wrong). I've been bitten by cases where I wasn't sure and "assumed" other participants on the PR would've noticed if it was wrong (because they are more experienced than I, so for sure they would see a mistake??) and it turned out there was an issue after all, but everyone missed it.

return
}

if f.AsyncReconnectInterval > 0 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comment above;

So perhaps we should also have acceptable minimal and maximal intervals?

I'm actually wondering if the fluentd code itself should also define a minimum to prevent a close retry loop. Perhaps worth raising an issue for that and/or a PR to add at require a minimum value? 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually wondering if the fluentd code itself should also define a minimum to prevent a close retry loop. Perhaps worth raising an issue for that and/or a PR to add at require a minimum value?

I considered that when I submitted the PR there but thought I'd leave it to fluent maintainers to suggest values if they believed it necessary. I do also think that it can be left to downstream users to decide for themselves what those safe values are. I totally see it being appropriate in this project. I think 100ms/10s are fair min/maxes. We can go with that to start with and a second reviewer (I believe those are necessary for most/external changes) can perhaps suggest an alternative if they feel differently.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 100ms/10s are fair min/maxes. We can go with that to start with

Agreed; we can always relax the range if users complain (and if there's valid reasons to have shorter or longer intervals)

and a second reviewer (I believe those are necessary for most/external changes) can perhaps suggest an alternative if they feel differently.

Yes, general rule is to have a minimum of two reviewers (irregardless of who opened the PR) to reduce the risk of things being missed. In some rare exceptions we may merge with one review (trusting the maintainer that merged to have outweighed the risks).

@thaJeztah
Copy link
Copy Markdown
Member

I opened https://github.com/conorevans/moby/pull/1 against your branch; you can merge those changes if they look good to you (please squash the commits after that, as I don't think they need to be in separate commits 😅)

@thaJeztah
Copy link
Copy Markdown
Member

CI Failure is unrelated, but looks like this new test is flaky; (merged in #43043)

=== RUN   TestFollowLogsHandleDecodeErr
time="2021-12-23T13:44:47Z" level=error msg="Log reader notified that it must re-open log file, some log data may not be streamed to the client." file=/tmp/TestFollowLogsHandleDecodeErr2491538477
    logfile_test.go:317: assertion failed: 5 (int) != 6 (dec.resetCount int)
--- FAIL: TestFollowLogsHandleDecodeErr (0.02s)

Updates the fluent logger library to v1.9.0. The update includes the following commit:

* [Add periodic reconnection functionality](fluent/fluent-logger-golang@1c05506)

See https://github.com/fluent/fluent-logger-golang/compare/v1.8.0..v1.9.0

Signed-off-by: Conor Evans <coevans@tcd.ie>
thaJeztah
thaJeztah previously approved these changes Dec 24, 2021
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Comment thread daemon/logger/fluentd/fluentd.go Outdated
return config, errors.Errorf("invalid value for %s: value (%q) must be between %s and %s",
asyncReconnectIntervalKey, interval, minReconnectInterval, maxReconnectInterval)
}
asyncReconnectInterval = int(interval)
Copy link
Copy Markdown
Contributor Author

@conorevans conorevans Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this randomly popped into my head @thaJeztah -- I don't think this would function as expected, since the interval is a time.Duration which will be in nanoseconds. The Fluent library expects an interval in milliseconds:

### AsyncReconnectInterval
When async is enabled, this option defines the interval (ms) at which the connection
to the fluentd-address is re-established. This option is useful if the address
may resolve to one or more IP addresses, e.g. a Consul service address.

int(interval.Milliseconds()) would fix this -- https://go.dev/play/p/TJfb0zQ7l3E

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh! Great catch. For some reason I was thinking time.Interval was in Milliseconds already 🤦 😊😬

Let me know if you want to update it, or want me to push 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out the rebasing 🙂 -- updated.

Copy link
Copy Markdown
Contributor Author

@conorevans conorevans Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least.. it looks OK 😅 -- it doesn't have both of us as committers for the second commit, but includes a sign-off for for both of us. Was there an option I missed that would have included you as an author of the commit as well as in the message itself?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read up a bit more, I remembered the co-authored-by field, so I fixed that! Should be good to go again 🙂

 * When async is enabled, this option defines the interval (ms) at which the connection
to the fluentd-address is re-established. This option is useful if the address
may resolve to one or more IP addresses, e.g. a Consul service address.

While the change in #42979 resolves the issue where a Docker container can be stuck
if the fluentd-address is unavailable, this functionality adds an additional benefit
in that a new and healthy fluentd-address can be resolved, allowing logs to flow once again.

This adds a `fluentd-async-reconnect-interval` log-opt for the fluentd logging driver.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Conor Evans <coevans@tcd.ie>

Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Conor Evans <coevans@tcd.ie>
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@thaJeztah
Copy link
Copy Markdown
Member

@cpuguy83 ptal

Copy link
Copy Markdown
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@conorevans
Copy link
Copy Markdown
Contributor Author

Hey @thaJeztah just wondering, is there an expected timeline for 21.xx release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants