Skip to content

[Forward plugin] Improve load balancing in connection pool#6804

Open
Arkelenia wants to merge 1 commit into
coredns:masterfrom
DataDog:frederic/forward-plugin-load-balancing
Open

[Forward plugin] Improve load balancing in connection pool#6804
Arkelenia wants to merge 1 commit into
coredns:masterfrom
DataDog:frederic/forward-plugin-load-balancing

Conversation

@Arkelenia

Copy link
Copy Markdown
Contributor

1. Why is this pull request needed and what does it do?

More details are available in the issue. In a nutshell, when multiple servers back a single destination endpoint (through a Kubernetes clusterIP, for example) in the forward plugin, the traffic is very unbalanced. The connection pool favors slow servers when returning cached connections.

This PR proposes the following changes:

  • Change from LIFO to FIFO behavior when returning a cached connection from the pool.
  • Add two explicit parameters: idle_timeout and max_connection_age. expire is deprecated in favor of idle_timeout, hence, idle_timeout takes precedence if both are defined but expire can still be used. For backward compatibility, max_connection_age defaults to infinite if not defined.

This PR is only meant as a suggestion to gather feedback.

2. Which issues (if any) are related?

This PR is a suggestion for this issue.

3. Which documentation changes (if any) need to be made?

Document the new parameters idle_timeout and max_connection_age and the "deprecation" of expire

4. Does this introduce a backward incompatible change or deprecation?

This PR doesn't introduce a breaking change because an existing configuration would still work after applying this change.

Comment thread plugin/pkg/proxy/persistent.go Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes O(n) instead of the previous O(1) fetching of a connection. On normal operations, I would expect most connections to be valid and only needing to iterate through a few connections before finding a valid one. Benchmarks indicate that each step of the loop is about 5ns.

Signed-off-by: Frederic Hemery <frederic.hemery@datadoghq.com>

@thevilledev thevilledev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale PR and conflicts with the changes introduced in #7970. The conn manager no longer exists and the proxy implementation now uses a connection pool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants