Skip to content

config: add fallback timeout to tls_inspector#7853

Closed
lambdai wants to merge 2 commits intoenvoyproxy:masterfrom
lambdai:snifftoapi
Closed

config: add fallback timeout to tls_inspector#7853
lambdai wants to merge 2 commits intoenvoyproxy:masterfrom
lambdai:snifftoapi

Conversation

@lambdai
Copy link
Copy Markdown
Contributor

@lambdai lambdai commented Aug 7, 2019

Description:

Istio is seeing mysql connection rejected when deprecating bind_to_port=false listeners by migrating to listener with a large amount of filter chains.

tls_inspector always expecting some bytes from client. Otherwise it block the filterchain match.
There are some application protocols that requires server send the handshake packet first. E.g. MySQL.
That means if there are two filter chains, one requires tls_inpsector listener filter and the other filter chain expecting mysql traffic
can never listed in the same listener.
If MySQL client connects, the connection is handled by envoy listener and blocked at tls_inspector::onAccept()
before trying to match any filter chain.

One of the solution is to allow tls_inspector fallback to determine it as plain text after a certain timeout (e.g. 10ms). (Not implemented yet, prototype https://github.com/silentdai/envoy/tree/toimpl)

  • The envoy listener may wrongly determine tls connection as TCP, and it's the matched filter chain's responsibility to either reject the connection or survive.
  • The MySQL filter chain would see the above fall back timeout (10ms -ish)

Back compatibility: if the tls_inspector config proto is not setting fallback timeout field, the default timeout is infinity.

Fix #7195

Risk Level: LOW
Testing:
Docs Changes: TODO
Release Notes: TODO

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #7853 was synchronize by silentdai.

see: more, trace.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 7, 2019

http inspector listener has the exact same issue. Will apply to http_inspector once approved.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 7, 2019

Request for comment: @htuch @rshriram

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #7853 was synchronize by silentdai.

see: more, trace.

Copy link
Copy Markdown
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this approach on per listener filter basis, #7195 should be applied to all listener filters at connection handler level.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 7, 2019

@lizan
I think It's hard to say if envoy could bypass the opinion of each listener filter. It's quite different from the decision of rejecting the connection.
If any listener filter cannot tolerate the bypass, we should not pass. Where do you provide the chance for those listener filters to say no?

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Aug 7, 2019

Do you really need that level of granularity though? Even in Istio, we are most likely to only ever set the same timeout value for every filter chain. There is actually no scenario where we would set the timeout for one chain but not for the other because we do not know the protocol apriori.

IMO it is easy to go from a global option to a more fine grained option (local overrides global), than it is to go from a local only option to a global option, IMO. I dont mind either implementation - which ever is quicker works!

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 7, 2019

@lizan Sorry I may misinterpret your comment. I total agree if there is a single timeout for the listener.

What I am proposed is that if any listener filter cannot follow the fallback timeout, that listener filter can block the onAccept, until envoy decide to close the connection

@lizan
Copy link
Copy Markdown
Member

lizan commented Aug 7, 2019

Let me pull up my WIP branch to a Draft PR soon.

I think It's hard to say if envoy could bypass the opinion of each listener filter. It's quite different from the decision of rejecting the connection.
If any listener filter cannot tolerate the bypass, we should not pass. Where do you provide the chance for those listener filters to say no?

We don't do this today, a StopIteration from listener filters today just means it wants more data, it is not an intent to decline a connection. We can always provide closing function via ListenerFilterCallbacks, which actively close the connection, if there is any desire for that.

@lizan
Copy link
Copy Markdown
Member

lizan commented Aug 8, 2019

We can always provide closing function via ListenerFilterCallbacks, which actively close the connection, if there is any desire for that.

Actually this is what continueFilterChain(false) do.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 8, 2019

I read through #7859 It pushes the flag continueOnListenerFiltersTimeout to listener but we do have something.
I have small concerns:

  • Ideally the timeout to close connection and timeout to continue could be 2 things... But istio survives without a timeout to close. I am fine with single timeout value :)
  • Previously it's either close the connection or envoy go through each listener filter so that each listener could have the opportunity to clean up in onAccept, or on file event. E.g. tls_inspector may created a file event and waiting to be triggered, with your current PR that file event is leaked or somehow triggered later. IMHO it continueOnListenerFiltersTimeout should do something before call newConnection().

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 8, 2019

I believe #7859 will eventually address the my concern. Close this PR

@lambdai lambdai closed this Aug 8, 2019
@lizan
Copy link
Copy Markdown
Member

lizan commented Aug 8, 2019

your current PR that file event is leaked or somehow triggered later. IMHO it continueOnListenerFiltersTimeout should do something before call newConnection().

I'm spotting this working on tests, should be fixed.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Aug 8, 2019

@lizan Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

allow fallback to default filter chain when listener filters timeout

3 participants