Skip to content

[http] Move HTTP1 request flood checks from response encode to request decode.#10475

Merged
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
antoniovicente:flood_protection_on_decode
Mar 26, 2020
Merged

[http] Move HTTP1 request flood checks from response encode to request decode.#10475
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
antoniovicente:flood_protection_on_decode

Conversation

@antoniovicente
Copy link
Copy Markdown
Contributor

Description: Move HTTP1 request flood checks from response encode to request decode.
Decode of a new request is guaranteed to happen on a shallow call stack under HCM, which makes it a better choice for flood checks since it reduces the number of unique call stacks under which FrameFloodException may be thrown.
Fixes several cases where FrameFloodException was thrown outside a try/catch block, resulting in an Envoy crash.
Risk Level: low
Testing: unit and integration
Docs Changes: n/a
Release Notes: n/a

Decode of a new request is guaranteed to happen on a shallow call stack under HCM, which makes it a better choice for flood checks since it reduces the number of unique call stacks under which FrameFloodException may be thrown.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I think moving this is much simpler than the try / catch PR, but I think you need to update some comments and probably docs. The logs, thrown error message and stats all talk about too many responses, where now we're reacting to an incoming request when there are too many responses.

@mattklein123
Copy link
Copy Markdown
Member

+1 I prefer this solution. I think we will need to make similar changes to HTTP/2 but I'm not sure? cc @yanavlasov.

Increase the number of requests processed by IntegrationTest.TestManyBadRequests

Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente
Copy link
Copy Markdown
Contributor Author

Cool, I think moving this is much simpler than the try / catch PR, but I think you need to update some comments and probably docs. The logs, thrown error message and stats all talk about too many responses, where now we're reacting to an incoming request when there are too many responses.

I went through "git show f5b0294" and updated relevant comments and exception strings.

@antoniovicente
Copy link
Copy Markdown
Contributor Author

+1 I prefer this solution. I think we will need to make similar changes to HTTP/2 but I'm not sure? cc @yanavlasov.

The "throws FrameFloodException" in the H2 codec are protected by either dispatching_downstream_data_ or dispatching_, so I think uncaught exception shouldn't be a problem. I need to look into this further but I think we have the problem of incrementOutboundFrameCount only throwing when dispatching_downstream_data_ is true. Calls to ConnectionImpl::addOutboundFrameFragment under an upstream read callback will buffer infinitely (up to flow control limits).

mattklein123
mattklein123 previously approved these changes Mar 24, 2020
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks. Will defer to @alyssawilk for further review.

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I thought it was worth updating more docs but honestly I think this is clear enough. One fix to (prior) comments and you're good to go!

Signed-off-by: Antonio Vicente <avd@google.com>
mattklein123
mattklein123 previously approved these changes Mar 25, 2020
@mattklein123
Copy link
Copy Markdown
Member

@antoniovicente it looks like on TSAN IpVersions/IntegrationTest.TestFloodUpstreamErrors/IPv6 hung. Can you maybe try locally with --runs_per_test just to see if we now have a flake? Thank you.

/wait

alyssawilk
alyssawilk previously approved these changes Mar 25, 2020
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also LGTM modulo CI

… connection backs up and the response flood can be detected in IntegrationTest.TestFloodUpstreamErrors

Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente
Copy link
Copy Markdown
Contributor Author

@antoniovicente it looks like on TSAN IpVersions/IntegrationTest.TestFloodUpstreamErrors/IPv6 hung. Can you maybe try locally with --runs_per_test just to see if we now have a flake? Thank you.

/wait

It is a consistent failure. The test case I added takes requires processing over a thousand tiny responses before it fills the TCP connection buffer. Setting SO_RCVBUF to 1024 on the client connection reduces the time until the socket backs up. In opt mode the test case runs in about 2 seconds, but under tsan it takes 20 seconds for each of ipv4/ipv6 (down from 120 secs each before the SO_RCVBUF change).

@antoniovicente antoniovicente changed the title Move HTTP1 request flood checks from response encode to request decode. [http] Move HTTP1 request flood checks from response encode to request decode. Mar 26, 2020
Signed-off-by: Antonio Vicente <avd@google.com>
@mattklein123 mattklein123 merged commit bdd849c into envoyproxy:master Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants