fix regression in router that can cause a crash on response timeout by mpuncel · Pull Request #7192 · envoyproxy/envoy

mpuncel · 2019-06-06T16:55:53Z

Prior to the recent retry hedging change, upstream requests were always
reset, as they should be, when a response timeout was triggered. The retry
hedging change inadvertently changed the logic so that the upstream
request is only reset if it hasn't seen headers. However, it's possible
to hit the timeout after seeing headers but before end_stream, which
results in a crash when the body or trailers are then decoded through a
null pointer.

Signed-off-by: Michael Puncel mpuncel@squareup.com
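A minimal, hypothetical C++ model of the sequence described above may help: upstream headers arrive, the response timeout fires, and a body arrives afterwards. `RouterModel`, `DownstreamState`, and `survivesLateBody` are illustrative names, not Envoy's router API. The model simply assumes that whatever state late data would be decoded into is released when the timeout fires; it only shows why resetting the upstream request unconditionally keeps the late body away from that released state.

```cpp
#include <memory>
#include <string>

// Hypothetical model only: these are not Envoy's real router classes.
struct DownstreamState {
  std::string body;  // stands in for whatever the router decodes data into
};

class RouterModel {
 public:
  explicit RouterModel(bool always_reset_on_timeout)
      : always_reset_on_timeout_(always_reset_on_timeout),
        downstream_(std::make_unique<DownstreamState>()) {}

  void onUpstreamHeaders() { saw_headers_ = true; }

  void onResponseTimeout() {
    // Regressed behavior: reset only if no headers had been seen yet.
    // Fixed behavior: always reset the upstream request.
    if (always_reset_on_timeout_ || !saw_headers_) {
      upstream_reset_ = true;
    }
    // Assumption of this model: the timeout releases the state that
    // later body/trailers would be decoded into.
    downstream_.reset();
  }

  // Returns false at the point where, per the PR description, the real
  // router ends up decoding through a null pointer.
  bool onUpstreamData(const std::string& data) {
    if (upstream_reset_) {
      return true;  // a reset upstream stream delivers nothing further
    }
    if (downstream_ == nullptr) {
      return false;
    }
    downstream_->body += data;
    return true;
  }

 private:
  const bool always_reset_on_timeout_;
  bool saw_headers_ = false;
  bool upstream_reset_ = false;
  std::unique_ptr<DownstreamState> downstream_;
};

// Replays the regression test's order of events: headers, timeout, body.
bool survivesLateBody(bool always_reset_on_timeout) {
  RouterModel router(always_reset_on_timeout);
  router.onUpstreamHeaders();
  router.onResponseTimeout();
  return router.onUpstreamData("late body");
}
```

In this model, `survivesLateBody(false)` (the regressed logic) returns false, while `survivesLateBody(true)` (always reset) returns true.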

Description: Fixes a regression in the router that can cause a crash when a response timeout is hit after an upstream request has seen headers but not end_stream.
Risk Level: low
Testing: integration test
Docs Changes: N/A
Release Notes: N/A
Fixes #7154

mpuncel · 2019-06-06T16:56:43Z

test/integration/http_timeout_integration_test.cc

+  timeSystem().sleep(std::chrono::milliseconds(200));
+
+  // Respond with body.
+  upstream_request_->encodeData(100, true);


This crashed before the fix was applied.

lizan

Great, just one nit.

lizan · 2019-06-06T17:44:55Z

test/integration/http_timeout_integration_test.cc

+                                          {":authority", "host"},
+                                          {"x-forwarded-for", "10.0.0.1"},
+                                          {"x-envoy-upstream-rq-timeout-ms", "100"}};
+  auto encoder_decoder = codec_client_->startRequest(request_headers);


Use `auto` only if the type is obvious (e.g. a cast, make_unique/make_shared, or assigning a literal).
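To illustrate the style rule, here is a small sketch with hypothetical stand-ins (`RequestEncoder`, `ResponseDecoder`, `startRequest`, `exampleUsage`). They are not Envoy's codec client types, and the real `startRequest` return type may differ; the point is only where `auto` is and is not appropriate.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for a codec client API; not Envoy's real types.
struct RequestEncoder {
  int id = 0;
};
struct ResponseDecoder {
  std::string body;
};
using ResponseDecoderPtr = std::unique_ptr<ResponseDecoder>;

// Hypothetical helper whose return type is not visible at the call site.
std::pair<RequestEncoder&, ResponseDecoderPtr> startRequest(RequestEncoder& encoder) {
  return std::pair<RequestEncoder&, ResponseDecoderPtr>(encoder,
                                                        std::make_unique<ResponseDecoder>());
}

int exampleUsage() {
  RequestEncoder encoder;
  encoder.id = 42;

  // Fine: make_unique names the type on this same line.
  auto extra_decoder = std::make_unique<ResponseDecoder>();
  extra_decoder->body = "unused";

  // Not obvious from the call, so spell the type out instead of using auto.
  std::pair<RequestEncoder&, ResponseDecoderPtr> encoder_decoder = startRequest(encoder);
  RequestEncoder& request_encoder = encoder_decoder.first;
  ResponseDecoderPtr response = std::move(encoder_decoder.second);
  return response != nullptr ? request_encoder.id : -1;
}
```

Here `exampleUsage()` returns 42, since the encoder reference round-trips through the pair and the decoder is non-null.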

mattklein123

Nice, thanks. LGTM modulo @lizan comment.

alyssawilk

Thanks both for the fix and the regression test!

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

rgs1 · 2019-06-06T19:39:38Z

Testing this with traffic that triggered the crash for us and thus far, no more crashes 👍

lizan · 2019-06-06T20:04:38Z

/retest

repokitteh-read-only · 2019-06-06T20:04:42Z

🔨 rebuilding ci/circleci: ipv6_tests (failed build)

🐱

Caused by: a #7192 (comment) was created by @lizan.

lizan · 2019-06-06T21:31:21Z

The IPv6 test failure seems real, @mpuncel ?
https://230461-65214191-gh.circle-artifacts.com/0/tmp/envoy-docker/envoy/generated/failed-testlogs/test/integration/http_timeout_integration_test/test.log

mpuncel · 2019-06-07T00:15:13Z

Unfortunately, I can't seem to reproduce that in gdb or get line numbers out of stack_decode.py, so I'm kind of stuck. It does reproduce about 20 times in 1000 runs on my machine, though.

Sending a body causes the IPv6 test to segfault for some reason; perhaps sending data after a reset is not a sane thing to try in the integration tests. Asserting a reset is just as good from a testing standpoint, since it verifies the correct router behavior. Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mpuncel · 2019-06-07T00:30:20Z

OK, I think I lucked out with a guess about what was going on. Now it passes 1000 out of 1000 runs. Maybe, given the way the fake upstreams work, trying to write a body after a reset is a recipe for badness.
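As a rough sanity check on that result, assuming runs are independent and the pre-fix failure rate really was about 20 in 1000, the chance of 1000 straight passes with the problem still present is very small. `probabilityAllPass` is a hypothetical helper written only for this back-of-the-envelope calculation.

```cpp
#include <cmath>

// Probability that `runs` independent runs all pass when each run fails
// with probability `failure_rate`: (1 - failure_rate)^runs.
double probabilityAllPass(double failure_rate, int runs) {
  return std::pow(1.0 - failure_rate, runs);
}
```

With `failure_rate = 20.0 / 1000.0` and `runs = 1000`, this comes out to roughly 1.7e-9, so 1000 clean runs is strong evidence that the flake is gone (under the independence assumption).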

mattklein123

Nice

mpuncel commented Jun 6, 2019

lizan reviewed Jun 6, 2019

lizan assigned alyssawilk Jun 6, 2019

mattklein123 reviewed Jun 6, 2019

alyssawilk previously approved these changes Jun 6, 2019

PR feedback (7b0b651)

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mpuncel dismissed alyssawilk’s stale review via 7b0b651 June 6, 2019 18:23

fix format (1b11029)

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

lizan previously approved these changes Jun 6, 2019

mpuncel dismissed lizan’s stale review via bea49ec June 7, 2019 00:29

mattklein123 approved these changes Jun 7, 2019

mattklein123 merged commit 9d47062 into envoyproxy:master Jun 7, 2019

mattklein123 merged 4 commits into envoyproxy:master from mpuncel:mpuncel/fix-timeout-regression

5 participants
