upstream: Implement retry concurrency budgets#9069
upstream: Implement retry concurrency budgets#9069mattklein123 merged 77 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
|
/retest |
|
🔨 rebuilding |
|
/retest |
|
🔨 rebuilding |
Signed-off-by: Tony Allen <tallen@lyft.com>
snowp
left a comment
There was a problem hiding this comment.
Looks pretty good, just a couple comments
|
|
||
| // If a retry budget was configured, we cannot exceed the configured percentage of total | ||
| // outstanding requests/connections. | ||
| const uint64_t current_active = cluster_.resourceManager(priority_).connections().count() + |
There was a problem hiding this comment.
This seems a bit iffy to me: today kinda this works before for H/1.1 and H/2 make use of either max_connections OR max_requests so this works, but there are requested features that might introduce max_connections to H/2 as well (#7403). Envoy also doesn't reclaim H/1.1 connections until it needs to, so I think using max_connections to get a sense for the current concurrency isn't perfect.
I think ideally we'd be counting # of requests for HTTP/1.1 and just do max_requests + max_pending, but I don't think we can just add that enforcement safely at this point (since users might have bumped max_con but not max_req). Maybe just note this limitation in the docs for now?
There was a problem hiding this comment.
Noted in the docs. Let me know if the latest patch communicates this well enough.
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tony@allen.gg>
Signed-off-by: Tony Allen <tony@allen.gg>
Signed-off-by: Tony Allen <tony@allen.gg>
|
/wait-any |
|
Apologies but can you merge master and format one more time? It should fix all the CI/random format issues. /wait |
Signed-off-by: Tony Allen <tony@allen.gg>
Signed-off-by: Tony Allen <tony@allen.gg>
|
Sorry needs another master merge. /wait |
|
I’ll just wait until we’re all back in the office to avoid further
conflicts.
On Fri, Jan 3, 2020 at 11:20 AM Matt Klein ***@***.***> wrote:
Sorry needs another master merge.
/wait
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9069?email_source=notifications&email_token=AAIOZ7RKT5SY7QYPM5IH7VLQ35QUXA5CNFSM4JO5HESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIBPVZA#issuecomment-570620644>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIOZ7TYVNDJ64XQQWD5M2TQ35QUXANCNFSM4JO5HESA>
.
--
Tony Allen
Software Engineer
<https://www.lyft.com/>
|
Signed-off-by: Tony Allen <tony@allen.gg>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM modulo merge issue. Great work!
/wait
| // 2019/11/15 9040 35029 35500 build: update protobuf to 3.10.1 | ||
| // 2019/11/15 9040 35061 35500 upstream: track whether cluster is local | ||
| // 2019/12/10 8779 35053 35000 use var-length coding for name lengths | ||
| // 2019/12/20 8779 35053 35000 use var-length coding for name lengths |
There was a problem hiding this comment.
This is still a merge issue. To avoid going back and forth on this again can you fix this in your next change?
Since envoyproxy#9069, macOS builds have been failing because they use slightly more memory for stats than the new limit. Bump limit to next even multiple of 1000. Risk Level: low, test change only Docs Changes: n/a Release Notes: n/a Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
This patch implements
RetryBudget, a new configuration option in the retry policy that limits allowed outstanding retries. The budget is specified as a percentage of the current outstanding requests/connections. Each retry budget must specify a minimum concurrency value to the budgets, so that one can retry when there are low concurrency values. This configuration is optional and will not change existing behavior unless configured. If configured, the retry budgets will override anymax_retriescircuit breaker configuration.For example, a budget of 20% with a minimum retry concurrency of 3 will allow 5 active retries while there are 25 active requests. If there are 2 active requests, there are still 3 active retries allowed because of the minimum retry concurrency.
This approach to limiting retries and mitigating retry storms is useful, because it allows one to think in terms of worst-case increases in traffic due to retries. Retry circuit breakers are fixed values and can potentially limit retries when the mesh can handle the increased volume. This is especially beneficial for adaptive concurrency for high RPS services.
Fixes: #8727
Risk Level: Low
Testing: Unit tests
Docs Changes: Stats and proto.