Update fork from envoyproxy/envoy master by larrywest · Pull Request #1 · larrywest/envoy

larrywest · 2019-04-23T16:22:53Z

Thanks to @KirstieJane for the steps to do this: https://github.com/KirstieJane/STEMMRoleModels/wiki/Syncing-your-fork-to-the-original-repository-via-the-browser

Introduce a new "safe" init manager, to replace the existing one that's prone to use-after-free issues (see e.g. #6116). Users of the existing init manager will be upgraded one-by-one in subsequent PRs if this design is approved. See also previous false starts in PRs #6136 and #6245. Risk Level: Low, no existing users of the existing init manager are changed in this PR. Testing: New unit tests added. Docs Changes: n/a Release Notes: n/a Signed-off-by: Dan Rosen <mergeconflict@google.com>

Signed-off-by: Derek Argueta <dereka@pinterest.com>

We need to think about whether we want to have all of these somehow reference some type of environment variable that would point to the right image in the context of the tree the user is looking at, but given that the trunk documentation may require a master build, this is more correct. Signed-off-by: Matt Klein <mklein@lyft.com>

Signed-off-by: Derek Argueta <dereka@pinterest.com>

Previously, we incremented rq_total and upstream_rq_total in the HTTP/1 conn pool even if the request ended up being circuit broken. The stats were not incremented for HTTP/2 requests. This change no longer increments the stats for HTTP/1 circuit broken requests for consistency between the two. Signed-off-by: Spencer Lewis <slewis@squareup.com>

Address one TODO in that file: (D)CHECK is not explicitly listed in the platform API, but is supposed to be defined in some impl. Defining them in quic_logging_impl.h seems appropriate. Risk Level: low, not in use Part of #2557 Signed-off-by: Dan Zhang <danzh@google.com>

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>

Update some documentation comments in api/envoy/service/auth/v2/*.proto to more accurately describe the *current* behavior (without making any judgment on whether that behavior is "correct" or desirable). Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

This filter decodes the ZooKeeper wire protocol and emits stats & metadata about requests, responses and events. This wire protocol parsing is based on: https://github.com/twitter/zktraffic https://github.com/rgs1/zktraffic-cpp The actual filter structure is based on the Mysql proxy filter. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

Updating per new file locations. Updates (unused) reloadable flags to default true. Risk Level: n/a (tooling) Testing: manual Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

Risk Level: n/a Testing: n/a Docs Changes: yes Release Notes: no Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

Remove entry from the "initial resource versions" map when the server informs us that the corresponding resource has gone away. Risk Level: low #4991 Signed-off-by: Fred Douglas <fredlas@google.com>

Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>

…_bug_tracker_impl.h QUICHE platform implementation (#6339) Add quic_expect_bug_impl.h, (spdy|http2)_logging_impl.h, (spdy|http2)_bug_tracker_impl.h QUICHE platform implementation. All of them depend on quic_logging_impl.h. Risk Level: minimum, code not used yet. Testing:
bazel test test/extensions/quic_listeners/quiche/platform:spdy_platform_test --test_output=all --define quiche=enabled
bazel test test/extensions/quic_listeners/quiche/platform:http2_platform_test --test_output=all --define quiche=enabled
bazel test test/extensions/quic_listeners/quiche/platform:quic_platform_test --test_output=all --define quiche=enabled
bazel test @com_googlesource_quiche//:spdy_platform_test --test_output=all --define quiche=enabled
bazel test @com_googlesource_quiche//:http2_platform_test --test_output=all --define quiche=enabled
bazel test @com_googlesource_quiche//:quic_platform_test --test_output=all --define quiche=enabled
Signed-off-by: Bin Wu <wub@google.com>

We want to limit the number of connection pools per cluster. Add it to the circuit breaker thresholds so we can do it per priority. Signed-off-by: Kyle Larose <kyle@agilicus.com>

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

Part of #5942 Signed-off-by: Matt Klein <mklein@lyft.com>

Part of #6361 Signed-off-by: Matt Klein <mklein@lyft.com>

Signed-off-by: Michael Rebello <mrebello@lyft.com>

Signed-off-by: Derek Argueta <dereka@pinterest.com>

Signed-off-by: Dan Rosen <mergeconflict@google.com>

…ime. (#6369) * Rework guarddog_impl.cc using timers rather than condvar timed waits. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Add QuicFileUtilsImpl using Envoy::FileSystem. Risk Level: low Testing: Added tests in test/extensions/quic_listeners/quiche/platform/quic_platform_test.cc and tested with --define quiche=enabled Part of #2557 Signed-off-by: Dan Zhang <danzh@google.com>

Fixing up a TODO - fitting all route config options simply doesn't scale, so refactoring things so we don't have functions with infinite arguments. Risk Level: n/a (test only) Testing: integration test pass Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

…cation for Any and hosts deprecation for load_assignment (#6368) Update examples for Struct deprecation for Any Risk Level: Low - generated configs only, no changes to code Testing: bazel build //configs:example_configs, bazel test //test/... Docs Changes: None required Release Notes: None required Fixes #6025 Replaces #6356 Related #6346 Signed-off-by: Michael Payne <michael@sooper.org>

Signed-off-by: Harvey Tuch <htuch@google.com>

Support google_default in channel credentials configuration. The documentation mentions this option and yet it's ignored. Risk Level: Low, the option was seemingly useless/unused. If anybody relies on it doing nothing, they can just unset it. Testing: Tried running my own envoy, seemed to pick up the credentials pointed to by the GOOGLE_APPLICATION_CREDENTIALS environment variable. Signed-off-by: qfel <qfel.pl@gmail.com>

This fixes a bug where hosts that were moved between priorities would not be included in the hosts_added vector, resulting in crashes if the same host was moved multiple times when used with active health checking: if a host was moved between priorities twice, it would first get removed from the health checker, then on the second move the health checker would crash as it would attempt to remove a host it didn't know about. We fix this by explicitly adding the existing host to the list of added hosts iff the host was previously in a different priority. Uncovering this bug led to the discovery of a bug in the batch updating done during EDS: std::set_difference assumes that the provided ranges are both *sorted*, which is not generally true during this update flow. This meant that the filtering of hosts that were added/removed did not work correctly, and would produce inconsistent results dependent on the ordering of the host pointers in the unordered_map. We fix this by using a standard for loop instead of std::set_difference. Not only is this more correct, it should also be faster for large sets as it performs the filtering in O(n) instead of O(n^2). Signed-off-by: Snow Pettersen <snowp@squareup.com>
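As an illustration of the std::set_difference pitfall described above, here is a minimal sketch of order-independent filtering. The helper name `hostsNotIn` and the use of plain strings as host identifiers are assumptions made for this example; this is not the code from the commit.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not Envoy code): returns the elements of `current`
// that are absent from `other`, preserving the order of `current`.
// Unlike std::set_difference, it does not require either input to be sorted.
std::vector<std::string> hostsNotIn(const std::vector<std::string>& current,
                                    const std::vector<std::string>& other) {
  // Hash set gives average O(1) membership checks.
  const std::unordered_set<std::string> seen(other.begin(), other.end());
  std::vector<std::string> result;
  for (const auto& host : current) {
    if (seen.count(host) == 0) {
      result.push_back(host);
    }
  }
  return result;
}
```

Calling `hostsNotIn(new_hosts, old_hosts)` would give the added hosts, and swapping the arguments would give the removed ones, regardless of how either list is ordered.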

The change breaks the existing Redis operation, for example redis-cli -p [WHATEVER] GET 1 crashes Envoy. This reverts commit 046e989. Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

This allows us to move the new runtime APIs over to string_view without taking a string-serialization performance hit. See https://abseil.io/docs/cpp/guides/container for flat_hash_map being an unordered_map replacement with heterogeneous lookup for string_view. Risk Level: Medium (swapping the underlying internals of runtime) Testing: existing tests pass Docs Changes: no Release Notes: no
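To show what heterogeneous lookup buys, here is a small sketch that uses only the C++17 standard library. Note the substitution: it uses `std::map` with the transparent comparator `std::less<>` rather than abseil's flat_hash_map, purely so the example is self-contained. The names `RuntimeValues` and `lookupOr` are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical runtime snapshot keyed by std::string. The transparent
// comparator std::less<> lets find() accept a std::string_view directly.
using RuntimeValues = std::map<std::string, std::string, std::less<>>;

// Returns the stored value for `key`, or `fallback` if the key is absent.
// No temporary std::string is constructed for the lookup.
std::string_view lookupOr(const RuntimeValues& values, std::string_view key,
                          std::string_view fallback) {
  const auto it = values.find(key);
  return it != values.end() ? std::string_view(it->second) : fallback;
}
```

The returned view points into the map (or at the fallback), so it is only valid while that storage is alive.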

Add integration tests around HTTP timeouts in the router filter including per try and global timeout. Risk Level: Low Testing: integration tests Signed-off-by: Michael Puncel <mpuncel@squareup.com>

Add support for specifying _stale_after timeout as part of ClusterLoadAssignment Risk Level: Low Optional Feature that is triggered by the Management Server. Defaults to noop. Testing: Unit test Docs Changes: None Release Notes: None Fixes #6420 Signed-off-by: Vishal Powar <vishalpowar@google.com>

Modified from https://raw.githubusercontent.com/dastergon/postmortem-templates/master/templates/postmortem-template-srebook.md and https://landing.google.com/sre/book/chapters/postmortem.html. Signed-off-by: Harvey Tuch <htuch@google.com>

Signed-off-by: Matt Klein <mklein@lyft.com>

GitHub was complaining that 2.10 was problematic security wise; I don't think it's an issue in our environment, but this should make the warnings go away. Signed-off-by: Harvey Tuch <htuch@google.com>

Signed-off-by: Gabriel <gsagula@gmail.com>

Created OpenRCA service proto file based on ORCA design Risk Level: Low Signed-off-by: Chengyuan Zhang <chengyuanzhang@google.com>

Default behavior remains unchanged: retries will use the runtime parameter defaulted to 25ms as the base interval and 250ms as the maximum. Allows routes to customize the base and maximum intervals. Risk Level: low (no change to default behavior) Testing: unit tests Doc Changes: included, plus updated description of back-off algorithm Release Notes: added Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
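A rough sketch of how a base interval and a maximum interval could shape retry delays. It assumes a fully jittered exponential scheme in which retry N draws a delay from [0, (2^N - 1) * base), capped at the maximum; the function names are hypothetical, and the actual router code may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Exclusive upper bound of the back-off window for retry number N (N >= 1),
// assuming a window of (2^N - 1) * base capped at max_interval_ms.
uint64_t backoffWindowMs(uint32_t retry_number, uint64_t base_interval_ms,
                         uint64_t max_interval_ms) {
  assert(retry_number >= 1 && retry_number < 32);
  const uint64_t window = ((uint64_t{1} << retry_number) - 1) * base_interval_ms;
  return std::min(window, max_interval_ms);
}

// Picks a uniformly random delay in [0, window), i.e. "full jitter".
uint64_t nextBackoffMs(uint32_t retry_number, uint64_t base_interval_ms,
                       uint64_t max_interval_ms, std::mt19937_64& rng) {
  const uint64_t window = backoffWindowMs(retry_number, base_interval_ms, max_interval_ms);
  if (window == 0) {
    return 0;
  }
  std::uniform_int_distribution<uint64_t> dist(0, window - 1);
  return dist(rng);
}
```

With the defaults quoted above (25ms base, 250ms maximum), the windows under this assumption would be 25ms, 75ms and 175ms for the first three retries, and 250ms from the fourth retry on.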

Signed-off-by: Dan Zhang <danzh@google.com>

Signed-off-by: Chris Paika <paika.christopher@gmail.com>

@htuch

@htuch discovered a race condition in my libevent watcher implementation in the process of enabling TSAN for dependencies (#6610). Update libevent to pull in the fix (libevent/libevent#793). Risk Level: low Testing: bazel test //test/server:worker_impl_test -c dbg --config=clang-tsan --runs_per_test=1000 (with @htuch's patch applied). Signed-off-by: Dan Rosen <mergeconflict@google.com>

This change alters the behavior of fault data limiting by resetting the token bucket to a single token when data initially starts streaming. This makes sure that data pacing is as expected, while still allowing per-second bursting if the data provider is also bursty. Signed-off-by: Matt Klein <mklein@lyft.com>
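The effect of resetting the bucket can be sketched with a toy token bucket. The class below is a hypothetical illustration with caller-supplied elapsed time, not Envoy's token bucket implementation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical token bucket (illustration only). Time is passed in by the
// caller so the behavior is deterministic.
class TokenBucket {
public:
  TokenBucket(uint64_t max_tokens, double fill_rate_per_sec)
      : max_tokens_(max_tokens), fill_rate_(fill_rate_per_sec),
        tokens_(static_cast<double>(max_tokens)) {}

  // Refills for `elapsed_sec` seconds, then consumes up to `wanted` tokens.
  // Returns the number of tokens actually granted.
  uint64_t consume(uint64_t wanted, double elapsed_sec) {
    tokens_ = std::min(static_cast<double>(max_tokens_), tokens_ + elapsed_sec * fill_rate_);
    const uint64_t granted = std::min(wanted, static_cast<uint64_t>(tokens_));
    tokens_ -= static_cast<double>(granted);
    return granted;
  }

  // Discards accumulated tokens, leaving `tokens` (clamped to capacity).
  void reset(uint64_t tokens) { tokens_ = static_cast<double>(std::min(tokens, max_tokens_)); }

private:
  const uint64_t max_tokens_;
  const double fill_rate_;
  double tokens_;
};
```

Calling `reset(1)` when streaming starts means the first write gets a single token instead of a full bucket, so pacing begins immediately, while tokens that accumulate during later idle periods still permit bursts up to the capacity.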

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

Signed-off-by: Matt Klein <mklein@lyft.com>

Description: add ppc64le badge that links to Jenkins build server Risk Level: Low - Docs only Testing: Viewed in browser and through GH markdown viewer Docs Changes: N/A Release Notes: support ppc64le CPU architecture Fixes: #5196 Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>

Description: add formatting for the "response code details" string recently added to the StreamInfo (#6530) Risk Level: low Testing: unit tests Docs Changes: updated Release Notes: updated Signed-off-by: Elisha Ziskind <eziskind@google.com>

This test waits for the upstream to see a reset which confirms that the router filter did the right thing when the global timeout is hit. However since this involves the network, we would occasionally see the reset after the wait call. Since we were waiting for 0ms we'd get flakes. 15s is hopefully high enough that the test will succeed reliably. Signed-off-by: Michael Puncel <mpuncel@squareup.com>

Signed-off-by: Snow Pettersen <snowp@squareup.com>

…otocol spec (#6545) realized that, with the unreliable queue implementation copied from SotW xDS, delta xDS could get into a state where Envoy thinks it has subscribed, but the server hasn't heard the subscription, with no way for either to realize the mistake. I fixed that by converting the queue setup to a cleaner "do I currently want to send a request?" with the request's (un)subscriptions only populated immediately before the request is actually sent into gRPC. While doing that, I further realized there was a problem when a given resource was subscribed then unsubscribed (or reversed), all in between request sends. I made sure Envoy handles that sensibly, and added explicit requirements to the xDS protocol spec to ensure servers will also handle it sensibly. Added unit tests for those fixes. Risk Level: low Testing: added unit tests for bugs uncovered #4991 Signed-off-by: Fred Douglas <fredlas@google.com>
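The "do I currently want to send a request?" idea can be sketched as follows. This is a simplified, hypothetical tracker: the class and member names are invented for illustration, and the rule that a subscribe and an unsubscribe between two sends cancel each other is an assumption about one sensible behavior, not a statement of what Envoy or the xDS spec requires.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified tracker of subscription changes that have not yet
// been sent to the server (illustration only).
class PendingSubscriptions {
public:
  struct Request {
    std::set<std::string> added;
    std::set<std::string> removed;
  };

  // A subscribe cancels a not-yet-sent unsubscribe for the same resource.
  void subscribe(const std::string& name) {
    if (to_remove_.erase(name) == 0) {
      to_add_.insert(name);
    }
  }

  // An unsubscribe cancels a not-yet-sent subscribe for the same resource.
  void unsubscribe(const std::string& name) {
    if (to_add_.erase(name) == 0) {
      to_remove_.insert(name);
    }
  }

  // True if there is anything worth sending right now.
  bool wantsToSend() const { return !to_add_.empty() || !to_remove_.empty(); }

  // Populates the request immediately before it is sent, then clears state.
  Request takeRequest() {
    Request request{std::move(to_add_), std::move(to_remove_)};
    to_add_.clear();
    to_remove_.clear();
    return request;
  }

private:
  std::set<std::string> to_add_;
  std::set<std::string> to_remove_;
};
```

Because the request is built only at send time, there is no queue of stale requests that can drift out of sync with what the client believes it has subscribed to.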

Signed-off-by: Derek Schaller <dschaller@lyft.com>

Signed-off-by: Bin Wu <wub@google.com>

Signed-off-by: Derek Schaller <dschaller@lyft.com>

This defers starting the per try timeout timer until onRequestComplete to ensure that it is not started before the global timeout. This ensures that the per try timeout will not take into account the time spent reading the downstream, which should be responsibility of the HCM level timeouts. Signed-off-by: Snow Pettersen <snowp@squareup.com>

This adds support for modifying the grpc-timeout provided by the downstream by some offset. This is useful to make sure that Envoy is able to see timeouts before the gRPC client does, as the client will cancel the request when the deadline has been exceeded which hides the timeout from the outlier detector. Signed-off-by: Snow Pettersen <snowp@squareup.com>
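A minimal sketch of the offset idea, assuming the offset is subtracted from the downstream-provided timeout and is ignored when it would consume the whole timeout. The function name and the clamping rule are assumptions for illustration, not necessarily what the router does.

```cpp
#include <chrono>

// Hypothetical helper: shortens the timeout parsed from the downstream
// grpc-timeout header by `offset`, so the proxy times out slightly before
// the client's own deadline. If the offset is not smaller than the timeout,
// the timeout is left unchanged (an assumption made for this sketch).
std::chrono::milliseconds adjustedGrpcTimeout(std::chrono::milliseconds downstream_timeout,
                                              std::chrono::milliseconds offset) {
  if (offset < downstream_timeout) {
    return downstream_timeout - offset;
  }
  return downstream_timeout;
}
```

For example, a 1000ms client deadline with a 50ms offset would yield a 950ms timeout at the proxy, giving it a chance to record the timeout before the client cancels the request.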

It is no longer needed since Api::Api is plumbed ubiquitously throughout Envoy's core. The only user of the factory, QuicThreadImpl, has been modified to take the Envoy::Thread::ThreadFactory via QuicThreadImpl::setThreadFactory(). Signed-off-by: Andres Guedez <aguedez@google.com>

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

This PR moves the xds protocol from md to rst. Risk Level: Low Testing: N/A Docs Changes: N/A Release Notes: N/A Fixes #6338 Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* Adds SharedStatNameStorageSet. Signed-off-by: Joshua Marantz <jmarantz@google.com>

KirstieJane · 2019-04-23T16:32:13Z

I think those instructions might be the most useful thing I’ve ever written 😂 Glad you found them! 💖

Dan Rosen and others added 30 commits March 22, 2019 15:48

convert HCM test configs to v2 YAML (#6354)

7b1909b

Signed-off-by: Derek Argueta <dereka@pinterest.com>

remove v1 Redis HC tests (#6367)

bea9cd0

Signed-off-by: Derek Argueta <dereka@pinterest.com>

fix NPE In refreshCachedRoute (#6359)

dcf5544

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>

tools: updating deprecation scripts (#6289)

03b28bd

Updating per new file locations. Updates (unused) reloadable flags to default true. Risk Level: n/a (tooling) Testing: manual Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

owners: promoting Lizan to senior maintainer! (#6374)

1899110

Risk Level: n/a Testing: n/a Docs Changes: yes Release Notes: no Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

config: Remove entries from initial resource versions map (#6320)

78ad883

Remove entry from the "initial resource versions" map when the server informs us that the corresponding resource has gone away. Risk Level: low #4991 Signed-off-by: Fred Douglas <fredlas@google.com>

redis: prefixed routing (#5658)

046e989

Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>

upstream: allow configuration of connection pool limits (#6298)

7de2b39

We want to limit the number of connection pools per cluster. Add it to the circuit breaker thresholds so we can do it per priority. Signed-off-by: Kyle Larose <kyle@agilicus.com>

docs: fixing a bad merge (#6385)

aab0545

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

http fault: implement header controlled faults (#6318)

805683f

Part of #5942 Signed-off-by: Matt Klein <mklein@lyft.com>

docs: start the work of snapping docs to the repo/docker (#6376)

00400d2

Part of #6361 Signed-off-by: Matt Klein <mklein@lyft.com>

docs: fix a few typos in intro (#6390)

71b3ecd

Signed-off-by: Michael Rebello <mrebello@lyft.com>

test: convert router v1 JSON test configs to v2 YAML (#6332)

2a2a886

Signed-off-by: Derek Argueta <dereka@pinterest.com>

init: replace old init manager with new "safe" init manager (#6360)

e1450a1

Signed-off-by: Dan Rosen <mergeconflict@google.com>

time: sim-time thread safety and move guard-dog fully into abstract time. (#6369)

fcb7af6

* Rework guarddog_impl.cc using timers rather than condvar timed waits. Signed-off-by: Joshua Marantz <jmarantz@google.com>

api: reserve HCM field for pending security fix. (#6397)

8ba28c3

Signed-off-by: Harvey Tuch <htuch@google.com>

Revert "redis: prefixed routing (#5658)" (#6401)

bacd89e

The change breaks the existing Redis operation, for example redis-cli -p [WHATEVER] GET 1 crashes Envoy. This reverts commit 046e989. Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

mpuncel and others added 28 commits April 17, 2019 15:14

add HTTP integration tests exercising timeouts (#6621)

504e15f

Add integration tests around HTTP timeouts in the router filter including per try and global timeout. Risk Level: Low Testing: integration tests Signed-off-by: Michael Puncel <mpuncel@squareup.com>

tools: check spelling in pre-push hook (#6631)

c2e8e3f

Signed-off-by: Matt Klein <mklein@lyft.com>

build: update jinja to 2.10.1. (#6623)

788e66d

GitHub was complaining that 2.10 was problematic security wise; I don't think it's an issue in our environment, but this should make the warnings go away. Signed-off-by: Harvey Tuch <htuch@google.com>

ext_authz: option for clearing route cache of authorized requests (#6503)

0e109cb

Signed-off-by: Gabriel <gsagula@gmail.com>

api: create OpenRCA service proto file (#6497)

5cb2229

Created OpenRCA service proto file based on ORCA design Risk Level: Low Signed-off-by: Chengyuan Zhang <chengyuanzhang@google.com>

quiche: Implement SpdyUnsafeArena using SpdySimpleArena (#6612)

a039b9d

Signed-off-by: Dan Zhang <danzh@google.com>

examples: standardize docker-compose version and yaml extension (#6613)

4050392

Signed-off-by: Chris Paika <paika.christopher@gmail.com>

Batch implementation with timer (#6452)

dc3467a

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

Revert dispatcher stats (#6649)

43e06d2

Signed-off-by: Matt Klein <mklein@lyft.com>

docs: update docs to recommend /retest repokitteh command (#6655)

7a8f4b0

Signed-off-by: Snow Pettersen <snowp@squareup.com>

docs: add aspell to mac dependencies to fix check format script (#6661)

8ceb9c7

Signed-off-by: Derek Schaller <dschaller@lyft.com>

Implement some TODOs in quic_endian_impl.h (#6644)

fdb4f1a

Signed-off-by: Bin Wu <wub@google.com>

update bazel readme for clang-format-8 on mac (#6660)

2ae3322

Signed-off-by: Derek Schaller <dschaller@lyft.com>

fix version history order (#6671)

60241e3

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

docs: move xds protocol to rst (#6670)

a3fe3c6

This PR moves the xds protocol from md to rst. Risk Level: Low Testing: N/A Docs Changes: N/A Release Notes: N/A Fixes #6338 Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

stats: add/test heterogenous set of StatNameStorage objects. (#6504)

629bbfb

* Adds SharedStatNameStorageSet. Signed-off-by: Joshua Marantz <jmarantz@google.com>

larrywest merged commit 989ae8e into larrywest:master Apr 23, 2019

larrywest merged 165 commits into larrywest:master from envoyproxy:master

20 participants
