Skip to content

build: update grpc to 1.34.0#14147

Merged
mattklein123 merged 15 commits intoenvoyproxy:masterfrom
asraa:server-bugs
Jan 5, 2021
Merged

build: update grpc to 1.34.0#14147
mattklein123 merged 15 commits intoenvoyproxy:masterfrom
asraa:server-bugs

Conversation

@asraa
Copy link
Copy Markdown
Contributor

@asraa asraa commented Nov 23, 2020

Signed-off-by: Asra Ali asraa@google.com

Commit Message: Updates gRPC dependency to version 1.33.2. Also updates upb as a requirement, and removes a stale patch.
Additional Description:

  • Side effect, fixes fuzz bug.
  • gRPC adds frame size negotiation for ALTS. Used default.

Risk Level: Low, but not non-existent? There was a gRPC default change.
Testing: Added regression test for fuzz bug.
Fixes:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27307

Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa requested a review from htuch as a code owner November 23, 2020 16:29
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Nov 23, 2020
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #14147 was opened by asraa.

see: more, trace.

Signed-off-by: Asra Ali <asraa@google.com>
htuch
htuch previously approved these changes Nov 23, 2020
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch
Copy link
Copy Markdown
Member

htuch commented Nov 23, 2020

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Nov 23, 2020
@htuch htuch self-assigned this Nov 23, 2020
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 23, 2020

Looks like there was a failure in test, after fixing, it seems there's a fake handshake shutdown failure due to a timeout. investigating.

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 24, 2020

@markdroth Hi Mark, quick question. I'm trying to update gRPC and ran into a timeout failure in IpVersions/AltsIntegrationTestClientInvalidPeer.ClientValidationFail/IPv4 when trying to shutdown the fake handshake server. Just curious, do you have a lead on this? Tried to check if there was anything different about Shutdown from 1.27 to 1.33, and couldn't find an immediate lead. Any idea here to start looking? Seems like Shutdown never returns. Thank you!

@moderation
Copy link
Copy Markdown
Contributor

moderation commented Nov 24, 2020

I ran into these same timeout issues on a prior attempt at #13394. The comment #13394 (comment) from @yashykt about the changed status codes looks like the closest thing I could find. We then decided to defer gRPC and bump protobuf independently.

The gRPC commit for the status changes is at https://github.com/grpc/grpc/pull/22901/files

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 24, 2020

The gRPC commit for the status changes is at https://github.com/grpc/grpc/pull/22901/files

Thanks @moderation! I changed the status code already, which fixed some test issues, but not the Shutdown() timeout, which seemed independent. Let me trace through and see if onRemoteClose() is being called anywhere there or if I can relate it.

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 24, 2020

@jiangtaoli2016 @yihuazhang Sorry for asking again, I'm attempting to bump gRPC again. The HTTP response code conversions are all fixed, but the timeout remains in the ALTS integration test.

I verified that the client validation actually fails, and that the issue is that Shutdown() never returns. I experimented with adding an deadline like Shutdown(now), but no luck. The handshake validation seems to fail Couldn't find peer's service account in peer_service_accounts: invalid_client_identity.

[2020-11-24 19:02:26.496][28][debug][conn_handler] [source/server/connection_handler_impl.cc:503] [C1] new connection
[2020-11-24 19:02:26.496][28][trace][connection] [source/common/network/connection_impl.cc:517] [C1] socket event: 2
[2020-11-24 19:02:26.496][28][trace][connection] [source/common/network/connection_impl.cc:625] [C1] write ready
[2020-11-24 19:02:26.496][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C1] TSI: doHandshake
[2020-11-24 19:02:26.496][12][trace][connection] [source/common/network/connection_impl.cc:517] [C0] socket event: 2
[2020-11-24 19:02:26.496][12][trace][connection] [source/common/network/connection_impl.cc:625] [C0] write ready
[2020-11-24 19:02:26.496][12][debug][connection] [source/common/network/connection_impl.cc:634] [C0] connected
[2020-11-24 19:02:26.496][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:54] [C0] TSI: doHandshake next: received: 0
[2020-11-24 19:02:26.497][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C0] TSI: doHandshake
[2020-11-24 19:02:26.498][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:79] [C0] TSI: doHandshake next done: status: 0 to_send: 10
[2020-11-24 19:02:26.498][12][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C0] write returns: 10
[2020-11-24 19:02:26.498][28][trace][connection] [source/common/network/connection_impl.cc:517] [C1] socket event: 3
[2020-11-24 19:02:26.498][28][trace][connection] [source/common/network/connection_impl.cc:625] [C1] write ready
[2020-11-24 19:02:26.498][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C1] TSI: doHandshake
[2020-11-24 19:02:26.498][28][trace][connection] [source/common/network/connection_impl.cc:555] [C1] read ready. dispatch_buffered_data=false
[2020-11-24 19:02:26.498][28][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C1] read returns: 10
[2020-11-24 19:02:26.498][28][trace][connection] [source/common/network/raw_buffer_socket.cc:39] [C1] read error: Resource temporarily unavailable
[2020-11-24 19:02:26.498][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:169] [C1] TSI: raw read result action 1 bytes 10 end_stream false
[2020-11-24 19:02:26.498][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C1] TSI: doHandshake
[2020-11-24 19:02:26.498][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:54] [C1] TSI: doHandshake next: received: 10
[2020-11-24 19:02:26.499][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:79] [C1] TSI: doHandshake next done: status: 0 to_send: 21
[2020-11-24 19:02:26.499][28][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C1] write returns: 21
[2020-11-24 19:02:26.499][12][trace][connection] [source/common/network/connection_impl.cc:517] [C0] socket event: 3
[2020-11-24 19:02:26.499][12][trace][connection] [source/common/network/connection_impl.cc:625] [C0] write ready
[2020-11-24 19:02:26.499][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C0] TSI: doHandshake
[2020-11-24 19:02:26.499][12][trace][connection] [source/common/network/connection_impl.cc:555] [C0] read ready. dispatch_buffered_data=false
[2020-11-24 19:02:26.499][12][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C0] read returns: 21
[2020-11-24 19:02:26.499][12][trace][connection] [source/common/network/raw_buffer_socket.cc:39] [C0] read error: Resource temporarily unavailable
[2020-11-24 19:02:26.499][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:169] [C0] TSI: raw read result action 1 bytes 21 end_stream false
[2020-11-24 19:02:26.499][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C0] TSI: doHandshake
[2020-11-24 19:02:26.499][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:54] [C0] TSI: doHandshake next: received: 21
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:79] [C0] TSI: doHandshake next done: status: 0 to_send: 14
**[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:100] [C0] TSI: Handshake successful: peer properties: 5**
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:103] [C0]   certificate_type: ALTS
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:103] [C0]   service_account: peer_identity
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:103] [C0]   rpc_versions: 

[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:103] [C0]   alts_context: 
peer_identity*local_identity2
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:103] [C0]   security_level: TSI_PRIVACY_AND_INTEGRITY
**[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:112] [C0] TSI: Handshake validation failed: Couldn't find peer's service account in peer_service_accounts: invalid_client_identity**
[2020-11-24 19:02:26.500][12][debug][connection] [source/common/network/connection_impl.cc:117] [C0] closing data_to_write=0 type=1
[2020-11-24 19:02:26.500][12][debug][connection] [source/common/network/connection_impl.cc:213] [C0] closing socket: 1
[2020-11-24 19:02:26.500][12][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:232] [C0] TSI: closing socket
[2020-11-24 19:02:26.500][12][trace][main] [source/common/event/dispatcher_impl.cc:213] item added to deferred deletion list (size=1)
[2020-11-24 19:02:26.500][28][trace][connection] [source/common/network/connection_impl.cc:517] [C1] socket event: 3
[2020-11-24 19:02:26.500][12][debug][client] [source/common/http/codec_client.cc:96] [C0] disconnect. resetting 0 pending requests
[2020-11-24 19:02:26.500][28][trace][connection] [source/common/network/connection_impl.cc:625] [C1] write ready
[2020-11-24 19:02:26.500][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:45] [C1] TSI: doHandshake
[2020-11-24 19:02:26.500][28][trace][connection] [source/common/network/connection_impl.cc:555] [C1] read ready. dispatch_buffered_data=false
[2020-11-24 19:02:26.500][12][trace][main] [source/common/event/dispatcher_impl.cc:90] clearing deferred deletion list (size=1)
[2020-11-24 19:02:26.500][28][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C1] read returns: 0
[2020-11-24 19:02:26.500][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:169] [C1] TSI: raw read result action 1 bytes 0 end_stream true
[2020-11-24 19:02:26.500][28][debug][connection] [source/common/network/connection_impl.cc:602] [C1] remote close
[2020-11-24 19:02:26.500][28][debug][connection] [source/common/network/connection_impl.cc:213] [C1] closing socket: 0
[2020-11-24 19:02:26.500][28][debug][connection] [source/extensions/transport_sockets/alts/tsi_socket.cc:232] [C1] TSI: closing socket
[2020-11-24 19:02:26.500][28][trace][main] [source/common/event/dispatcher_impl.cc:213] item added to deferred deletion list (size=1)
[2020-11-24 19:02:26.500][28][debug][conn_handler] [source/server/connection_handler_impl.cc:164] [C1] adding to cleanup list
[2020-11-24 19:02:26.500][28][trace][main] [source/common/event/dispatcher_impl.cc:213] item added to deferred deletion list (size=2)
[2020-11-24 19:02:26.500][28][trace][main] [source/common/event/dispatcher_impl.cc:213] item added to deferred deletion list (size=3)
[2020-11-24 19:02:26.500][28][trace][main] [source/common/event/dispatcher_impl.cc:90] clearing deferred deletion list (size=3)
[2020-11-24 19:02:28.478][24][info][main] [source/server/drain_manager_impl.cc:70] shutting down parent after drain
[2020-11-24 19:02:31.442][24][debug][main] [source/server/server.cc:196] flushing stats
[2020-11-24 19:02:36.446][24][debug][main] [source/server/server.cc:196] flushing stats
[2020-11-24 19:02:41.451][24][debug][main] [source/server/server.cc:196] flushing stats
[2020-11-24 19:02:46.454][24][debug][main] [source/server/server.cc:196] flushing stats

This is what the trace logs used to look like https://gist.github.com/asraa/195b3b05ed217763a79f3d96f323c6fb. They're identical besides additional properties.

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 25, 2020

Confirmed that grpc/grpc@837c4ce#diff-ff0681fdeb3cda181d2ac2d4b80915cd is not 100% the root cause. With a patch that reverts this commit, the tests crash, so the patch did it's part in preventing the crash. As far as the timeout, still no dice.

@markdroth
Copy link
Copy Markdown
Contributor

@jiangtaoli2016 may be able to help with the ALTS problem.

@jiangtaoli2016
Copy link
Copy Markdown

Just back from OOO. It looks like the problem is at the shutdown of of the FakeHandshakerService. @apolcyn last changed it with this PR.

@asraa Could you revert that PR and try and see if it can shutdown correctly?

@yihuazhang
Copy link
Copy Markdown
Contributor

yihuazhang commented Nov 30, 2020

@jiangtaoli2016 I do not think the PR is the culprit because it is introduced in gRPC version 1.26, and ALTS integration test still works with version 1.27.

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 30, 2020

update: @jiangtaoli2016 helped debug and fixed the shutdown happening in client validation fail. change pushed

however, another timeout is happening in CheckAltsVersion.

is stuck in an infinite loop waiting for a client request. will get back to this later.

@yihuazhang
Copy link
Copy Markdown
Contributor

@asraa, regarding the timeout happened in CheckAltsVersion, I think after extracting the RPC version from the first client and server handshake message at line (https://github.com/envoyproxy/envoy/blob/master/test/extensions/transport_sockets/alts/alts_integration_test.cc#L73), it should return OK, instead of hang there.

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Nov 30, 2020

@yihuazhang Thank you so much! Tests pass :)

@htuch
Copy link
Copy Markdown
Member

htuch commented Dec 1, 2020

@asraa can you take a look at format etc?

asraa added 2 commits December 1, 2020 08:15
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Dec 1, 2020
asraa added 2 commits December 2, 2020 13:34
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@moderation
Copy link
Copy Markdown
Contributor

@asraa gRPC 1.34.0 is out if you want to try and jump all the way to latest - https://github.com/grpc/grpc/releases/tag/v1.34.0

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 3, 2020

Will do! I'm not sure why I keep hitting CI errors. Looks like one after the merge.

asraa added 2 commits December 3, 2020 08:41
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 3, 2020

Bumped to 1.34.0. It looks like I'm hitting this

// TODO(mattklein123): clang-tidy is showing a use after move when passing to
in clang_tidy.

Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa changed the title build: update grpc to 1.33.2 build: update grpc to 1.34.0 Dec 3, 2020
Comment on lines -777 to +780
version = "8a3ae1ef3e3e3f26b45dec735c5776737fc7247f",
sha256 = "e9f281c56ab1eb1f97a80ca8a83bb7ef73d230eabb8591f83876f4e7b85d9b47",
version = "382d5afc60e05470c23e8de19b19fc5ad231e732",
sha256 = "7992217989f3156f8109931c1fc6db3434b7414957cb82371552377beaeb9d6c",
strip_prefix = "upb-{version}",
urls = ["https://github.com/protocolbuffers/upb/archive/{version}.tar.gz"],
use_category = ["controlplane"],
release_date = "2019-11-19",
release_date = "2020-08-03",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest upb if you want to use that is:

    upb = dict(
        project_name = "upb",
        project_desc = "A small protobuf implementation in C (gRPC dependency)",
        project_url = "https://github.com/protocolbuffers/upb",
        version = "794ce6d06188c6df290b4b46b4c32e95a219eafc",
        sha256 = "1920bdcb3b963db424ce0a0e6cdcc9d5b5d48ef3c95bb79823ba3c8f63422d01",
        release_date = "2020-11-26",
        strip_prefix = "upb-{version}",
        urls = ["https://github.com/protocolbuffers/upb/archive/{version}.tar.gz"],
        use_category = ["controlplane"],
        cpe = "N/A",
    ),

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 3, 2020

CI is passing besides the existing clang_tidy issue

@moderation
Copy link
Copy Markdown
Contributor

/cc @mattklein123 as

// TODO(mattklein123): clang-tidy is showing a use after move when passing to
is marked as his TODO

@moderation
Copy link
Copy Markdown
Contributor

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Dec 3, 2020
@mattklein123 mattklein123 assigned lizan and unassigned htuch Dec 18, 2020
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 21, 2020

Upstream merged, clang-tidy is still pre-existing.

@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Dec 21, 2020
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Jan 5, 2021

@mattklein123 @moderation Sorry for bumping this again, but merged in main, clang-tidy is still failing on existing TODO

@mattklein123
Copy link
Copy Markdown
Member

OK let's force merge given it's an existing issue.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 7fa6aa5 into envoyproxy:master Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deps Approval required for changes to Envoy's external dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants