Skip to content

Dependencies: dataplane_core, controlplane updates: gRPC, protobuf, upd, xxhash, spdlog, zlib-ng, rules_go#13394

Closed
moderation wants to merge 23 commits intoenvoyproxy:masterfrom
moderation:master
Closed

Dependencies: dataplane_core, controlplane updates: gRPC, protobuf, upd, xxhash, spdlog, zlib-ng, rules_go#13394
moderation wants to merge 23 commits intoenvoyproxy:masterfrom
moderation:master

Conversation

@moderation
Copy link
Copy Markdown
Contributor

@moderation moderation commented Oct 5, 2020

Dependencies: dataplane_core, controlplane updates:

  • gRPC pre 1.27.x - 1.32.0 (release notes)
  • protobuf 3.10.1 - 3.13.0 (release notes). Reduces patch size
  • upd 2020-09-18. Remove patch!
  • xxhash 0.7.3 - 0.8.0 (release notes)
  • spdlog 1.7.0 - 1.8.1 (release notes)
  • zlib-ng 2020-09-24
  • rules_go 0.23.7 - 0.24.3 (release notes). Updates Go to 1.14.9 (1.15.2 introduces tighter conflicting imports and Envoy won't compile)

Risk Level: Medium, jumps in core components gRPC and protobuf
Testing: bazel test //test/..., bazel build @envoy_api_canonical//envoy/..., docs/build.sh, bazel @envoy_api_canonical//tools/..., bazel @envoy_api_canonical//test/...
Docs Changes: N/A
Release Notes: N/A

Comments: ALTS changes will require review from Google. There may be straggling gRPC test failures that look like they are related to response code changes in upstream gRPC

Signed-off-by: Michael Payne michael@sooper.org

Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
…xtra.bzl

Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
… bump and sync to generated protos this time

Signed-off-by: Michael Payne <michael@sooper.org>
…, xxhash, spdlog, zlib-ng, rules_go.

Signed-off-by: Michael Payne <michael@sooper.org>
@moderation moderation requested a review from htuch as a code owner October 5, 2020 18:11
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/dependency-watchers: FYI only for changes made to (bazel/repository_locations\.bzl)|(api/bazel/repository_locations\.bzl)|(.*/requirements\.txt).

🐱

Caused by: #13394 was opened by moderation.

see: more, trace.

Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
@moderation
Copy link
Copy Markdown
Contributor Author

@antoniovicente as per https://envoyproxy.slack.com/archives/C78HA81DH/p1595896518287300?thread_ts=1595887714.282900&cid=C78HA81DH hoping you can take a look the gRPC changes and build failures in this PR. Thanks

@htuch
Copy link
Copy Markdown
Member

htuch commented Oct 7, 2020

@moderation LGTM. Can you merge master? Ready to ship when CI passes.

@moderation
Copy link
Copy Markdown
Contributor Author

moderation commented Oct 7, 2020

Thanks for validating that test change @htuch and @yihuazhang.

The issue with CI now are two failures. One is a ALTS test timeout and one is a gRPC timeout. I suspect that both are related to the same thing and possibly with gRPC.

The ALTS test timeout occurs at https://github.com/envoyproxy/envoy/blob/master/test/extensions/transport_sockets/alts/alts_integration_test.cc#L264-L270

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //test/extensions/transport_sockets/alts:alts_integration_test
-----------------------------------------------------------------------------
WARNING: Perftools heap leak checker is active -- Performance may suffer
[==========] Running 12 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from IpVersions/AltsIntegrationTestValidPeer
[ RUN      ] IpVersions/AltsIntegrationTestValidPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv4
[       OK ] IpVersions/AltsIntegrationTestValidPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv4 (667 ms)
[ RUN      ] IpVersions/AltsIntegrationTestValidPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv6
[       OK ] IpVersions/AltsIntegrationTestValidPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv6 (441 ms)
[----------] 2 tests from IpVersions/AltsIntegrationTestValidPeer (1108 ms total)

[----------] 2 tests from IpVersions/AltsIntegrationTestEmptyPeer
[ RUN      ] IpVersions/AltsIntegrationTestEmptyPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv4
[       OK ] IpVersions/AltsIntegrationTestEmptyPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv4 (466 ms)
[ RUN      ] IpVersions/AltsIntegrationTestEmptyPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv6
[       OK ] IpVersions/AltsIntegrationTestEmptyPeer.RouterRequestAndResponseWithBodyNoBuffer/IPv6 (464 ms)
[----------] 2 tests from IpVersions/AltsIntegrationTestEmptyPeer (930 ms total)

[----------] 2 tests from IpVersions/AltsIntegrationTestClientInvalidPeer
[ RUN      ] IpVersions/AltsIntegrationTestClientInvalidPeer.ClientValidationFail/IPv4
-- Test timed out at 2020-10-06 00:11:04 UTC --

The gRPC error occurs at https://github.com/envoyproxy/envoy/blob/master/test/common/grpc/grpc_client_integration_test.cc#L74-L91

[  FAILED  ] 2 tests, listed below:
[  FAILED  ] IpVersionsClientType/GrpcClientIntegrationTest.HttpNon200Status/IPv4_GoogleGrpc, where GetParam() = (4-byte object <00-00 00-00>, 4-byte object <01-00 00-00>)
[  FAILED  ] IpVersionsClientType/GrpcClientIntegrationTest.HttpNon200Status/IPv6_GoogleGrpc, where GetParam() = (4-byte object <01-00 00-00>, 4-byte object <01-00 00-00>)

You can see the timeout

[ RUN      ] IpVersionsClientType/GrpcClientIntegrationTest.HttpNon200Status/IPv4_GoogleGrpc
unknown file: Failure

Unexpected mock function call - returning directly.
    Function call: onRemoteClose(13, @0x7ffdc736f188 "Received http2 header with status: 400")
Google Mock tried the following 1 expectation, but it didn't match:

./test/common/grpc/grpc_client_integration_test_harness.h:167: EXPECT_CALL(*this, onRemoteClose(grpc_status, _))...
  Expected arg #0: is equal to 1
           Actual: 13
         Expected: to be called once
           Actual: never called - unsatisfied and active
Stack trace:
  0x2f4f0a9: testing::internal::GoogleTestFailureReporter::ReportFailure()
  0x4d7e08: testing::internal::Expect()
  0x2f61755: testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()::$_1::operator()()
  0x2f61383: testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
  0x5169a9: testing::internal::FunctionMocker<>::Invoke()
  0x505882: Envoy::Grpc::MockAsyncStreamCallbacks<>::onRemoteClose()
  0x17c24b0: Envoy::Grpc::GoogleAsyncStreamImpl::notifyRemoteClose()
  0x17c6b70: Envoy::Grpc::GoogleAsyncStreamImpl::handleOpCompletion()
  0x17bf81f: Envoy::Grpc::GoogleAsyncStreamImpl::onCompletedOps()
  0x17c8ab8: Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread()::$_1::operator()()
  0x17c8a7d: std::__invoke_impl<>()
  0x17c8a1d: std::__invoke_r<>()
  0x17c891d: std::_Function_handler<>::_M_invoke()
  0x4f4ade: std::function<>::operator()()
  0x1c8aabd: Envoy::Event::DispatcherImpl::runPostCallbacks()
  0x1c8b258: Envoy::Event::DispatcherImpl::DispatcherImpl()::$_1::operator()()
  0x1c8b21d: std::__invoke_impl<>()
  0x1c8b1bd: std::__invoke_r<>()
  0x1c8b0bd: std::_Function_handler<>::_M_invoke()
  0x4f4ade: std::function<>::operator()()
  0x2109611: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::operator()()
  0x21095d6: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::__invoke()
  0x21edf46: event_process_active_single_queue
  0x21e863a: event_process_active
  0x21e74ec: event_base_loop
  0x2106c24: Envoy::Event::LibeventScheduler::run()
  0x1c8a9ea: Envoy::Event::DispatcherImpl::run()
  0x459b41: Envoy::Grpc::(anonymous namespace)::DispatcherHelper::runDispatcher()
  0x464ea4: Envoy::Grpc::(anonymous namespace)::GrpcClientIntegrationTest_HttpNon200Status_Test::TestBody()
  0x2f48254: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x2f37d6e: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x2f20663: testing::Test::Run()
  0x2f2125d: testing::TestInfo::Run()
... Google Test internal frames ...

./test/common/grpc/grpc_client_integration_test_harness.h:261: Failure
Failed
Test timeout
Stack trace:
  0x457128: Envoy::Grpc::(anonymous namespace)::GrpcClientIntegrationTest::initialize()::{lambda()#1}::operator()()
  0x45709d: std::__invoke_impl<>()
  0x45703d: std::__invoke_r<>()
  0x456f3d: std::_Function_handler<>::_M_invoke()
  0x4f4ade: std::function<>::operator()()
  0x210a02d: Envoy::Event::TimerImpl::TimerImpl()::$_0::operator()()
  0x2109fd6: Envoy::Event::TimerImpl::TimerImpl()::$_0::__invoke()
  0x21edf46: event_process_active_single_queue
  0x21e863a: event_process_active
  0x21e74ec: event_base_loop
  0x2106c24: Envoy::Event::LibeventScheduler::run()
  0x1c8a9ea: Envoy::Event::DispatcherImpl::run()
  0x459b41: Envoy::Grpc::(anonymous namespace)::DispatcherHelper::runDispatcher()
  0x464ea4: Envoy::Grpc::(anonymous namespace)::GrpcClientIntegrationTest_HttpNon200Status_Test::TestBody()
  0x2f48254: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x2f37d6e: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x2f20663: testing::Test::Run()
  0x2f2125d: testing::TestInfo::Run()
... Google Test internal frames ...

Are you aware of a change in gRPC (or ALTS) that would unblock these two tests?

Signed-off-by: Michael Payne <michael@sooper.org>
Signed-off-by: Michael Payne <michael@sooper.org>
@htuch
Copy link
Copy Markdown
Member

htuch commented Oct 9, 2020

@yihuazhang would you be able to help @moderation debug the ALTS test?
@markdroth do you happen to know what change have occurred over this gRPC range that might affect status code as shown in the above test?

@yihuazhang
Copy link
Copy Markdown
Contributor

@moderation For ALTS timeout, could you please add more logs to verify whether ALTS handshake successfully fails due to peer service account mismatch? If the handshake fails successfully, I think it is the problem of gRPC or Envoy on how to handle the failure.

@yihuazhang
Copy link
Copy Markdown
Contributor

yihuazhang commented Oct 9, 2020

By looking at the ALTS changes happened since pre 1.27.x (from this PR in the history), the only ATLS change that looks suspicious to me is these two PR's - PR1 and PR2

Edit: PR1 does not seem to be relevant.

@htuch
Copy link
Copy Markdown
Member

htuch commented Oct 9, 2020

Tagging @jiangtaoli2016 who authored PR2.

@markdroth
Copy link
Copy Markdown
Contributor

@yashykt It looks like something may have changed in the gRPC transport layer. Can you please take a look? Thanks!

@yashykt
Copy link
Copy Markdown
Contributor

yashykt commented Oct 9, 2020

Hi, I believe the difference in the status codes is coming from grpc/grpc#22901
This PR brought gRPC Core/C++ in agreement with what the gRPC spec says - https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md
Prior to this PR, a 400 HTTP status was indeed resulting in a CANCELLED status and it should now return an INTERNAL status.

@jiangtaoli2016
Copy link
Copy Markdown

PR2 is a fix that prevents envoy from crashing during ALTS destruction. It should not cause timeout. @htuch Is there way to see debug logs?

@moderation
Copy link
Copy Markdown
Contributor Author

Away from build computer right now. Is --sandbox_debug and --verbose_failures enough?

@stale
Copy link
Copy Markdown

stale bot commented Oct 18, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 18, 2020
@moderation moderation closed this Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale stalebot believes this issue/PR has not been touched recently

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants