Skip to content

Fix a SIGSEGV when circuit breaker to upstream policy server is hit, fix a memory leak when client disconnects during upstream policy check, expose mixer client logging to Envoy, and add more mixer client counters.#2107

Closed
duderino wants to merge 1 commit intoistio:release-1.1from
duderino:jblatt_8224_crash_fix_logging_and_counters

Conversation

@duderino
Copy link
Copy Markdown

@duderino duderino commented Feb 7, 2019

What this PR does / why we need it:

This is a partial fix for istio/istio#8224.

I created a mixer fault test framework using an Envoy-based load generator and
upstream server. See #2105. There will also be a future PR that I will link here that applies that test framework to Mixer. These tests found two issues related to memory management in the MixerClientImpl.

Issue 1: If the upstream client disconnects during an upstream check to
the policy server, memory is leaked. Example:

Leak of 375808 bytes in 367 objects allocated from:
	@ 1fef7cf google::protobuf::internal::ArenaImpl::NewBlock
	@ 1ff011e google::protobuf::internal::ArenaImpl::SerialArena::AllocateAlignedFallback
	@ 1ff09b6 google::protobuf::internal::ArenaImpl::SerialArena::AllocateAligned
	@ 1fefb78 google::protobuf::internal::ArenaImpl::AllocateAligned
	@ 61544f google::protobuf::Arena::CreateMaybeMessage
	@ 55c835 google::protobuf::MessageLite::CreateMaybeMessage
	@ 55b973 istio::mixer::v1::CheckRequest::mutable_attributes
	@ 55a0aa istio::mixerclient::MixerClientImpl::Check

Issue 2: If too many requests are sent to the upstream policy server a
circuit breaker may be hit (by default this limits to 1024 concurrent
requests). The MixerClientImpl will free memory prematurely and SIGSEGV
with a stack trace like:

[2019-01-26 23:15:47.525][28][debug][pool] [external/envoy/source/common/http/http2/conn_pool.cc:79] max requests overflow
[2019-01-26 23:15:47.525][28][debug][router] [external/envoy/source/common/router/router.cc:532] [C0][S6223237052999230964] upstream reset
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/async_client_impl.cc:93] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers'
'x-envoy-overloaded', 'true'

[2019-01-26 23:15:47.526][28][debug][grpc] [src/envoy/utils/grpc_transport.cc:78] Check failed with code: 14, upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][warning][filter] [src/istio/mixerclient/client_impl.cc:136] Mixer policy check transport error: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/istio/mixerclient/client_impl.cc:142] Mixer policy check status: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:152] Called Mixer::Filter : check complete UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:132] Called Mixer::Filter : encodeHeaders 3
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1234] [C2230][S9774838916162700467] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '69'
'content-type', 'text/plain'
'date', 'Sat, 26 Jan 2019 23:15:46 GMT'
'server', 'envoy'

[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:207] Called Mixer::Filter : onDestroy state: 3
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:125] Caught Segmentation fault, suspect faulting address 0x8
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:94] Backtrace thr<28> obj</usr/lib/x86_64-linux-gnu/libstdc++.so.6> (If unsymbolized, use tools/stack_decode.py):
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<28> #0 0x7ff123114c04 (unknown)
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<28> obj</home/jblatt/.cache/bazel/_bazel_jblatt/b6258e8f3eabbcfebfab41d31f03811a/sandbox/linux-sandbox/4/execroot/__main__/bazel-out/k8-fastbuild/bin/test/integration/mixer_fault_test.runfiles/__main__/test/integration/mixer_fault_test>
[2019-01-26 23:15:47.553][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #1 0x209e113 google::protobuf::internal::GeneratedMessageReflection::HasBit()
[2019-01-26 23:15:47.571][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #2 0x20923d6 google::protobuf::internal::GeneratedMessageReflection::HasField()
[2019-01-26 23:15:47.587][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #3 0x2037f03 google::protobuf::TextFormat::Printer::PrintField()
[2019-01-26 23:15:47.603][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #4 0x2036d68 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.621][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #5 0x20367e1 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.638][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #6 0x20332f1 google::protobuf::TextFormat::Printer::PrintToString()
[2019-01-26 23:15:47.656][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #7 0x2033144 _ZNK6google8protobuf7Message11DebugStringB5cxx11Ev
[2019-01-26 23:15:47.673][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #8 0x50af1b Envoy::Utils::GrpcTransport<>::GrpcTransport()
[2019-01-26 23:15:47.690][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #9 0x50aaff Envoy::Utils::GrpcTransport<>::GetFunc()::{lambda()#1}::operator()()
[2019-01-26 23:15:47.707][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #10 0x50a857 std::_Function_handler<>::_M_invoke()
[2019-01-26 23:15:47.724][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #11 0x582383 std::function<>::operator()()
[2019-01-26 23:15:47.758][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #12 0x580b9a istio::mixerclient::MixerClientImpl::Check()

Issues 1 and 2 were fixed by replacing the large number of objects
dynamically allocated in the MixerClientImpl::Check function and passed
to an async client lambda with two shared pointers that keep all memory
alive for the lifetime of the Envoy filter (which is about as long as
the lifetime as the upstream request). This also had a nice side effect
of removing a few large protobuf copies. It will also make it much easier to retry failed mixer requests because the shared pointers will preserve the memory in retry lambdas. See istio/istio#9992 which I will work on next.

Which issue this PR fixes

Partial fix for istio/istio#8224

Note that the original bug described this as a problem with a slow telemetry upstream. I noticed that the policy check issues were often accompanied by log messages like:

libprotobuf ERROR src/istio/mixerclient/report_batch.cc:75] Mixer Report failed with: UNAVAILABLE:upstream connect error or disconnect/reset before headers

The policy checks encountered similar issues but didn't log them. So
in addition to the shared pointer bug fix for the leak and the sigsegv,
I also added additional logging, a simple logging abstraction so errors
in the mixer client could be relayed to envoy's logger, and additional
stats so we could track error rates, cache effectiveness, etc.

This should handle the policy check side. No errors were found for the
telemetry side, but I'll add additional counters and logging for that
too.

Special notes for your reviewer:

MixerClientImpl is pretty much a rewrite. I recommend using the 'unified view' to review that one.

@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Feb 7, 2019
@istio-testing
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: duderino
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: rshriram

If they are not already assigned, you can assign the PR to them by writing /assign @rshriram in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@duderino
Copy link
Copy Markdown
Author

duderino commented Feb 7, 2019

The counter changes in this PR will break the istio/istio build. Please merge PR istio/istio#11591 to fix.

Copy link
Copy Markdown
Contributor

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of questions (for my education purposes). Feel free to disregard them.


// Whether this quota response is from cache.
bool is_quota_cache_hit{false};
virtual const ::google::protobuf::util::Status& status() const = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do status and routeDirective need to be virtual?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check context shared pointer with all the state now implements the check response. Before there was a copy into a result class which included a protobuf copy. What's worse, a virtual call or a protobuf copy? I figured the latter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I still think we could avoid it, but that probably requires more refactoring to erase mixerclient/envoy boundary than desired.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could create a wrapper object that delegates instead of making a virtual function call, but that would increment shared pointer ref counts. Still, this feels like premature optimization to me. For the gazillion inefficiencies of the mixer filter, replacing a virtual function call that I introduced to avoid some deep protobuf copies doesn't seem worth it

class AttributesBuilder {
public:
AttributesBuilder(RequestContext* request) : request_(request) {}
AttributesBuilder(istio::mixer::v1::Attributes* attributes)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: can we avoid a raw pointer here? what stops us from using references?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd love to get rid of the raw pointers.

Initially I tried reworking all this code to use the CheckContext shared pointer, but it was a massive diff. So I exercised restraint. And Mixer v2 is coming.

Copy link
Copy Markdown
Contributor

@kyessenov kyessenov Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for explaining. Whether or not v2 is coming soon, this is production code. I am a bit concerned about the loss of context for the lifetime of the parameter, but if we can validate that the filter never outlives the attributes with integration tests under dynamic xDS reloads, it's fine. dynamic_listener test in istio is supposed to test for that, so if it passes, then it's probably OK.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree on the use of raw pointers. I would love to do this, but I want to minimize this overly large PR and I think the current PR makes things no worse. I have also verified that none of these raw pointers are used beyond the lifetime of the calling function.

So I'll do it if you guys are willing to review all the changes. I think a separate PR on top of this would make that easier. @mandarjog since he's also reviewing this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PiotrSikora came up with a similar criticism when we were discussing a breakout of this PR. See #2117

I will do this (and I really would love to!) if you agree to review the resulting changes. I think it would be good to do this in a new PR after 2117 is merged or else it'll become even larger

I created a mixer fault test framework using an Envoy-based load generator and
upstream server.  See #2105.   There will also be a future PR that I will link here that applies that test framework to Mixer.   These tests found two issues related to memory management in the MixerClientImpl.

Issue 1: If the upstream client disconnects during an upstream check to
the policy server, memory is leaked. Example:

```
Leak of 375808 bytes in 367 objects allocated from:
	@ 1fef7cf google::protobuf::internal::ArenaImpl::NewBlock
	@ 1ff011e google::protobuf::internal::ArenaImpl::SerialArena::AllocateAlignedFallback
	@ 1ff09b6 google::protobuf::internal::ArenaImpl::SerialArena::AllocateAligned
	@ 1fefb78 google::protobuf::internal::ArenaImpl::AllocateAligned
	@ 61544f google::protobuf::Arena::CreateMaybeMessage
	@ 55c835 google::protobuf::MessageLite::CreateMaybeMessage
	@ 55b973 istio::mixer::v1::CheckRequest::mutable_attributes
	@ 55a0aa istio::mixerclient::MixerClientImpl::Check
```
Issue 2: If too many requests are sent to the upstream policy server a
circuit breaker may be hit (by default this limits to 1024 concurrent
requests). The MixerClientImpl will free memory prematurely and SIGSEGV
with a stack trace like:

```
[2019-01-26 23:15:47.525][28][debug][pool] [external/envoy/source/common/http/http2/conn_pool.cc:79] max requests overflow
[2019-01-26 23:15:47.525][28][debug][router] [external/envoy/source/common/router/router.cc:532] [C0][S6223237052999230964] upstream reset
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/async_client_impl.cc:93] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers'
'x-envoy-overloaded', 'true'

[2019-01-26 23:15:47.526][28][debug][grpc] [src/envoy/utils/grpc_transport.cc:78] Check failed with code: 14, upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][warning][filter] [src/istio/mixerclient/client_impl.cc:136] Mixer policy check transport error: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/istio/mixerclient/client_impl.cc:142] Mixer policy check status: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:152] Called Mixer::Filter : check complete UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:132] Called Mixer::Filter : encodeHeaders 3
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1234] [C2230][S9774838916162700467] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '69'
'content-type', 'text/plain'
'date', 'Sat, 26 Jan 2019 23:15:46 GMT'
'server', 'envoy'

[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:207] Called Mixer::Filter : onDestroy state: 3
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:125] Caught Segmentation fault, suspect faulting address 0x8
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:94] Backtrace thr<28> obj</usr/lib/x86_64-linux-gnu/libstdc++.so.6> (If unsymbolized, use tools/stack_decode.py):
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<28> #0 0x7ff123114c04 (unknown)
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<28> obj</home/jblatt/.cache/bazel/_bazel_jblatt/b6258e8f3eabbcfebfab41d31f03811a/sandbox/linux-sandbox/4/execroot/__main__/bazel-out/k8-fastbuild/bin/test/integration/mixer_fault_test.runfiles/__main__/test/integration/mixer_fault_test>
[2019-01-26 23:15:47.553][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #1 0x209e113 google::protobuf::internal::GeneratedMessageReflection::HasBit()
[2019-01-26 23:15:47.571][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #2 0x20923d6 google::protobuf::internal::GeneratedMessageReflection::HasField()
[2019-01-26 23:15:47.587][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #3 0x2037f03 google::protobuf::TextFormat::Printer::PrintField()
[2019-01-26 23:15:47.603][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #4 0x2036d68 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.621][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #5 0x20367e1 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.638][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #6 0x20332f1 google::protobuf::TextFormat::Printer::PrintToString()
[2019-01-26 23:15:47.656][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #7 0x2033144 _ZNK6google8protobuf7Message11DebugStringB5cxx11Ev
[2019-01-26 23:15:47.673][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #8 0x50af1b Envoy::Utils::GrpcTransport<>::GrpcTransport()
[2019-01-26 23:15:47.690][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #9 0x50aaff Envoy::Utils::GrpcTransport<>::GetFunc()::{lambda()#1}::operator()()
[2019-01-26 23:15:47.707][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #10 0x50a857 std::_Function_handler<>::_M_invoke()
[2019-01-26 23:15:47.724][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #11 0x582383 std::function<>::operator()()
[2019-01-26 23:15:47.758][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #12 0x580b9a istio::mixerclient::MixerClientImpl::Check()
```
Issues 1 and 2 were fixed by replacing the large number of objects
dynamically allocated in the MixerClientImpl::Check function and passed
to an async client lambda with two shared pointers that keep all memory
alive for the lifetime of the Envoy filter (which is about as long as
the lifetime as the upstream request). This also had a nice side effect
of removing a few large protobuf copies.   It will also make it *much* easier to retry failed mixer requests because the shared pointers will preserve the memory in retry lambdas.  See istio/istio#9992 which I will work on next.

This is a partial fix for istio/istio#8224

Note that the original bug described this as a problem with a slow telemetry upstream. I noticed that the policy check issues were often accompanied by log messages like:

```
libprotobuf ERROR src/istio/mixerclient/report_batch.cc:75] Mixer Report failed with: UNAVAILABLE:upstream connect error or disconnect/reset before headers
```
The policy checks encountered similar issues but didn't log them.   So
in addition to the shared pointer bug fix for the leak and the sigsegv,
I also added additional logging, a simple logging abstraction so errors
in the mixer client could be relayed to envoy's logger, and additional
stats so we could track error rates, cache effectiveness, etc.

This should handle the policy check side. No errors were found for the
telemetry side, but I'll add additional counters and logging for that
too.
@duderino
Copy link
Copy Markdown
Author

@qiwzhang Apologies for asking for your review since you've moved on to another project, but could you take a look at this? I think it'd really benefit from your expertise and experience with this code.

There are essentially three large changes here:

  1. A bunch of logging. That's just boilerplate and should be easy to review by anyone, so you can skip if you're busy.

  2. Combining all the state needed by a remote check request into shared pointers to fix a memory leak and make quota check retries possible/safer (actual retry and lots of tests for that are coming in Add retries to policy checks on failed transport error #2113 - but that badly needs a rebase which I will do after this is merged). This needs a bit more attention, but I think others on the team are qualified to review it.

  3. A large rework of the MixerClientImpl and a few much smaller changes to QuotaCache and CheckCache. That was done to expose the control flow of policy/quota checks so we could introduce a lot more counters. That also made a large diff on the corresponding test cases.

  4. is the set of changes that most badly needs your review. I think it's fine if you only have time to do that. Others have and can review 1 and 2.

Thanks

Copy link
Copy Markdown
Contributor

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial comment. Just nits.
BTW: I believe if you can extract the logger in separate commit, it would be much easier to review

#include "options.h"

#include <vector>
#include "src/istio/mixerclient/check_context.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you verify if check_context.h is needed?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified it's needed by this code in that file:

class MixerClient {

  virtual CancelFunc Check(istio::mixerclient::CheckContextSharedPtr& context,
                           TransportCheckFunc transport,
                           CheckDoneFunc on_done) = 0;

// total_remote_check_calls = total_check_misses
// total_remote_check_calls >= total_remote_check_accepts +
// total_remote_check_denies
// ^ Transport errors are responsible for the >=
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crystal clear. Thanks!


void Filter::completeCheck(const CheckResponseInfo& info) {
auto status = info.response_status;
auto status = info.status();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto& ? I don't see why we need a copy. Unless it's small POD

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was done on #2117

check_response; // by default status is unknown
auto attributes = std::make_shared<::istio::mixerclient::SharedAttributes>();
::istio::mixerclient::CheckContext check_response(
false, attributes); // by default status is unknown
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
false, attributes); // by default status is unknown
false /* fail_open */, attributes); // by default status is unknown

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was done on #2117


// Use mixer client object to make a Report call.
void SendReport(const RequestContext& request);
void SendReport(istio::mixerclient::SharedAttributesPtr& attributes);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's extremely uncommon to use non-const shared_ptr. It's only change the raw pointer need a non-const shared_ptr.
Could you confirm?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the consuming code wanted to increment the ref count on attributes, could it do that if attributes were const?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. const shared_ptr restricts you from changing the raw pointer(reset() or swap()) but allows changing control block(use count)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Will update all of these to use const. Thanks

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was done on #2117

*/

#include "src/istio/control/http/attributes_builder.h"
#include "google/protobuf/stubs/status.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I believe we have a tool to reorder the header files

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to learn more about this tool. Until then, I manually fixed this on #2117

@qiwzhang
Copy link
Copy Markdown
Contributor

@duderino This PR is too big. It is mixing bug fixing with code refractory.

For crash, it is not clear to me how the crash happened, without knowledge, I am not sure the new code will fix it. Could you use issues to explain the root cause for each crash? I will review it the issue.

For the code refractory, I like to see it breaks into smaller PR.

@duderino
Copy link
Copy Markdown
Author

duderino commented Feb 14, 2019

I'm closing this in favor of three smaller PRs.

  1. Logging only: Added simple logging abstraction so mixer client logs can be relayed to envoy logs. #2116
  2. Replace many raw pointers with 2 shared pointers (this is the big one, sorry): Coalesce all memory for checks and reports into shared pointers #2117
  3. Additional counters. Additional counters for mixer policy check #2118

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants