Skip to content

grpc_stats filter: Add config options to restrict number of stats#10467

Merged
htuch merged 20 commits intoenvoyproxy:masterfrom
ggreenway:grpc-stats-whitelist
Mar 27, 2020
Merged

grpc_stats filter: Add config options to restrict number of stats#10467
htuch merged 20 commits intoenvoyproxy:masterfrom
ggreenway:grpc-stats-whitelist

Conversation

@ggreenway
Copy link
Copy Markdown
Member

@ggreenway ggreenway commented Mar 20, 2020

Add options to either put all methods into the same per-cluster
stat, or to whitelist which methods to create stats for.

Additionally, start a deprecation process for the default mode
being to create a stat for all methods, to remove an unsafe
default.

Risk Level: Low
Testing: Unit tests added
Docs Changes: done
Release Notes: added
Fixes #10445
Deprecated: added

Add options to either put all methods into the same per-cluster
stat, or to whitelist which methods to create stats for.

Additionally, start a deprecation process for the default mode
being to create a stat for all methods, to remove an unsafe
default.

Fixes: envoyproxy#10445

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway requested a review from lizan as a code owner March 20, 2020 15:37
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #10467 was opened by ggreenway.

see: more, trace.

@ggreenway
Copy link
Copy Markdown
Member Author

This is still a bit rough, but I wanted to get some feedback on:

  • the API: is a two-layer lookup for service/method like I did it good, or would a flat lookup of pair<service, method> be better?
  • how to handle deprecation of a default value (should I be using deprecatedFeatureEnabled instead of runtimeFeatureEnabled?)

@ggreenway
Copy link
Copy Markdown
Member Author

cc @kyessenov

}

// Gets the stat prefix and underlying storage, depending on whether request_names is empty
std::pair<Stats::StatName, Stats::SymbolTable::StoragePtr>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just return the StoragePtr here...the StatName is trivially constructed on the StoragePtr.

Think of StoragePtr as std::string, and StatName as the absl::string_view referencing the same string.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that in one case, the StoragePtr is allocated in this function, and in the other case, the StatName is retrieved from a member variable that is a StatName. So in one case, I am returning ownership, and in the other case I am not.

Given that requirement, can I simplify this somehow?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense. And in all cases we don't need a symbol-table lock to do this operation, which is nice.

I think I see an opportunity for a refactor, where we add a new class StatNameJoinPool which has a method:
StatName join(const std::vector<StatName>& parts);
that would simplify this call-site and also others. Basically every usage of SymbolTable::join() is kind of ugly.

I added #10516 to track this, but feel free to mark this resolved.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@jmarantz
Copy link
Copy Markdown
Contributor

/wait

@ggreenway
Copy link
Copy Markdown
Member Author

@jmarantz Here's a version of the map that flattens the keys to a tuple<string, string>. The code-logic is simpler, this is probably faster in practice, but it involves more complicated c++isms. Thoughts?

public:
  GrpcServiceMethodToRequestNamesMap(Stats::SymbolTable& symbol_table)
      : stat_name_pool_(symbol_table) {}

  void populate(const envoy::config::core::v3::GrpcMethodList& method_list) {
    for (const auto& service : method_list.services()) {
      Stats::StatName stat_name_service = stat_name_pool_.add(service.name());

      for (const auto& method_name : service.method_names()) {
        Stats::StatName stat_name_method = stat_name_pool_.add(method_name);
        map_[std::make_tuple(service.name(), method_name)] =
            Grpc::Context::RequestStatNames{stat_name_service, stat_name_method};
      }
    }
  }

  absl::optional<Grpc::Context::RequestStatNames>
  lookup(const Grpc::Common::RequestNames& request_names) const {
    auto it = map_.find(std::make_tuple(request_names.service_, request_names.method_));
    if (it != map_.end()) {
      return it->second;
    }

    return {};
  }

private:
  struct MapHash : absl::Hash<std::tuple<std::string, std::string>>,
                   absl::Hash<std::tuple<absl::string_view, absl::string_view>> {
    using is_transparent = void;
  };

  struct MapEq {
    using is_transparent = void;
    bool operator()(const std::tuple<std::string, std::string>& left,
                    const std::tuple<std::string, std::string>& right) const {
      return left == right;
    }
    bool operator()(const std::tuple<std::string, std::string>& left,
                    const std::tuple<absl::string_view, absl::string_view>& right) const {
      return left == right;
    }
  };
  absl::flat_hash_map<std::tuple<std::string, std::string>, Grpc::Context::RequestStatNames,
                      MapHash, MapEq>
      map_;
  Stats::StatNamePool stat_name_pool_;
};

Reference: https://groups.google.com/forum/#!topic/abseil-io/jvNJAqTQDIA

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One quick language comment. The API LGTM, thanks.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway requested a review from snowp as a code owner March 21, 2020 00:03
@jmarantz
Copy link
Copy Markdown
Contributor

can you ping when Ci is passing? Thanks!

/wait

// Envoy, using unbounded memory and potentially slowing down stats pipelines.
//
// .. attention::
// If neither `stats_allowlist` nor `stats_for_all_methods` is set, the behavior
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: individual_method_stats_allowlist

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Copy Markdown
Member Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🐴 hold your horses - no failures detected, yet.

🐱

Caused by: a #10467 (comment) was created by @ggreenway.

see: more, trace.

@dio
Copy link
Copy Markdown
Member

dio commented Mar 24, 2020

/azp run envoy-linux

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@jmarantz
Copy link
Copy Markdown
Contributor

just noting that the 'release' test failed with

2020-03-25T00:27:03.1101292Z [ RUN      ] ClusterRefreshManagerTest.Basic
2020-03-25T00:27:03.1101953Z test/extensions/common/redis/cluster_refresh_manager_test.cc:149: Failure
2020-03-25T00:27:03.1102599Z Expected equality of these values:
2020-03-25T00:27:03.1103371Z   cluster_info->redirects_count_
2020-03-25T00:27:03.1103972Z     Which is: 1
2020-03-25T00:27:03.1104438Z   0
2020-03-25T00:27:03.1104863Z Stack trace:
2020-03-25T00:27:03.1105518Z   0x8d129cb: Envoy::Extensions::Common::Redis::ClusterRefreshManagerTest_Basic_Test::TestBody()
2020-03-25T00:27:03.1106736Z   0x1446c39e: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
2020-03-25T00:27:03.1107618Z   0x14418d36: testing::internal::HandleExceptionsInMethodIfSupported<>()
2020-03-25T00:27:03.1108276Z   0x143c2af7: testing::Test::Run()
2020-03-25T00:27:03.1108864Z   0x143c5800: testing::TestInfo::Run()
2020-03-25T00:27:03.1109413Z ... Google Test internal frames ...

not sure if that's a flake or regression, but re-running now.

@jmarantz
Copy link
Copy Markdown
Contributor

/azp envoy-linux run

@azure-pipelines
Copy link
Copy Markdown

Command 'envoy-linux' is not supported by Azure Pipelines.

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. Thank you for doing this!

just a few nits.

}

// Gets the stat prefix and underlying storage, depending on whether request_names is empty
std::pair<Stats::StatName, Stats::SymbolTable::StoragePtr>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense. And in all cases we don't need a symbol-table lock to do this operation, which is nice.

I think I see an opportunity for a refactor, where we add a new class StatNameJoinPool which has a method:
StatName join(const std::vector<StatName>& parts);
that would simplify this call-site and also others. Basically every usage of SymbolTable::join() is kind of ugly.

I added #10516 to track this, but feel free to mark this resolved.

@ggreenway
Copy link
Copy Markdown
Member Author

@htuch Can you approve the api again? Or do you want to re-evaluate in light of the conversation about compatibility/defaults?

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the current policy allows for WKT defaults to switch, let's land this one. We should continue that discussion in parallel, thanks.

@htuch htuch merged commit 715992f into envoyproxy:master Mar 27, 2020
case envoy::extensions::filters::http::grpc_stats::v3::FilterConfig::kStatsForAllMethods:
stats_for_all_methods_ =
PROTOBUF_GET_WRAPPED_OR_DEFAULT(proto_config, stats_for_all_methods,
context.runtime().snapshot().deprecatedFeatureEnabled(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway Quick drive be here: I'm pretty sure there is not going to print a warning in the default case, right? (It will inc the stat I think.) This is going to make it hard/impossible for operators to figure what they did wrong. If I'm right can we do a follow up to get a warning printed somehow? cc @zuercher @alyssawilk who have more experience with the deprecation warning code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; I'll sort out a fix.

@antoniovicente
Copy link
Copy Markdown
Contributor

This PR was merged upstream without signoff in the commit message.

Running pre-push check; to skip this step use 'push --no-verify'
ERROR: The following commits do not have DCO signoff:
715992f grpc_stats filter: Add config options to restrict number of stats (#10467)

@ggreenway
Copy link
Copy Markdown
Member Author

This PR was merged upstream without signoff in the commit message.

Running pre-push check; to skip this step use 'push --no-verify'
ERROR: The following commits do not have DCO signoff:
715992f grpc_stats filter: Add config options to restrict number of stats (#10467)

That's unfortunate, but it can't be fixed at this point. Eventually this commit will be far enough back in the history that the tool won't look at it. Until then, I think you just have to --no-verify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use of grpc-stats filter with untrusted clients can cause unbounded growth of stat names

7 participants