Skip to content

[stats] lazyinit ClusterInfo::trafficStats() #23921

Merged
jmarantz merged 145 commits intoenvoyproxy:mainfrom
stevenzzzz:lazy-init-cluster-upstream-stats
Jun 27, 2023
Merged

[stats] lazyinit ClusterInfo::trafficStats() #23921
jmarantz merged 145 commits intoenvoyproxy:mainfrom
stevenzzzz:lazy-init-cluster-upstream-stats

Conversation

@stevenzzzz
Copy link
Copy Markdown
Contributor

@stevenzzzz stevenzzzz commented Nov 9, 2022

Commit Message: LazyInit ClusterInfo::trafficStats().

Additional Description:
3rd PR for #23575, this is stack upon 2nd PR #23907

With 100K clusters, we are seeing ~1.5 GBs less RAM usage with this PR.
Kudos to @jmarantz for the data:

Clean client in-MB Defferred Diff Diff % Defferred-Fulfilled Diff-in-MB (fulfilled - Clean)
allocated 4561550208 4350.233276 2886860656 1674689552 36.71316714 4565167632 3.44984436
heap_size 5303697408 656 3443523584 1860173824 35.07315144 5146411008 -150
pageheap_unmapped 687865856 501219328 186646528 27.13414634 524288000 -156
pageheap_free 22921216 21.859375 23109632 -188416 -00.8220157255 22257664 -0.6328125
total_thread_cache 1718288 1.638687134 4197032 -2478744 -144.2566089 4833576 2.970970154
total_physical_bytes 4647192158 4431.907804 2965242430 1681949728 36.19281645 4653479518 5.99609375

To reproduce, use this script to add 100K clusters to the bootstrap tmpl:

$ cat large_bootstrap_maker.sh 
#!/bin/bash

set -u
set -e
limit="$1"

for i in $(seq 1 $limit); do
  service="zzzzzzz-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_$i"
  echo "  - name: $service"
  echo "    connect_timeout: 0.25s"
  echo "    type: LOGICAL_DNS"
  echo "    dns_lookup_family: \"v6_only\""
  echo "    lb_policy: ROUND_ROBIN"
  echo "    load_assignment:"
  echo "      cluster_name: $service"
  echo "      endpoints:"
  echo "      - lb_endpoints:"
  echo "        - endpoint:"
  echo "            address: {socket_address: {address: google.com, port_value: 443}}"
done


base.tmpl:

admin:
  access_log:
  - name: envoy.access_loggers.file
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: "/dev/null"
  address:
    socket_address:
      address: "::"
      port_value: 8080
layered_runtime:
  layers:
  - name: admin
    admin_layer: {}

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: "::"
        port_value: 0
    filter_chains:
    - filters:
      - name: http
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: {prefix: "/"}
                route: {host_rewrite_literal: 127.0.0.1, cluster: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy-wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww}
  clusters:

Run the following commands to generate the bootstrap.

bash large_bootstrap_maker.sh 100000 >> base.yaml
mv base.yaml test/config/integration/100k_clusters.yaml

and modify the test/config/main_common_test.cc to load from test/config/integration/100k_clusters.yaml instead. Then you could run the test and observe the RAM uasage.

Risk Level: Medium (changes how cluster_info::trafficStats() are created).

Testing: existing unit tests.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

…n non-config-update stats

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
…-stats and "the rest"(will be renamed to upstream-stats)

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz
Copy link
Copy Markdown
Contributor Author

/assign @jmarantz
Still working on some tests but pls take a glance when you get a sec.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz stevenzzzz changed the title [WIP] [stats] lazyinit ClusterInfo::stats() and rename that to clusterInfo::upstreamStats() [stats] lazyinit ClusterInfo::stats() and rename that to clusterInfo::upstreamStats() Nov 9, 2022
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good but why don't we get the other PR in first.

@jmarantz
Copy link
Copy Markdown
Contributor

Hey -- I converted this to a draft until #23907 lands. Feel free to convert back when that lands and you merge main into this.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz
Copy link
Copy Markdown
Contributor Author

Also can you benchmark memory overhead estimates for when the feature is enabled, and you are instantiating all clusters.

please take a look at the last column in the updated table, in PR description.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor

/wait

Once the factored-out infrastructure in #27899 merges, this PR can merge main and will become smaller and easier to polish off with a final check.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
#include "fmt/format.h"

namespace Envoy {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@stevenzzzz
Copy link
Copy Markdown
Contributor Author

/retest

jmarantz
jmarantz previously approved these changes Jun 22, 2023
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit, but basically looks great. This has come a long way in readability.

@ggreenway can you take a pass?

ClusterInfoImpl::generateStats(Stats::ScopeSharedPtr scope,
const ClusterTrafficStatNames& stat_names, bool deferred_creation) {
return Stats::createDeferredCompatibleStats<ClusterTrafficStats>(scope, stat_names,
deferred_creation);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/deferred_creation/defer_creation/ as this applies to something that will happen when the function is called, not something that already happened.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Copy link
Copy Markdown
Contributor Author

@stevenzzzz stevenzzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

ClusterInfoImpl::generateStats(Stats::ScopeSharedPtr scope,
const ClusterTrafficStatNames& stat_names, bool deferred_creation) {
return Stats::createDeferredCompatibleStats<ClusterTrafficStats>(scope, stat_names,
deferred_creation);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#include "fmt/format.h"

namespace Envoy {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@stevenzzzz
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@stevenzzzz
Copy link
Copy Markdown
Contributor Author

/retest

jmarantz
jmarantz previously approved these changes Jun 23, 2023
@jmarantz
Copy link
Copy Markdown
Contributor

@ggreenway are you able to take a look soon? I think this is much easier to digest now .

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except it should have a changelog entry

/wait

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz
Copy link
Copy Markdown
Contributor Author

Thanks.

@jmarantz
Copy link
Copy Markdown
Contributor

/retest

2 similar comments
@DiazAlan
Copy link
Copy Markdown
Contributor

/retest

@stevenzzzz
Copy link
Copy Markdown
Contributor Author

/retest

@jmarantz jmarantz mentioned this pull request Jul 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no stalebot Disables stalebot from closing an issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants