Fixing logic to keep list of unique cluster UUIDs#22808
Fixing logic to keep list of unique cluster UUIDs#22808ycombinator merged 7 commits intoelastic:masterfrom ycombinator:mon-ls-dup-cluster-uuid
Conversation
|
Pinging @elastic/stack-monitoring (Stack monitoring) |
|
Pinging @elastic/integrations-services (Team:Services) |
| clusterToPipelinesMap[overrideClusterUUID] = pipelines | ||
| return clusterToPipelinesMap | ||
| } | ||
|
|
There was a problem hiding this comment.
I removed this section because it's also incorrect. We shouldn't be using the override cluster UUID, if set, for all pipelines so broadly. Instead we should figure out (as we do further below) the cluster UUIDs appropriate for each pipeline and build the cluster UUID => pipelines map that way.
There was a problem hiding this comment.
This is still a change in semantics and maybe a bc breaking change.
Before this change all pipelines have had been associated with overrideClusterUUID. With this change pipelines are only associated with overrideClusterUUID iff they do no clusterUUID set. This way overrideClusterUUID actually becomes defaultClusterUUID.
Instead of removing this line, how about printing a deprecation warning if this setting is configured and introduce an a default cluster uuid setting that matches the new semantics?
There was a problem hiding this comment.
Hmmm, looking at where the value of overrideClusterUUID comes from ultimately, it comes from the monitoring.cluster_uuid setting in logstash.yml: https://www.elastic.co/guide/en/logstash/current/monitoring-with-metricbeat.html#define-cluster__uuid.
Reading the documentation for that setting:
To bind the metrics of Logstash to a specific cluster, optionally define ...
This makes me think, if this setting is set and therefore overrideClusterUUID is set, we should make that the cluster UUID for all pipelines, ignoring what Elasticsearch clusters each pipeline might individually be talking to.
So I'm going to revert this change and, in fact, add a similar code block in the makeClusterToPipelinesMap function in the node metricset.
|
/cc @sayden |
|
|
||
| for _, pipeline := range pipelines { | ||
| var clusterUUIDs []string | ||
| clusterUUIDs := make(map[string]struct{}, 0) |
There was a problem hiding this comment.
this is common.StringSet.
The overall data expension starts by converting the map[string]pipelineState into a []pipelineState. Would it make sense to keep the map instead, to reduce the chance of creating duplicate entries in general?
There was a problem hiding this comment.
I've changed the code to use common.StringSet in 4cb337f but I didn't follow this:
The overall data expension starts by converting the
map[string]pipelineStateinto a[]pipelineState. Would it make sense to keep the map instead, to reduce the chance of creating duplicate entries in general?
I think it's because I'm not finding the map[string]pipelineState you mentioned. Where are you seeing that?
There was a problem hiding this comment.
We do the conversion here.
I was wondering if it is necessary, or maybe can even lead to similar problems (in the future) if we are not careful about not producing duplicates.
| } | ||
|
|
||
| for _, clusterUUID := range clusterUUIDs { | ||
| for clusterUUID, _ := range clusterUUIDs { |
There was a problem hiding this comment.
nit clusterUUID := range clusterUUIDS { ... }
| clusterToPipelinesMap[overrideClusterUUID] = pipelines | ||
| return clusterToPipelinesMap | ||
| } | ||
|
|
There was a problem hiding this comment.
This is still a change in semantics and maybe a bc breaking change.
Before this change all pipelines have had been associated with overrideClusterUUID. With this change pipelines are only associated with overrideClusterUUID iff they do no clusterUUID set. This way overrideClusterUUID actually becomes defaultClusterUUID.
Instead of removing this line, how about printing a deprecation warning if this setting is configured and introduce an a default cluster uuid setting that matches the new semantics?
| }, | ||
| }, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Is this particular case able to reproduce the issue we have had? If not, we should add one.
The expannsion into the cluster map is somewhat critical in order to produce correct docs, that show the correct association of configurations to single Elasticsearch clusters. Can we make this test more exhaustive?
There was a problem hiding this comment.
Yes, this is the case that was problematic. Before this PR this case would've produced 3 PipelineState associated with the prod_cluster_id cluster UUID when we should've only got 1 PipelineState.
I will expand this test to add a few more test cases.
|
|
||
| for _, pipeline := range pipelines { | ||
| var clusterUUIDs []string | ||
| clusterUUIDs := make(map[string]struct{}, 0) |
|
@urso I've addressed all your feedback. Would you mind re-reviewing please, when you get a chance? Thanks! |
| } | ||
|
|
||
| for _, clusterUUID := range clusterUUIDs { | ||
| for _, clusterUUID := range clusterUUIDs.ToSlice() { |
There was a problem hiding this comment.
The ToSlice is a little overkill here (it copies and sorts all strings). The StringSet is defined as type StringSet map[string]struct{} and can be directly use with the range-clause.
#22808) (#22817) * Fixing logic to keep list of unique cluster UUIDs (#22808) * Fixing logic to keep list of unique cluster UUIDs * Adding CHANGELOG entry * Use common.StringSet * Adding more test cases * Adding back logic to broadly override cluster UUID for all pipelines, if set * Removing ToSlice() * Fixing loop * Fixing up CHANGELOG
* Fixing logic to keep list of unique cluster UUIDs * Adding CHANGELOG entry * Use common.StringSet * Adding more test cases * Adding back logic to broadly override cluster UUID for all pipelines, if set * Removing ToSlice() * Fixing loop
…-issues * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...
…dows-7 * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...
What does this PR do?
This PR fixes a pretty bad bug with the
logstashmodule whenxpack.enabled: trueis set.The
logstashmodule has two metricsets:nodeandnode_stats. Whenxpack.enabled: trueis set, these metricsets emit events for Stack Monitoring withtype: logstash_stateandtype: logstash_stats, respectively.The
nodemetricset is supposed to emit as many events perFetch()cycle as there are monitored Logstash pipelines x the number of Elasticsearch clusters that each such pipeline talks to. If a Logstash pipeline does not talk to an Elasticsearch cluster, one event for that pipeline must be emitted.Instead, due to terribly faulty logic in the code, the
nodemetricset was emitting as many events perFetch()cycle as there are monitored Logstash pipelines x the number of vertices in each such pipeline!There was similar faulty logic in the
node_statsmetricset.This PR fixes this bug in both metricsets.
Why is it important?
By fixing this bug, we are eliminating a huge source of redundant and unnecessary events being published from the
logstashmodule whenxpack.enabled: trueis set. This will not only make the module's behavior correct but also drastically reduce memory consumption in Metricbeat and the number of events sent to Metricbeat's output over the wire.Checklist
I have commented my code, particularly in hard-to-understand areasI have made corresponding changes to the documentationI have made corresponding change to the default configuration filesCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.