[Agent] Support Node and Service autodiscovery in k8s provider #26801
ChrsMark merged 25 commits into elastic:master
Conversation
Signed-off-by: chrismark <chrismarkou92@gmail.com>
💚 Build Succeeded
MichaelKatsoulis
left a comment
LGTM. It is nice splitting the different resource watchers into their respective files.
Pinging @elastic/integrations (Team:Integrations)
Opening this for review. I tested it manually with the scenarios mentioned in this PR's description. I plan to add unit tests for the provider too, but the implementation should be ready for review now.
Note: the latest commit (43f5136) tries to improve data emission handling.
@ChrsMark thanks for working on this. Before going deep into implementation details, I would like to raise a concern: do we have plans to support discovery of multiple resources? With the current configuration it doesn't seem to be possible. There is no way to declare multiple providers of the same kind as in Beats, and …
I would see two options for that; one would be to make … Of course, with the current implementation there is also the option of running multiple agents, one for each kind of resource, but this may be a bit overkill and cumbersome for users.
That's a fair point @jsoriano, thanks for bringing this up! Especially now that the same Agent will run collectors for logs+metrics+uptime at the same time, using the same config, we should be able to provide such flexibility to our users. I think that splitting into different providers per k8s resource would make sense, but fields would be reported under a per-provider namespace like ${kubernetes_service.service.name} == 'kube-controller-manager', which I'm not sure is something we would like.
Another way to tackle this is to enable support for defining the k8s provider multiple times, but this most probably should happen at the controller provider's layer. @exekias any thoughts on this?
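For illustration, the two options being discussed could look roughly like this in the Agent configuration. This is a hypothetical sketch only — neither syntax is implemented in this PR, and all field names here are made up:

```yaml
# Option A: one provider per k8s resource kind, each with its own settings.
providers:
  kubernetes_pod:
    node: ${NODE_NAME}
  kubernetes_service:
    scope: cluster

# Option B: a single provider that accepts a list of resources to watch.
providers:
  kubernetes:
    resources:
      - kind: pod
      - kind: service
```

Option A gives per-resource field namespaces (the `kubernetes_service.*` naming concern raised above), while Option B keeps a single `kubernetes.*` namespace but requires the provider to multiplex several watchers internally.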
```go
// Check if resource is service. If yes then default the scope to "cluster".
if c.Resources.Service != nil {
	if c.Scope == "node" {
		logp.L().Warnf("can not set scope to `node` when using resource `Service`. resetting scope to `cluster`")
```
Should this logger be namespaced? I also see `logger` widely used in Agent; not sure if there is any preference here.
```go
type ResourceConfig struct {
	KubeConfig     string        `config:"kube_config"`
	Namespace      string        `config:"namespace"`
	SyncPeriod     time.Duration `config:"sync_period"`
	CleanupTimeout time.Duration `config:"cleanup_timeout" validate:"positive"`

	// Needed when resource is a Pod or Node
	Node string `config:"node"`
}
```
I'm wondering about use cases for resource-specific settings, do you have any in mind?
Now that I'm thinking of it again, maybe it is over-engineering to provide this option at the moment, since the base config shared by all the resources should cover the cases. Flexibility for different access settings per resource, or variant options, would be nice to think about, but we can wait and see if users actually need them. So, I will change it and move to a single config for all of the resources.
```go
func (c *Config) Validate() error {
	// Check if resource is service. If yes then default the scope to "cluster".
	if c.Resources.Service != nil {
		if c.Scope == "node" {
```
It's interesting that you can override almost all settings per resource, but not scope.
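The behavior under discussion can be sketched as follows. This is a hedged, trimmed-down reconstruction (hypothetical types, not the actual Agent code): a cluster-wide resource such as Service cannot be watched with a node-local scope, so validation falls back to "cluster" instead of failing.

```go
package main

import "fmt"

// ResourceConfig stands in for the real per-resource settings struct.
type ResourceConfig struct{}

// Config is a minimal stand-in for the provider config: a global scope
// plus an optional Service resource section.
type Config struct {
	Scope   string
	Service *ResourceConfig
}

// Validate resets an invalid scope rather than returning an error:
// watching Services only makes sense cluster-wide.
func (c *Config) Validate() error {
	if c.Service != nil && c.Scope == "node" {
		// The real code emits a logp warning here; a print stands in for it.
		fmt.Println("warn: can not set scope to `node` when using resource `Service`; resetting scope to `cluster`")
		c.Scope = "cluster"
	}
	return nil
}

func main() {
	c := &Config{Scope: "node", Service: &ResourceConfig{}}
	if err := c.Validate(); err != nil {
		panic(err)
	}
	fmt.Println(c.Scope) // scope has been reset to "cluster"
}
```

The design choice here (silent reset plus a warning, rather than a hard validation error) is what makes scope the one setting that cannot be overridden per resource.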
```go
	Node: c.Node,
}
if c.Resources.Pod == nil {
	c.Resources.Pod = baseCfg
```
What if the user only overrides resources.pod.namespace? Does that mean that the rest of the settings will be empty?
```go
func (n *node) emitRunning(node *kubernetes.Node) {
	data := generateNodeData(node)
	data.mapping["scope"] = n.scope
	if data == nil {
```
Can this happen, taking the previous line into account?
```go
n.emitStopped(node)
n.emitRunning(node)
```
In theory emitRunning should be enough here, right? This will AddOrUpdate.
```go
	return false
}

func getAddress(node *kubernetes.Node) string {
```
An explanatory comment here would help.
```go
// Pass annotations to all events so that it can be used in templating and by annotation builders.
annotations := common.MapStr{}
for k, v := range node.GetObjectMeta().GetAnnotations() {
	safemapstr.Put(annotations, k, v)
```
Should we be dedotting these?
I was thinking about adding this when we deal with metadata in general, but it's ok to add it now. Adding.
```go
"node": map[string]interface{}{
	"uid":    string(node.GetUID()),
	"name":   node.GetName(),
	"labels": node.GetLabels(),
```
Same for labels, should we dedot?
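For context, the dedotting being discussed typically means rewriting dots in label and annotation keys before indexing, since Elasticsearch interprets dots in field names as object separators. A minimal sketch (hypothetical helper name, not the actual Beats/Agent implementation, which uses libbeat utilities):

```go
package main

import (
	"fmt"
	"strings"
)

// deDotLabels rewrites "." to "_" in every key, so a Kubernetes label key
// like "app.kubernetes.io/name" no longer creates a nested object
// ("app" -> "kubernetes" -> "io/name") when the event is indexed.
func deDotLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[strings.ReplaceAll(k, ".", "_")] = v
	}
	return out
}

func main() {
	labels := map[string]string{"app.kubernetes.io/name": "nginx"}
	fmt.Println(deDotLabels(labels))
}
```

The flattened field type discussed later in this thread is the alternative that avoids rewriting keys at all.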
```go
providerDataChan := make(chan providerData)
done := make(chan bool, 1)
go generateContainerData(pod, containers, containerstatuses, providerDataChan, done)
```
Why use a channel for this? What would you think about emitting directly from the function?
Channel usage helps to isolate the generator function so that it can be tested with unit tests, following a yield-like approach.
Understood. You could also build a mocked emitter passed as a parameter to retrieve the results in the tests, right?
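The two testing approaches compared above can be sketched side by side. This is an illustrative reconstruction with hypothetical names (`providerData` mirrors the struct in the diff; the generator bodies are simplified):

```go
package main

import "fmt"

// providerData mirrors the event payload from the diff above, simplified.
type providerData struct {
	mapping map[string]interface{}
}

// generateViaChannel is the yield-like pattern: results are written to a
// channel and the channel is closed to signal completion. A test ranges
// over the channel to collect results.
func generateViaChannel(names []string, out chan<- providerData) {
	for _, n := range names {
		out <- providerData{mapping: map[string]interface{}{"container": n}}
	}
	close(out)
}

// generateViaEmitter is the mocked-emitter pattern: the same logic, but
// results are pushed through a callback, which a unit test can swap for a
// closure that appends to a slice.
func generateViaEmitter(names []string, emit func(providerData)) {
	for _, n := range names {
		emit(providerData{mapping: map[string]interface{}{"container": n}})
	}
}

func main() {
	ch := make(chan providerData)
	go generateViaChannel([]string{"nginx", "sidecar"}, ch)
	for d := range ch {
		fmt.Println(d.mapping["container"])
	}

	var collected []providerData // the "mocked emitter" sink
	generateViaEmitter([]string{"nginx"}, func(d providerData) {
		collected = append(collected, d)
	})
	fmt.Println(len(collected))
}
```

The emitter variant avoids goroutine plumbing in both production code and tests, at the cost of losing the natural backpressure a channel provides.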
Comments addressed and tested locally following the updated scenarios listed in the PR's description.
jsoriano
left a comment
This is looking good. Added a question about the future of dedotting and the possibility of using the flattened type instead.
Also I think we still need to polish some use cases that we recently handled in Beats, such as discovery of short-lived pods, crashing containers, ephemeral containers and so on, but this can be done as follow-ups. elastic/e2e-testing#1090 will help to validate these use cases 😇
```go
// Scope of the provider (cluster or node)
Scope            string `config:"scope"`
LabelsDedot      bool   `config:"labels.dedot"`
AnnotationsDedot bool   `config:"annotations.dedot"`
```
@exekias @ChrsMark wdyt about starting to use the flattened type in Agent instead of dedotting? There are still some trade-offs but they should eventually be addressed. Hopefully flattened is the future for this kind of fields.
More context here: https://github.com/elastic/obs-dc-team/issues/461
I'm not completely aware of the pros/cons, but at a quick view the flattened type looks better than dedotting to me, and timing-wise this sounds like a good moment to do the change.
Taking this into account I'm inclined to leave dedotting out of this PR and investigate the experience of using flattened for these fields, any thoughts? Also, in case we end up introducing it, I would like to be opinionated here and avoid adding any config parameter for it.
I'm particularly concerned about doing things like grouping some metric by a label, which is a valid use. I'm less concerned about annotations...
I'm +1 on leaving this out for now and opening a follow-up issue to work on dedotting/flattening.
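For reference, the flattened alternative discussed above keeps the original dotted keys and maps the whole labels/annotations object as a single Elasticsearch field of type `flattened`, whose leaf values are all indexed as keywords. A minimal sketch of such a mapping (the field paths are illustrative, not what Agent actually ships):

```json
{
  "mappings": {
    "properties": {
      "kubernetes": {
        "properties": {
          "labels": { "type": "flattened" },
          "annotations": { "type": "flattened" }
        }
      }
    }
  }
}
```

This sidesteps both mapping explosion from arbitrary label keys and the key rewriting that dedotting requires, at the cost of the flattened type's query limitations (e.g. all values treated as keywords).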
```go
// InitDefaults initializes the default values for the config.
func (c *Config) InitDefaults() {
	c.SyncPeriod = 10 * time.Minute
	c.CleanupTimeout = 60 * time.Second
```
We may be hitting this issue #20543; not sure how we can do something now depending on the kind of data we are collecting.
Yeap, the provider is not really aware of the inputs right now, but maybe it could be handled better in the future if we introduce a "smart" controller which enables providers according to the inputs.
```go
p.emitContainers(pod, pod.Spec.Containers, pod.Status.ContainerStatuses)

// TODO deal with init containers stopping after initialization
p.emitContainers(pod, pod.Spec.InitContainers, pod.Status.InitContainerStatuses)
```
TODO: add support for ephemeral containers.
Thanks! Testing should be improved for sure, and with a more complete/e2e approach. We have elastic/e2e-testing#1090 in our backlog, which I think can cover this need, and I'm thinking this would be better implemented after we have a more complete codebase, including metadata handling too.
jsoriano
left a comment
I think this can be merged, and we can iterate on details and specific use cases in future PRs.
blakerouse
left a comment
Looks good, thanks for all the fixes and changes!
(cherry picked from commit 6635acb)
What does this PR do?
This PR adds more resources in the `kubernetes` dynamic provider.

Why is it important?
So as to support Node and Service discovery via the `kubernetes` dynamic provider of Agent.

Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`

How to test this PR locally
- Test with `node` resource: use the `inspect` command to check what the compiled configuration is (use the proper path for the elastic-agent.yml config file): `./elastic-agent -c /Users/ubuntu/Desktop/elastic-agent.yml inspect output -o default`
- Test with `service` resource.
- Test with `pod` …
- Test with `service` resource and `node` scope. Verify that the scope is reset to `cluster` (`can not set scope to node when using resource Service. resetting scope to cluster`), and that the `kubernetes.scope` field is populated with the `cluster` value.
- Test with `pod` at node scope and define `node` …

Related issues