
Cherry-pick #24742 to 7.x: Refactor kubernetes autodiscover to avoid skipping short-living pods#25167

Merged
jsoriano merged 1 commit into elastic:7.x from jsoriano:backport_24742_7.x on Apr 20, 2021
Conversation

@jsoriano
Member

@jsoriano jsoriano commented Apr 20, 2021

Cherry-pick of PR #24742 to 7.x branch. Original message:

What does this PR do?

Refactor logic in kubernetes autodiscover that decides when to generate events to try to address #22718.

Kubernetes autodiscover can now generate events without network information (without host or port/ports). This makes it possible to generate events for pods that haven't started yet, or that have succeeded/failed before generating a running event. These events still include the container ID, so they can be used to collect logs. Still, no start event is generated if neither a pod IP nor container IDs are available.

Some helpers have been added to obtain relevant information from pods and their containers.

Some additional small refactors are done to improve readability.
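The decision described above can be sketched roughly as follows. This is a minimal illustration of the rule, not the actual Beats code: the type and function names are hypothetical, simplified stand-ins for the Kubernetes pod/container status fields the provider inspects.

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical stand-in for a Kubernetes
// container status entry as seen by the autodiscover provider.
type containerStatus struct {
	ID      string
	Running bool
}

// pod is a simplified, hypothetical pod view: its IP (empty if not yet
// assigned) and the statuses of its containers.
type pod struct {
	IP         string
	Containers []containerStatus
}

// shouldEmitStartEvent sketches the rule from the PR description: an event
// is generated as long as there is either a pod IP or at least one known
// container ID, so logs can still be collected from short-living pods.
// Only when neither is available is the start event skipped.
func shouldEmitStartEvent(p pod) bool {
	if p.IP != "" {
		return true
	}
	for _, c := range p.Containers {
		if c.ID != "" {
			return true
		}
	}
	return false
}

func main() {
	// A pod that failed before getting an IP, but whose container has an
	// ID: an event is still emitted, so its logs can be collected.
	crashed := pod{Containers: []containerStatus{{ID: "containerd://abc", Running: false}}}
	fmt.Println(shouldEmitStartEvent(crashed))

	// A pod with neither an IP nor container IDs produces no start event.
	pending := pod{}
	fmt.Println(shouldEmitStartEvent(pending))
}
```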

Why is it important?

The current logic checks similar things at both the pod and container levels; this PR tries to simplify it by focusing on containers only.

No matter the state of the pod, if a container is running or trying to run, even if it is unhealthy, some configuration should be generated so its logs can be collected.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. But more tests would be needed.
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Fix update so it definitely stops configurations, respecting the cleanup timeout.
  • Double-check that nothing breaks with events without data.host.
    This is logged at the debug level:
    2021-04-08T12:53:34.453+0200	DEBUG	[autodiscover]	template/config.go:156	Configuration template cannot be resolved: field 'data.host' not available in event or environment accessing 'hosts' (source:'/home/jaime/tmp/metricbeat-autodiscover-k8s.yml')
    
  • Actually check that the refactor solves the issues and doesn't break anything else.
  • Try to add more testing. A couple of cases added to current tests.
  • Check that nothing breaks with hints including data.host or data.ports.
    • data.ports.* doesn't seem to be working, check if this is a regression. Named ports were not added to container events. They are now.
    • Multiple configurations generated for pods with multiple containers, expected? compare with master. False alarm.
  • Check autodiscover with heartbeat.
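The data.host behavior in the checklist above can be exercised with a template that interpolates ${data.host}, for example in a Metricbeat autodiscover configuration. This is an illustrative sketch (module, label, and port are made-up examples, not taken from the PR): when a pod has no IP yet, ${data.host} cannot be resolved and the template is skipped with a debug message like the one quoted above.

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            equals:
              kubernetes.labels.app: redis
          config:
            - module: redis
              hosts: ["${data.host}:6379"]
```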

Author's notes for the future

When using autodiscover templates, if the condition matches pod metadata, container events will match too, because they also include pod metadata. In pods with multiple containers this may lead to events carrying metadata of the wrong container. This could possibly be avoided by matching on pod metadata only in pod events. This also happens in master, so it is not solved in this PR.
  • data.ports is not included in container-level events, so it doesn't work when conditions for specific containers are used.
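The named-ports behavior mentioned in the notes above (data.ports.<name> now being populated for container events) could look roughly like this sketch. The types and helper are hypothetical simplifications, not the actual Beats implementation:

```go
package main

import "fmt"

// containerPort mirrors the relevant fields of a Kubernetes container
// port spec (hypothetical simplified type).
type containerPort struct {
	Name string
	Port int
}

// portsMap sketches how named container ports could be exposed for
// template hints like data.ports.<name>; unnamed ports are skipped in
// this sketch since they have no name to look up.
func portsMap(ports []containerPort) map[string]int {
	m := map[string]int{}
	for _, p := range ports {
		if p.Name != "" {
			m[p.Name] = p.Port
		}
	}
	return m
}

func main() {
	ports := []containerPort{
		{Name: "redis", Port: 6379},
		{Port: 9121}, // unnamed, not addressable by data.ports.<name>
	}
	fmt.Println(portsMap(ports)["redis"])
}
```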

How to test this PR locally

  • Check that everything keeps running with some autodiscover happy cases.
  • Check that logs are collected from a pod in CrashLoopBackOff (kubectl run --image=redis redis -- echo foo).

Related issues

Use cases

Collect logs from containers in short-living or failing pods.

…lastic#24742)

Refactor logic in kubernetes autodiscover that decides when
to generate events to try to address issues with short-living
containers.

Kubernetes autodiscover can now generate events without network
information (without host or port/ports). This makes it possible to
generate events for pods that haven't started yet, or that have
succeeded/failed before generating a running event. These events still
include the container ID, so they can be used to collect logs. Still, no
start event is generated if neither a pod IP nor container IDs are
available.

Some helpers have been added to obtain relevant information from
pods and their containers.

Some additional small refactors are done to improve readability.

(cherry picked from commit b4b6e6e)
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 20, 2021
@elasticmachine
Contributor

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 20, 2021
@elasticmachine
Contributor

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #25167 opened

  • Start Time: 2021-04-20T08:58:56.446+0000

  • Duration: 158 min 47 sec

  • Commit: 0a8c5b9

Test stats 🧪

Test Results
Failed 0
Passed 46992
Skipped 5176
Total 52168

Trends 🧪

[charts: build times and test trends]

💚 Flaky test report

Tests succeeded.


@jsoriano jsoriano merged commit 2f4b0ce into elastic:7.x Apr 20, 2021
@jsoriano jsoriano deleted the backport_24742_7.x branch April 20, 2021 13:23

Labels

backport Team:Integrations Label for the Integrations team
