guettli/check-conditions

Check all Conditions

Tiny tool to check all conditions of all resources in your Kubernetes cluster.

Takes only a few milliseconds for small clusters running on localhost. It might take longer for large clusters.

Please provide feedback, PRs are welcome.

Background and Goal

I develop Kubernetes controllers in Go. I have been developing software for ages, but Kubernetes and Go are still fairly new to me.

As a developer of controllers, I want an overview. I want to see the difference between the desired state and the observed state.

If a controller discovers that the observed state does not match the desired state, it could ...

... could write logs. But logs are just dust in the wind. After the next reconciliation, the log message will be outdated.

... could emit events. Same here: After the next reconciliation the event could be outdated.

... could write to status.conditions. That's what we currently do. Conditions have the benefit that a warning disappears as soon as the desired state is reached. If you look at logs or events, you never know whether they represent the current state.

But how to monitor many conditions of many resources?

I found no tool which monitors all conditions of all resource objects. So I wrote this tiny tool.

Executing

go run github.com/guettli/check-conditions@latest all

Use -n/--namespace followed by a namespace name to restrict the checks to that namespace and skip cluster-scoped resources. For example:

go run github.com/guettli/check-conditions@latest all -n kube-system

Terminology

Since I found no good umbrella term for CRDs and core resource types, I use the term CRD for both.

Related: Kubernetes API Terminology

OK vs Warning?

Which conditions should create output and which conditions are ok and can get ignored?

So far, the code uses some simple lists to decide this.

Examples:

  • *Ready=True will be ignored
  • *Healthy=True will be ignored
  • *Pressure=False will be ignored
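
A minimal sketch of how such an ignore list could be evaluated. This is not the tool's actual code: the pattern format "TypeGlob=Status", the variable ignorePatterns, and the function isIgnored are assumptions made for illustration, and the matching uses path.Match from the Go standard library.

```go
package main

import (
	"fmt"
	"path"
)

// ignorePatterns is a hypothetical ignore list in "TypeGlob=Status" form,
// mirroring the examples above. It is not copied from the tool's source.
var ignorePatterns = []string{"*Ready=True", "*Healthy=True", "*Pressure=False"}

// isIgnored reports whether a condition with the given type and status
// matches one of the ignore patterns.
func isIgnored(condType, status string) bool {
	key := condType + "=" + status
	for _, p := range ignorePatterns {
		// path.Match returns an error only for malformed patterns;
		// this sketch treats that case as "no match".
		if ok, err := path.Match(p, key); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isIgnored("ControlPlaneReady", "True")) // true
	fmt.Println(isIgnored("Ready", "False"))            // false
	fmt.Println(isIgnored("DiskPressure", "False"))     // true
	fmt.Println(isIgnored("DiskPressure", "True"))      // false
}
```

Note that the "*" of path.Match does not match a "/", so condition types that contain a slash would need a different matcher.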

Command "while"

Imagine you want to get a signal when a condition is gone. For example, you want to hear music when the condition "StillProvisioning" is gone.

The sub-command "while" takes an optional regex. If no output line matches the regex, the command stops.

If you don't provide a regex, then check-conditions while runs forever.

go run github.com/guettli/check-conditions@latest while StillProvisioning; music

The script music needs to be provided by you.

From output to kubectl describe

You just need to copy the first three columns of the output and paste them after kubectl describe -n. Then you can have a look at the corresponding resource.

Conditions in Kubernetes

  • There is a package which helps with working with Conditions in Go: the functions MarkFalse and MarkTrue update the conditions. The function SetSummary can be used to set the "Ready" condition according to the other conditions of the resource.

The Kubernetes API conventions about Status and Conditions are more general. There, "True" does not always mean "healthy": for example, "DiskPressure=True" signals a problem.

What it's not

My check-conditions tool was written to help me debug Kubernetes controllers. I develop in the context of Cluster-API, and in this context Pods, Services, Ingress, RBAC, ... are less important. This tool is not a general purpose troubleshooting tool. It can help, but it only checks the conditions.

If you are looking for a general purpose troubleshooting tool, then try this:

go run github.com/derailed/popeye@latest -A -l warn

derailed/popeye: 👀 A Kubernetes cluster resource sanitizer

TODO

sort output. It is confusing if the second output has a different order than the first output.

check schema of resource before fetching all objects: skip resources which don't have status.conditions.

filter by namespace and labels. Maybe interactively. But is there a way to get all labels of the cluster (without reading all resources)?
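
The first item, sorting the output, could look roughly like the sketch below. It assumes a sort key of kind, then namespace, then name. The row type and its fields are hypothetical and do not reflect the tool's internal data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical output row; the field names are assumptions.
type row struct {
	Kind, Namespace, Name, Condition string
}

// sortRows orders rows by kind, then namespace, then name, so that two
// runs over the same cluster state print in the same order.
func sortRows(rows []row) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		return a.Name < b.Name
	})
}

func main() {
	// Made-up rows in arbitrary order.
	rows := []row{
		{"Node", "", "worker-1", "DiskPressure=True"},
		{"Machine", "ns2", "m1", "Ready=False"},
		{"Machine", "ns1", "m2", "Ready=False"},
		{"Machine", "ns1", "m1", "Ready=False"},
	}
	sortRows(rows)
	for _, r := range rows {
		fmt.Println(r.Kind, r.Namespace, r.Name, r.Condition)
	}
}
```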

Ideas

List all resources of namespace "foo". kubectl get all -n foo does not show CRDs.

Improve filtering: Only particular namespaces, only particular resources.

Order output, so that results are stable. Maybe by kind, namespace, name.

Check if deletionTimestamp is too old.

grep all values in the cluster for a string. Or JSONPath on everything.

Continuously watch all resources for changes, monitor all changes.

Write all changes to a storage, so that the changes can get analyzed. With all changes I mean all changes of all resources in all namespaces. Not just conditions.

Report broken ownerRefs.

HTML GUI via localhost.

Negative conditions are ok for a defined time period. Example: It is ok if a Pod needs 20 seconds to start. But it is not ok if it takes 5 minutes.

To make warnings appear sooner after starting the program (it takes 20 secs even for small clusters), we could use some kind of priority. CRDs which had warnings in the past should be checked sooner. This state could be stored in $XDG_CACHE_HOME.

Evaluate more than conditions. Everything should be possible. How to make ignoring or adding some warnings super flexible? The simplest way would be to use Go code.

Related

guettli/watchall: Watch all Kubernetes Resources
