
Discussion: Tracking changes to index templates, component templates and ingest pipelines #108469

@flash1293

Description

The "Logs+" initiative in Observability tries to make the experience around logs in the Elastic stack as seamless as possible.

An important part of this is detecting and mitigating ingestion issues. Most of the time, ingestion issues start because something in the system changed. This can be a change either on the collection side or on the Elasticsearch side (mappings or ingest pipelines were rearranged, Fleet integration packages got updated, ...).

When investigating an issue in this area, it would be very helpful to understand what changes were made around the time things started to go south. One important building block for this already exists: via the _ignored field and the failure store, it's possible to reconstruct when documents started to be partially or fully rejected.
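As a sketch of what "reconstructing when things started to act up" could look like in practice, the count of documents with ignored fields can be bucketed over time with a plain search request (the logs-* index pattern and the hourly interval are illustrative assumptions):

```json
POST logs-*/_search
{
  "size": 0,
  "query": {
    "exists": { "field": "_ignored" }
  },
  "aggs": {
    "ignored_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      }
    }
  }
}
```

A sudden jump in one of these buckets marks the point in time to correlate with configuration changes.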

The other important part is correlating the occurring errors with changes to the system - in a visual way, this is what I'm trying to get to:
[Screenshot: error counts plotted over time, annotated with configuration changes]

It's already possible to plot the errors over time; what's challenging is giving the user access to the annotations - the changes to the configuration of the system. Having this information and correlating both signals should significantly speed up time-to-resolution in many cases. It would also make it possible to automate, or at least simplify, getting back to a working system by rolling back the applied changes.

Some rough ideas / thoughts:

  • For each data stream, there could be a hidden .changes index which is written to each time a matching index template, a component template referenced by that index template, or an ingest pipeline referenced by it is updated
  • The change documents would need to contain:
    • timestamp of the change
    • delta of the change (what part of the configuration got updated how)
    • metadata about the change (who triggered it)
  • This isn't really something that can live on the Kibana layer - Kibana could track changes made through Fleet automation, but it would miss changes made directly against the Elasticsearch APIs, which can be quite common depending on the user's setup
  • There are permission and storage concerns - who can access this information and how long should it live?
  • This is slightly distinct from the "stack monitoring" use case, as it's ultimately about the soundness of the configuration, not operational concerns - for example, even on serverless this kind of information would be relevant to users
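To make the first two bullets concrete, a document in such a hidden .changes index might look roughly like this (all field names and the delta representation are illustrative assumptions, not an existing schema):

```json
{
  "@timestamp": "2024-05-09T15:04:05Z",
  "data_stream": "logs-nginx.access-default",
  "change": {
    "type": "component_template_updated",
    "target": "logs-nginx.access@custom",
    "delta": {
      "mappings.properties.http.response.status_code.type": {
        "from": "long",
        "to": "keyword"
      }
    }
  },
  "metadata": {
    "triggered_by": "fleet",
    "user": "elastic-agent-policy-update"
  }
}
```

Documents of this shape could then be rendered directly as annotations on the error-over-time chart, and the delta would be the starting point for a rollback.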

Any thoughts @ruflin @dakrone @felixbarny ?
