When Fleet installs Elasticsearch ingest assets (index and component templates, ingest pipelines, ILM policies, etc.) for a package, we're currently bottlenecked by queueing behavior on cluster state updates as observed in this issue: elastic/kibana#110500 (comment)
This is causing some package installs to take upwards of 30s. This is a problem for Fleet, Kibana, and Elastic Agent for two primary reasons:
- We need the ability to upgrade packages on Kibana upgrades to keep some ingest assets in sync with the rest of the Stack (e.g. assets used by APM Server or by Elastic Agents themselves for monitoring).
- We also will likely want the ability to automatically downgrade packages and reinstall older versions of assets when an issue with a Kibana upgrade requires a rollback to the previous Kibana version. This would require rewriting all ingest assets in Elasticsearch to be sure they're compatible with the older Kibana version.
For both of these use cases, if this process is slow, Kibana upgrades and rollbacks will be too slow and possibly time out depending on the configuration of the orchestration layer.
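For context, each ingest asset is written with its own request today, and every one of those requests queues a separate cluster state update on the elected master. A minimal sketch of that sequential pattern, assuming the standard `@elastic/elasticsearch` client (the asset names and bodies below are illustrative, not Fleet's actual code):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Each call below is a separate request, and each one queues its own
// cluster state update task on the master node. At ~150ms per update,
// a package with dozens of assets adds up to tens of seconds.
async function installPackageAssets() {
  // 1. ILM policy
  await client.ilm.putLifecycle({
    name: 'logs-system.syslog',
    policy: { phases: { hot: { actions: { rollover: { max_size: '50gb' } } } } },
  });

  // 2. Component template
  await client.cluster.putComponentTemplate({
    name: 'logs-system.syslog@package',
    template: { settings: { 'index.lifecycle.name': 'logs-system.syslog' } },
  });

  // 3. Index template composed of the component template
  await client.indices.putIndexTemplate({
    name: 'logs-system.syslog',
    index_patterns: ['logs-system.syslog-*'],
    composed_of: ['logs-system.syslog@package'],
    data_stream: {},
  });

  // 4. Ingest pipeline
  await client.ingest.putPipeline({
    id: 'logs-system.syslog-1.0.0',
    processors: [{ set: { field: 'event.ingested', value: '{{_ingest.timestamp}}' } }],
  });
}
```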
When executing Fleet's setup process, which installs the system package, we're seeing cluster state updates take ~150ms each on a single-node cluster running on the same machine as Kibana. See the node stats captured before and after the setup process: node_stats.zip, es_logs.zip
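One way to watch the queueing happen while setup runs is the pending cluster tasks API, which lists queued cluster state update tasks and how long each has been waiting. A small sketch:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Poll during Fleet setup to see how template/pipeline/ILM writes
// stack up behind one another on the master node.
const { tasks } = await client.cluster.pendingTasks();
for (const task of tasks) {
  console.log(`${task.priority} "${task.source}" queued for ${task.time_in_queue_millis}ms`);
}
```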
@DaveCTurner mentioned that one way we could optimize this is by providing a bulk API that batches these cluster state updates into a single write.
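To make the shape of that idea concrete, a batched install could submit all of a package's assets in one request and have the master validate them up front and commit them in a single cluster state update. This is a purely hypothetical sketch: neither the endpoint nor its payload exists today.

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// HYPOTHETICAL: this endpoint does not exist in Elasticsearch today.
// The idea is one request -> one cluster state update for all assets,
// instead of one update per asset.
await client.transport.request({
  method: 'PUT',
  path: '/_ingest_assets/_bulk', // hypothetical path, for illustration only
  body: {
    ilm_policies: { 'logs-system.syslog': { phases: { /* ... */ } } },
    component_templates: { 'logs-system.syslog@package': { template: { /* ... */ } } },
    index_templates: { 'logs-system.syslog': { index_patterns: ['logs-system.syslog-*'] } },
    ingest_pipelines: { 'logs-system.syslog-1.0.0': { processors: [ /* ... */ ] } },
  },
});
```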