🚩 Note: I intend to use the main issue description to reflect our growing understanding of the situation, so I will periodically update it to reflect what we discuss. I'll add a comment each time this happens, so it's not silently changing.
A majority of Kibana's entities are persisted in saved-objects. However, there's a growing number of non-saved-object Elasticsearch indices that are being used to store Kibana specific entities. The following are the ones that I'm currently aware of:
- Alerting's event log: .kibana-event-log-*
- APM agent configuration: .apm-agent-configuration
- APM custom link: .apm-custom-link
- Detection engine signals: .siem-signals-*
- Security solution lists: .lists and .values
- Reporting: .reporting-*
I've started this discuss issue to determine what other Elasticsearch indices are being used to store Kibana specific entities, and to enumerate the reasons why they aren't being stored as saved-objects. Saved-objects provide a number of features, including migrations, authorization, audit logging, export/import, space awareness, and encrypted attributes, that developers forgo when using non-saved-object ES indices.
The goal of this exercise is twofold: to identify limitations of saved-objects that should be addressed to make them applicable to more use-cases, and to figure out which saved-object specific features should be made available when using non-saved-object ES indices.
Reasons we haven't used saved-objects
End-users should be able to query the indices directly
Saved-objects are stored in a "system index", and as such, end-users will not be able to query these indices directly starting in 8.0. Even if end-users could theoretically query system indices, we treat the ES document format as an implementation detail of saved-objects. It is prone to change between minor versions in a non-backward-compatible manner, so end-users shouldn't be querying it directly.
Applies to: Alerting's event log, Detection engine signals
There are too many saved-objects
The SIEM team has outlined a few of the issues that they experienced when trying to model their lists using saved-objects in #64715. Notably, SavedObjectsClient#find's paging implementation doesn't function properly when there are more than 10k results, which is being tracked by #77961.
Applies to: Security solution lists
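The 10k figure lines up with Elasticsearch's default index.max_result_window of 10,000, which caps from + size for a regular search. Below is a minimal sketch of that arithmetic, assuming page-based paging translates to from = (page - 1) * perPage. The helper name isPageReachable is hypothetical and not part of Kibana.

```typescript
// Elasticsearch's default value for the index.max_result_window setting.
const DEFAULT_MAX_RESULT_WINDOW = 10_000;

// Hypothetical helper (not a Kibana API): returns true when the requested
// page can be served with plain from/size paging.
function isPageReachable(
  page: number,
  perPage: number,
  maxResultWindow: number = DEFAULT_MAX_RESULT_WINDOW
): boolean {
  // Assumption: pages are 1-indexed and map to an offset like this.
  const from = (page - 1) * perPage;
  // Elasticsearch rejects searches where from + size exceeds the window.
  return from + perPage <= maxResultWindow;
}

console.log(isPageReachable(10, 1000)); // last page that still fits
console.log(isPageReachable(11, 1000)); // first page past the window
```

With 1,000 results per page, page 10 is the last one that fits in the window. Anything beyond that needs a different paging strategy.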
Documents are too large
Reporting uses its own dedicated .reporting-* indices because the documents include base64 encoded data for the generated CSVs, PDFs and PNGs. Since these documents are generally so large, they can't be migrated using saved-object migrations, and the indices are created on a weekly basis.
Applies to: Reporting
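To give a rough sense of the size overhead, here is a small sketch of how much base64 encoding inflates a payload. It assumes standard padded base64, where every 3 input bytes become 4 output characters. The function name is hypothetical, and the calculation ignores any other fields stored in the document.

```typescript
// Hypothetical helper: length in characters of the padded base64 encoding
// of a payload that is `byteLength` bytes long.
function base64EncodedLength(byteLength: number): number {
  // Every group of up to 3 bytes is emitted as exactly 4 characters.
  return 4 * Math.ceil(byteLength / 3);
}

const tenMebibytes = 10 * 1024 * 1024;
console.log(base64EncodedLength(tenMebibytes) / tenMebibytes); // roughly 4/3
```

In other words, a generated PDF or PNG takes about a third more space once it is embedded in the document.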
Aggregations
Plugins wanting to run aggregations cannot use the saved objects client (we have made good progress in #64002, but it might take some time for plugins to adopt it).
In addition, it will not be possible to use a query to limit the documents to aggregate over. One workaround is to use a KQL filter, but this impacts performance and is discouraged by the ES team (#69172).
Applies to: APM Agent Configuration
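For comparison, this is roughly what a plugin can send to a dedicated index: an Elasticsearch search body that combines a query, which limits the documents, with an aggregation over the matches. It is only a sketch. The field names and the builder function are hypothetical and may not match the real APM mappings.

```typescript
// Shape of the search body used in this sketch.
interface AggregationRequest {
  size: number;
  query: Record<string, unknown>;
  aggs: Record<string, unknown>;
}

// Hypothetical builder: counts documents per service within one environment.
function buildCountByService(environment: string): AggregationRequest {
  return {
    size: 0, // return no hits, only the aggregation results
    // The query narrows the set of documents the aggregation runs over.
    query: { term: { "service.environment": environment } },
    // A terms aggregation buckets the matching documents by field value.
    aggs: { by_service: { terms: { field: "service.name" } } },
  };
}

console.log(JSON.stringify(buildCountByService("production"), null, 2));
```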
Filtering on update / delete queries
It's not possible to efficiently delete or update many documents without running these operations over all documents of a certain saved object type.
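By contrast, a dedicated index can use Elasticsearch's _delete_by_query API (or _update_by_query) with an arbitrary query. The sketch below only builds the request. The index, field, and value are hypothetical examples, and the builder is not an existing Kibana helper.

```typescript
// Minimal description of an HTTP request to Elasticsearch.
interface EsRequest {
  method: "POST";
  path: string;
  body: { query: Record<string, unknown> };
}

// Hypothetical builder: deletes every document in `index` whose `field`
// exactly matches `value`, leaving all other documents untouched.
function buildDeleteByQuery(index: string, field: string, value: string): EsRequest {
  return {
    method: "POST",
    path: `/${index}/_delete_by_query`,
    body: { query: { term: { [field]: value } } },
  };
}

const request = buildDeleteByQuery(".apm-custom-link", "service.name", "checkout");
console.log(request.path, JSON.stringify(request.body));
```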
Filtering on nested fields
Filter validation fails when writing a KQL query against nested field types (#81009).