Our documentation around snapshots, system indices, and system features is unclear. The technically correct information exists in the API documentation, but mostly under the information about the feature_states parameter. I don't believe there is a high-level summary about what a system feature is, what it means to take a snapshot of one, and how to include or exclude it from a snapshot restoration.
For example, here is what we currently have in the "Create a snapshot" page:
Besides creating a copy of each data stream and index, the snapshot process can also store global cluster metadata, which includes persistent cluster settings, templates, and data stored in system indices, such as Watches and task records, regardless of whether those system indices are named in the indices section of the request. You can also use the create snapshot API’s feature_states parameter to include only a subset of system indices in the snapshot. Snapshots do not store transient settings or registered snapshot repositories.
User stories to document
First, a few definitions from the developer's perspective:
- In the codebase, a "system feature" is a component that defines one or more system indices, associated indices, or system data streams, alongside code for various management operations for those indices and data streams.
- "Feature state" means the system indices, associated indices, and system data streams of a feature at a given time, for example, all of the backing data that Kibana stores in Elasticsearch.
- A "system index" is an index that is meant to be hidden from users. In 7.x, this means that you get deprecation warnings when you access one. In 8.0, you need special permissions to access one directly. Eventually, we don't want there to be any direct access. (Due to a quirk of development, there is no direct access to the GeoIP system index in 7.x.)
- A "system data stream" is a data stream that is hidden from users but used by a system component. Currently, only Fleet has a system data stream.
- An "associated index" is an index that the feature uses, but that doesn't need the protections provided to system indices. Often this is because it contains information that we want end-users to be able to see and search. Such indices are also included in snapshots of "feature state."
We hope that "feature" and "feature state" are the main concepts with which users need to concern themselves.
Snapshot
- A user wants to make a snapshot that includes the system configuration and data for one or more system components, for example, Elasticsearch security or Kibana.
- The user calls the get features API to see which system features are present in the system. (Under the hood, a "system feature" is usually defined in an x-pack plugin, but that is an implementation detail which already has one exception and is subject to further change.)
- The user decides to include a feature state in their snapshot request using the feature_states parameter.
- Once the snapshot is taken, the user can see the included system indices using the GET snapshot API.
When the snapshot has include_global_state set to true, all feature states (meaning, all system indices, associated indices, and system data streams) are included in the snapshot. Snapshots with global state have already proved tricky to restore into new clusters, often because a system index in the global state clashes with something already named in the cluster.
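As a rough sketch of the snapshot story above, the following Python builds the create snapshot request for a chosen set of features without sending it. The repository name, snapshot name, and feature names are illustrative placeholders, the helper name is hypothetical, and setting include_global_state to false alongside feature_states (so that only the named features are captured) is an assumption to verify against the API documentation.

```python
from typing import Dict, List, Tuple


def build_snapshot_request(
    repository: str, snapshot: str, feature_states: List[str]
) -> Tuple[str, str, Dict]:
    """Return (method, path, body) for a create snapshot request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    path = f"/_snapshot/{repository}/{snapshot}"
    body = {
        # Assumption: with global state off, only the listed feature
        # states are added to the snapshot.
        "include_global_state": False,
        "feature_states": list(feature_states),
    }
    return "PUT", path, body


# Illustrative names only; real feature names come from the get features API.
method, path, body = build_snapshot_request("my_repo", "snap_1", ["kibana", "security"])
print(method, path, body)
```

The user would pick the values for feature_states from the get features API response, then confirm the result with the get snapshot API.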
Restore
- A user wants to reset a feature to a previous state, or restore a feature's settings into a new cluster.
- The user calls the GET snapshot API to see which features are included in the snapshot of interest.
- The user issues a restore request with the names of those features in the feature_states request parameter.
Users currently run into trouble when system indices they don't really care about clash with an index that is already in the cluster. Many users want to know how to exclude a particular system index from the restore operation. Our desired solution is that the user could exclude the feature that owns the system index. Unfortunately, we don't have an excluded_feature_states request parameter; the only way to exclude a feature state right now is to put all the feature states except that one in the feature_states parameter.
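The workaround described above (list every feature state except the unwanted one) can be sketched as follows. This is a minimal illustration, not an existing client API: both helper names are hypothetical, the feature and index names are placeholders, and the response shape (a "snapshots" array whose entries carry "feature_states" objects with a "feature_name" key) is an assumption about the get snapshot API output.

```python
from typing import Dict, Iterable, List


def feature_names_in_snapshot(get_snapshot_response: Dict) -> List[str]:
    """Collect feature names from a get snapshot API response (assumed shape)."""
    names: List[str] = []
    for snap in get_snapshot_response.get("snapshots", []):
        for state in snap.get("feature_states", []):
            name = state["feature_name"]
            if name not in names:
                names.append(name)
    return names


def restore_body_excluding(snapshot_features: Iterable[str], excluded: Iterable[str]) -> Dict:
    """Build a restore request body that names every feature except the excluded ones."""
    excluded_set = set(excluded)
    remaining = [name for name in snapshot_features if name not in excluded_set]
    # If nothing is left, fall back to ["none"], which restores no feature states.
    return {"feature_states": remaining if remaining else ["none"]}


# Placeholder response for illustration only.
response = {
    "snapshots": [
        {
            "snapshot": "snap_1",
            "feature_states": [
                {"feature_name": "kibana", "indices": [".kibana_1"]},
                {"feature_name": "geoip", "indices": [".geoip_databases"]},
            ],
        }
    ]
}
features = feature_names_in_snapshot(response)
print(restore_body_excluding(features, ["geoip"]))
```

The resulting body would be sent with the restore request for that snapshot. An excluded_feature_states parameter, if it existed, would make this client-side filtering unnecessary.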
Tricky cases
- In 7.x, snapshotting and restoring all indices includes system indices for backwards compatibility. Users might not expect this. The documented behavior is as follows:
** Request snapshots or restores all indices, with no feature_states or include_global_state parameter: all feature states included
** Request snapshots or restores one specific index, with include_global_state set to true: all feature states included
** Request snapshots or restores all indices, with include_global_state set to false: no feature states included
** Request snapshots or restores all indices, with feature_states: []: no feature states included
** Request snapshots or restores all indices, with feature_states: ["none"]: no feature states included
- I honestly can't remember what happens if a snapshot has a feature state that the cluster we are restoring to lacks. I don't think this can happen in practice, but it could in the future if users have custom plugins that define system indices.
- For normal indices, a user must explicitly close the index before it can be restored from a snapshot. System indices are different; they are automatically closed and overwritten during the restore operation. In hindsight, I can see that this might have unintended consequences.
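The documented 7.x behavior listed in the first tricky case can be written down as a small lookup table, which may be easier to review than prose. The sketch below encodes only the five listed combinations and returns None for anything else; the function name, the "all"/"one" scope labels, and the result strings are made up for illustration and are not taken from the Elasticsearch codebase.

```python
from typing import Iterable, Optional

ALL_STATES = "all feature states included"
NO_STATES = "no feature states included"

# Key: (index scope, include_global_state, feature_states as a tuple).
# None means the parameter is absent from the request.
DOCUMENTED_ROWS = {
    ("all", None, None): ALL_STATES,
    ("one", True, None): ALL_STATES,
    ("all", False, None): NO_STATES,
    ("all", None, ()): NO_STATES,
    ("all", None, ("none",)): NO_STATES,
}


def documented_outcome(
    scope: str,
    include_global_state: Optional[bool] = None,
    feature_states: Optional[Iterable[str]] = None,
) -> Optional[str]:
    """Look up the documented result, or None if the combination is not listed."""
    states_key = None if feature_states is None else tuple(feature_states)
    return DOCUMENTED_ROWS.get((scope, include_global_state, states_key))


print(documented_outcome("all"))
print(documented_outcome("all", feature_states=["none"]))
```

Combinations that return None here are the ones the list does not cover, which may be a useful checklist for the documentation work.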
cc: @gwbrown, @lockewritesdocs, @debadair