
Optimize indexing in create once and never update scenarios #19813

@bleskes

Description


The Elasticsearch API supports all CRUD operations on every single document, topped by features like optimistic versioning controls and a real time get API. To support this we need several data structures that can look up existing documents in real time (before a refresh); if a document is not found there, we need to do a lookup by ID in Lucene to make sure we find and replace any existing copy. For certain use cases, where our users can guarantee that documents are inserted once and never changed again (for example logging, metrics, or a one-off re-indexing of existing data), this turns out to be a significant overhead in terms of indexing speed, without adding any real value. We sometimes refer to these scenarios as the "append only use case".

In the past, we have tried to introduce optimizations in this area, but had to revert them as they proved to be unsafe under certain race/error conditions in our distributed system.

Since then things have evolved and we now have more tools at our disposal. This issue is to track the list of things it will take to optimize the append only use case while keeping it safe. Most concretely, we should explore not maintaining the LiveVersionMap and always using addDocument (rather than updateDocument) when indexing into Lucene. Some of the things listed here are already done; some require more work.
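To make the trade-off concrete, here is a minimal toy sketch (all names are illustrative, not Elasticsearch's actual classes) of why the version-map lookup is pure overhead when ids are guaranteed fresh:

```java
import java.util.HashMap;
import java.util.Map;

// Toy engine contrasting the CRUD path (version-map lookup + replace)
// with the append-only path (straight add, nothing tracked per id).
class ToyEngine {
    private final Map<String, Long> liveVersionMap = new HashMap<>(); // id -> version
    long lookups = 0; // number of version-map lookups performed

    // CRUD path: must check for an existing doc so we can replace it.
    void updateDocument(String id) {
        lookups++;
        long version = liveVersionMap.getOrDefault(id, 0L) + 1;
        liveVersionMap.put(id, version); // plus a delete-by-term + add in Lucene
    }

    // Append-only path: the caller guarantees the id is new, so no lookup
    // and no version-map entry are needed.
    void addDocument(String id) {
        // straight Lucene addDocument; no per-id bookkeeping
    }
}
```

The danger, of course, is that the guarantee can be violated by retries, which the rest of this issue is about.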

The list is mostly focused on dealing with the main issue in these cases: when an indexing operation is sent to the primary, it may be that the sending node is disconnected from the node hosting the primary and thus has no idea whether the operation succeeded or failed. In those cases, the node will resend its request once the connection is restored. This may result in a duplicate delivery on the primary and thus requires the usage of updateDocument. It is easy to add an `isRetry` flag on those retry requests, but we still run the danger of the retry request being processed before the original, in which case we will end up with duplicates. This scenario needs to be avoided at all costs.

  • Disable the optimization while the shard is recovering and do things as we do today; during recovery a document may be written twice to a shard.
  • Rely on primary terms to resolve collisions between a request originating from an old primary and a retry.
  • Rely on the task manager to avoid having an original request and a retry request executing concurrently on a node. For this we need to make sure they use the same task id and either reject one or have one wait on the other.
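The primary-term point above can be sketched as a simple guard (illustrative names; the real mechanism lives in the replication layer): an operation is stamped with the term of the primary that issued it, and a node that has already seen a newer primary refuses operations carrying a stale term.

```java
// Sketch of term-based rejection: once a newer primary (higher term) has
// been seen, a late request stamped by an old primary is refused, so it
// cannot race with a retry routed through the new primary.
class TermGuard {
    private long currentPrimaryTerm = 1;

    void newPrimaryElected(long term) {
        if (term > currentPrimaryTerm) {
            currentPrimaryTerm = term;
        }
    }

    boolean accept(long operationTerm) {
        return operationTerm >= currentPrimaryTerm; // stale-term ops are refused
    }
}
```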

Having dealt with concurrency, we need to deal with the case where the retry request has already been executed and removed from the task manager. This can be done as follows:

  • Introduce a new “connection generation” id (a long) that is incremented every time a connection is (re)established between the nodes. This generation id will be added to every request and will also be exchanged as part of establishing a connection, making sure the target node is aware of the new generation id before exposing the connection outside of the transport service.
  • Networking threads on the target node will check that the generation id of a request is still the current one after adding the request to the task manager. If this is not the case the request will be removed from the task manager and will be failed (though there is no way to respond now, because the channel is closed). This guarantees that an old request will never be executed after a new connection was established. This in turn means it can not be processed after a retry request was processed.
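The register-then-recheck ordering described above can be sketched as follows (a simplified model with hypothetical names; the real transport/task-manager code is more involved):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "connection generation" idea: a request is first registered
// with the task manager, then its generation is re-checked; if a reconnect
// happened in between, the request is deregistered and failed, so it can
// never run after a retry sent on the new connection.
class GenerationGate {
    private final AtomicLong currentGeneration = new AtomicLong(1);
    private final Map<Long, String> taskManager = new ConcurrentHashMap<>();
    private final AtomicLong taskIds = new AtomicLong();

    long reconnect() {
        return currentGeneration.incrementAndGet(); // (re)established connection
    }

    long currentGeneration() {
        return currentGeneration.get();
    }

    // Returns a task id if the request may run, or -1 if it arrived on a
    // connection that has since been re-established.
    long register(long requestGeneration, String description) {
        long taskId = taskIds.incrementAndGet();
        taskManager.put(taskId, description);           // register first...
        if (requestGeneration != currentGeneration.get()) {
            taskManager.remove(taskId);                 // ...then re-check and fail
            return -1;
        }
        return taskId;
    }
}
```

Registering before the check matters: any retry that observes the new generation is guaranteed to either see the stale task in the task manager or find it already removed and failed, never to run concurrently with it.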

Instead of the above, we went with a simpler approach to dealing with concurrency:

  • Add a timestamp to each request with an auto-generated id and create a "safety" barrier guaranteeing that all problematic requests (i.e., an original request that is processed after its retry) have a lower timestamp. This allows us to identify them and do the right thing. See details below.
  • Use an index level setting to indicate to the engine that it can disable the LiveVersionMap and use addDocument when the shard is started. This can be updated on a closed index.
  • Disable the real time get API on these indices.
  • Disable indexing with an id.
  • ~~Disable delete & update API~~

Instead of disabling single-document update/delete/index-with-id, we have managed to come up with a scheme that skips adding append-only operations to the LiveVersionMap and marks the map as unsafe. Whenever a non-append-only indexing operation comes along, we detect that and force a refresh. See #27752
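The timestamp barrier can be sketched like this (illustrative names modeled on the idea, not the exact engine code): a retry raises the barrier before indexing safely, so any original processed afterwards carries a timestamp at or below the barrier, is treated as potentially duplicated, and falls back to updateDocument.

```java
// Sketch of the auto-generated-id timestamp barrier: retries raise
// maxUnsafeAutoIdTimestamp; only originals strictly newer than the barrier
// may use the fast addDocument path.
class AppendOnlyPlanner {
    private long maxUnsafeAutoIdTimestamp = -1;

    enum Plan { ADD, UPDATE } // addDocument vs updateDocument in Lucene

    Plan plan(long autoIdTimestamp, boolean isRetry) {
        if (isRetry) {
            // Every original whose retry we may have raced with has a
            // lower-or-equal timestamp, so raise the barrier and index safely.
            maxUnsafeAutoIdTimestamp = Math.max(maxUnsafeAutoIdTimestamp, autoIdTimestamp);
            return Plan.UPDATE;
        }
        // Originals at or below the barrier may already have been indexed
        // by their retry; only strictly newer ones can take the fast path.
        return autoIdTimestamp > maxUnsafeAutoIdTimestamp ? Plan.ADD : Plan.UPDATE;
    }
}
```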


Labels

:Distributed/CRUD, Meta
