The Elasticsearch API supports all CRUD operations on every single document, topped by features like optimistic versioning controls and a real time get API. To support this we need several data structures for looking up existing documents: a real time lookup (before a refresh) and, if a document is not found there, a lookup by ID in Lucene to make sure we find and replace any existing document. For certain use cases, where users can guarantee that documents are inserted once and never changed again (for example logging, metrics, or a one-off re-indexing of existing data), this turns out to be a significant overhead in terms of indexing speed, without adding any real value. We sometimes refer to these scenarios as the "append only use case".
In the past, we have tried to introduce optimizations in this area, but had to revert them as they proved to be unsafe under certain race/error conditions in our distributed system.
Since then things have evolved and we now have more tools at our disposal. This issue tracks the list of things it will take to optimize the append only use case while keeping it safe. Most concretely, we should explore not maintaining the `LiveVersionMap` and always using `addDocument` (rather than `updateDocument`) when indexing to Lucene. Some of the things listed here are already done; some require more work.
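As a toy illustration (not Elasticsearch code) of why this matters, the sketch below contrasts the two paths: the general `updateDocument` path must resolve any existing version via a real-time map or a by-id Lucene lookup, while the append-only `addDocument` path can blindly append.

```python
# Toy model of the two indexing paths. The dicts stand in for Lucene and
# the LiveVersionMap; all names here are illustrative, not Elasticsearch APIs.

lucene = {}            # refreshed, searchable index: id -> doc
live_version_map = {}  # ops indexed since the last refresh: id -> version

def update_document(doc_id, source):
    # General CRUD path: resolve the current version in real time first...
    version = live_version_map.get(doc_id)
    if version is None:
        # ...falling back to a by-id lookup in Lucene.
        existing = lucene.get(doc_id)
        version = existing["version"] if existing else 0
    # Delete-and-replace any existing copy and bump the version.
    lucene[doc_id] = {"source": source, "version": version + 1}
    live_version_map[doc_id] = version + 1

def add_document(doc_id, source):
    # Append-only path: no lookup and no LiveVersionMap maintenance.
    lucene[doc_id] = {"source": source, "version": 1}
```

The per-operation saving is exactly the lookup and the map bookkeeping that `update_document` performs and `add_document` skips.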
The list is mostly focused on dealing with the main issue in these cases: when an indexing operation is sent to the primary, it may be that the sending node is disconnected from the node holding the primary and thus has no idea whether the operation succeeded or failed. In those cases, the node will resend its request once the connection is restored. This may result in a duplicate delivery on the primary and thus requires the usage of `updateDocument`. It is easy to add an `isRetry` flag on those retry requests, but we still run the danger of the retry request being processed before the original, in which case we will end up with duplicates. This scenario needs to be avoided at all costs.
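A toy timeline (illustrative, not Elasticsearch code) makes the danger concrete: if the retry overtakes the delayed original and the engine uses a blind `addDocument`, both deliveries land in the index.

```python
# Toy timeline: one client request, one retry, delivered out of order.
# addDocument is modeled as a plain list append, so nothing deduplicates.

index = []  # Lucene modeled as an append-only list of (doc_id, source)

def add_document(doc_id, source):
    index.append((doc_id, source))

original = {"doc_id": "auto-1", "source": "log line", "is_retry": False}
retry = dict(original, is_retry=True)

# The network reorders the deliveries: the retry is processed first...
add_document(retry["doc_id"], retry["source"])
# ...then the delayed original arrives and is blindly appended again.
add_document(original["doc_id"], original["source"])

duplicates = len(index) - len({doc_id for doc_id, _ in index})
```

The `isRetry` flag alone cannot help here: by the time the flagged retry is processed, there is no record that the original is still in flight.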
To deal with concurrency, we first considered the following scheme:

* Rely on the task manager to avoid having an original request and a retry request executing concurrently on a node. For this we need to make sure they use the same task id and reject one, or have one wait on the other.

Having dealt with concurrency, we need to deal with the case where the retry request has already been executed and removed from the task manager. This can be done as follows:

* Introduce a new "connection generation" id (a long) that is incremented every time a connection is (re)established between the nodes. This generation id will be added to every request and will also be exchanged as part of establishing a connection, making sure the target node is aware of the new generation id before exposing the connection outside of the transport service.
* Networking threads on the target node will check that the generation id of a request is still the current one after adding the request to the task manager. If this is not the case, the request will be removed from the task manager and failed (though there is no way to respond at that point, because the channel is closed). This guarantees that an old request will never be executed after a new connection was established, which in turn means it cannot be processed after a retry request was processed.
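The generation check could be sketched like this (a toy model; the names `establish_connection` and `handle` are assumptions for illustration, not the actual transport-layer API):

```python
# Toy model of the connection-generation check on the target node.

class TargetNode:
    def __init__(self):
        self.current_generation = 0
        self.task_manager = {}   # task_id -> request payload
        self.executed = []

    def establish_connection(self):
        # A (re)connect bumps the generation id before the connection is
        # exposed, so every later request carries the new value.
        self.current_generation += 1
        return self.current_generation

    def handle(self, task_id, generation, payload):
        # Register the request first, then re-check the generation: a
        # stale request must never run after a newer connection exists.
        self.task_manager[task_id] = payload
        if generation != self.current_generation:
            del self.task_manager[task_id]
            return False  # failed; the old channel is closed anyway
        self.executed.append(payload)
        del self.task_manager[task_id]
        return True

node = TargetNode()
gen1 = node.establish_connection()   # original request was sent on gen 1
gen2 = node.establish_connection()   # reconnect: the retry is sent on gen 2
node.handle("task-1", gen2, "retry request")     # current generation: runs
node.handle("task-1", gen1, "original request")  # stale generation: rejected
```

The order of operations is the important part: registering in the task manager *before* the generation check closes the window in which a stale request could slip past a concurrent reconnect.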
Instead of the above, we went with a simpler approach to dealing with concurrency:

* Use an index level setting to indicate to the engine that it can disable the `LiveVersionMap` and use `addDocument` when the shard is started. This setting can only be updated on a closed index.
* Disable the real time get API on these indices.
* Disable indexing with an id.
Instead of disabling single doc update/delete/index with id, we have managed to come up with a scheme to skip adding append only operations to the `LiveVersionMap` while marking the map as unsafe. Whenever a non-append-only indexing operation comes along, we detect that and force a refresh. See #27752
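A minimal sketch of that scheme, under the assumption that a refresh is what makes documents visible to the by-id Lucene lookup (the class and method names are illustrative, not the real engine code):

```python
# Toy model of the #27752 scheme: append-only ops skip the version map and
# mark it unsafe; the first non-append-only op then forces a refresh so the
# by-id Lucene lookup is guaranteed to see all earlier documents.

class Engine:
    def __init__(self):
        self.lucene = {}         # refreshed, searchable docs: id -> source
        self.unrefreshed = {}    # indexed but not yet visible to a lookup
        self.version_map = {}    # real-time entries for non-append-only ops
        self.version_map_unsafe = False
        self.refreshes = 0

    def refresh(self):
        self.lucene.update(self.unrefreshed)
        self.unrefreshed.clear()
        self.version_map_unsafe = False
        self.refreshes += 1

    def index_append_only(self, doc_id, source):
        # Skip the version map, but record that it can no longer be
        # trusted for real-time lookups.
        self.version_map_unsafe = True
        self.unrefreshed[doc_id] = source

    def index(self, doc_id, source):
        # A regular operation while the map is unsafe forces a refresh
        # first, so the by-id Lucene lookup sees all earlier documents.
        if self.version_map_unsafe:
            self.refresh()
        self.unrefreshed[doc_id] = source
        self.version_map[doc_id] = source
```

The trade-off is that a mixed workload pays for an extra refresh at each transition from append-only to regular indexing, while a pure append-only workload never touches the version map at all.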