CCR: Aborted document is exposed in Lucene changes #32269
Closed
Labels: :Distributed/CCR, >bug
Description
The CCR branch started failing frequently after merging #31007. Some CI instances:
- https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+ccr+feature-branch-periodic/1002/console
- https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+ccr+feature-branch-periodic/1000/console
These failures can be explained as follows:
- A user issues an indexing request that throws an exception during the analysis phase. Because the IndexingChain fails to process the document, the DocumentsWriterPerThread hard-deletes that document internally in Lucene on the primary.
[2018-07-20T18:04:15,095][DEBUG][o.e.a.b.TransportShardBulkAction] [test][0] failed to execute bulk item (index) BulkShardRequest [[test][0]] containing [index {[test][test][2], source[{"suggest_context":{"input":"foo"}}]}]
java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]
- In ES, we treat all docs as live when reading the Lucene changes history (in fact, we cannot distinguish between hard-deletes and soft-deletes). If a recovering replica reads the aborted document, it fails to index that document for the same reason the primary did, so the replica can never complete its recovery.
[2018-07-20T18:04:15,148][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] fatal error in thread [elasticsearch[node-1][generic][T#4]], exiting
java.lang.AssertionError: unexpected failure while replicating translog entry: java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]
at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:401) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:458) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:448)
The problem is that we read aborted documents, which should never be exposed. This might be a critical problem for CCR and for Lucene rollbacks.
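The two steps above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToySegment`, `Op`, `readChanges`, and `replay` are hypothetical names, not Lucene or Elasticsearch classes, and the model assumes that both an aborted document and a soft-deleted document simply end up with their live bit cleared, so a reader that needs soft-deleted operations for history has to ignore the live bits and therefore also sees the aborted one.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model only: none of these types are real Lucene or Elasticsearch classes. */
public class AbortedDocExposure {

    /** Hypothetical operation: a sequence number plus whether its source can be analyzed. */
    static final class Op {
        final long seqNo;
        final boolean analyzable;

        Op(long seqNo, boolean analyzable) {
            this.seqNo = seqNo;
            this.analyzable = analyzable;
        }
    }

    /** Hypothetical stand-in for a segment: stored docs plus a live-docs bitset. */
    static final class ToySegment {
        private final List<Op> docs = new ArrayList<>();
        private final BitSet live = new BitSet();

        /** Indexes on the primary; an aborted doc still occupies a slot but is never marked live. */
        boolean index(Op op) {
            int docId = docs.size();
            docs.add(op);
            if (!op.analyzable) {
                return false; // assumption: "hard-delete" modeled as a cleared live bit
            }
            live.set(docId);
            return true;
        }

        /** Soft-delete modeled the same way: the live bit is cleared, the doc is retained. */
        void softDelete(long seqNo) {
            for (int docId = 0; docId < docs.size(); docId++) {
                if (docs.get(docId).seqNo == seqNo) {
                    live.clear(docId);
                }
            }
        }

        /** Reads the changes history, optionally treating every doc as live. */
        List<Op> readChanges(boolean treatAllDocsAsLive) {
            List<Op> out = new ArrayList<>();
            for (int docId = 0; docId < docs.size(); docId++) {
                if (treatAllDocsAsLive || live.get(docId)) {
                    out.add(docs.get(docId));
                }
            }
            return out;
        }
    }

    /** Replays operations on a replica; fails on the same operation the primary aborted. */
    static int replay(List<Op> ops) {
        int applied = 0;
        for (Op op : ops) {
            if (!op.analyzable) {
                throw new IllegalArgumentException("cannot analyze operation with seqNo " + op.seqNo);
            }
            applied++;
        }
        return applied;
    }

    /** Builds a primary with one soft-deleted op (0), one aborted op (1), and one live op (2). */
    static ToySegment primary() {
        ToySegment segment = new ToySegment();
        segment.index(new Op(0, true));
        segment.index(new Op(1, false)); // aborted during analysis
        segment.index(new Op(2, true));
        segment.softDelete(0);
        return segment;
    }

    public static void main(String[] args) {
        ToySegment primary = primary();
        System.out.println("respecting live docs: " + primary.readChanges(false).size() + " op(s)");
        System.out.println("treating all as live: " + primary.readChanges(true).size() + " op(s)");
        try {
            replay(primary.readChanges(true));
        } catch (IllegalArgumentException e) {
            System.out.println("replica replay failed: " + e.getMessage());
        }
    }
}
```

In this sketch, respecting the live bits would hide the soft-deleted operation that history needs, while ignoring them hands the aborted operation to the replica. A real fix would need some way to tell the two kinds of deletion apart, which the toy model deliberately lacks.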