
CCR: Aborted document is exposed in Lucene changes #32269

@dnhatn


The CCR branch started failing frequently after merging #31007. Some CI instances:

These failures can be explained as follows:

  1. A user issues an indexing request that throws an exception in the analysis phase. Because the IndexingChain fails to process the document, the DocumentsWriterPerThread hard-deletes that document internally in Lucene on the primary.
[2018-07-20T18:04:15,095][DEBUG][o.e.a.b.TransportShardBulkAction] [test][0] failed to execute bulk item (index) BulkShardRequest [[test][0]] containing [index {[test][test][2], source[{"suggest_context":{"input":"foo"}}]}]
java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]
  2. In ES, we treat all docs as live when reading the Lucene changes history (in fact, we cannot distinguish between hard-deletes and soft-deletes). If a recovering replica reads the aborted document, it will fail to index that document. As a result, the replica will never be able to complete its recovery.
[2018-07-20T18:04:15,148][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] fatal error in thread [elasticsearch[node-1][generic][T#4]], exiting
java.lang.AssertionError: unexpected failure while replicating translog entry: java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]
    at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:401) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:458) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:448)

The problem is that we read aborted documents, which should never be exposed. This might be a critical problem for CCR and Lucene rollbacks.
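To make the sequence above concrete, here is a toy model of the failure. It is only a sketch: `AbortedDocSketch`, `Doc`, `indexOnPrimary`, `readChanges`, and `replayOnReplica` are hypothetical names, not Lucene or Elasticsearch APIs, and the explicit `hardDeleted` flag is a simplification. In the real reader we cannot tell hard-deletes from soft-deletes, which is exactly why the `allDocsLive` path is the one we take today.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortedDocSketch {

    /** Toy stand-in for a document slot in a segment; not a Lucene or Elasticsearch type. */
    static final class Doc {
        final String id;
        final boolean hasContext;   // false models the document that fails analysis
        final boolean hardDeleted;  // set when the primary aborts the document

        Doc(String id, boolean hasContext, boolean hardDeleted) {
            this.id = id;
            this.hasContext = hasContext;
            this.hardDeleted = hardDeleted;
        }
    }

    /** Primary side: a document that fails analysis stays in the segment, marked hard-deleted. */
    static List<Doc> indexOnPrimary(String[] ids, boolean[] hasContext) {
        List<Doc> segment = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            segment.add(new Doc(ids[i], hasContext[i], !hasContext[i]));
        }
        return segment;
    }

    /** Reads the changes history; with allDocsLive the hard-delete marker is ignored. */
    static List<Doc> readChanges(List<Doc> segment, boolean allDocsLive) {
        List<Doc> ops = new ArrayList<>();
        for (Doc doc : segment) {
            if (allDocsLive || !doc.hardDeleted) {
                ops.add(doc);
            }
        }
        return ops;
    }

    /** Replica side: replaying an aborted document hits the same analysis failure. */
    static int replayOnReplica(List<Doc> ops) {
        int indexed = 0;
        for (Doc doc : ops) {
            if (!doc.hasContext) {
                throw new IllegalArgumentException("Contexts are mandatory (toy failure for doc " + doc.id + ")");
            }
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Doc> segment = indexOnPrimary(
            new String[] {"1", "2", "3"}, new boolean[] {true, false, true});

        // Hypothetical reader that skips hard-deleted docs: replay succeeds.
        int indexed = replayOnReplica(readChanges(segment, false));
        System.out.println("filtered replay indexed " + indexed + " docs");

        // Reader that makes all docs live: the aborted doc is exposed and replay fails.
        try {
            replayOnReplica(readChanges(segment, true));
        } catch (IllegalArgumentException e) {
            System.out.println("all-docs-live replay failed: " + e.getMessage());
        }
    }
}
```

In this model the fix amounts to keeping aborted documents out of whatever `readChanges` returns. How to do that in the actual changes reader is left open here.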

/cc @s1monw and @bleskes


    Labels

    :Distributed/CCR (Issues around the Cross Cluster State Replication features), >bug
