
[Reporting] Document why it's possible to have many version conflict errors in the logs #99994

@tsullivan

Describe the feature:
New section of documentation to add to https://www.elastic.co/guide/en/kibana/current/reporting-troubleshooting.html#reporting-troubleshooting-error-messages

Why am I seeing version conflict errors in the Kibana server logs?

Every instance of Kibana with Reporting shares the "queue" of incoming report jobs. An instance "claims" a job, and ensures no other instance will claim it, by doing the following:

  1. search for reporting jobs in the index that have pending status
  2. update the job doc and change the status to processing

A few milliseconds elapse between steps 1 and 2, so what happens if two instances find the same job at the same time? Both instances attempt the update. Elasticsearch lets only the first request succeed. The second request fails, because Elasticsearch detects that it carries outdated information about the current state of the document: its `if_seq_no` and `if_primary_term` values no longer match. The failure causes something like the following to be logged in the Kibana server logs:
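The race can be reproduced with a small in-memory sketch. This is not Kibana's Reporting code and it makes no Elasticsearch calls: `FakeIndex`, `claimJob`, `JobDoc`, and `UpdateResult` are hypothetical names, and the conditional update only imitates the `if_seq_no` / `if_primary_term` check that appears in the logged request.

```typescript
// Hypothetical in-memory stand-in for the reporting index (illustration only).
type JobStatus = "pending" | "processing";

interface JobDoc {
  id: string;
  status: JobStatus;
  seqNo: number;
  primaryTerm: number;
}

type UpdateResult = { statusCode: 200 } | { statusCode: 409; reason: string };

class FakeIndex {
  private docs: Record<string, JobDoc> = {};

  put(doc: JobDoc): void {
    this.docs[doc.id] = { ...doc };
  }

  // Step 1: search for pending jobs. Callers receive copies (snapshots).
  searchPending(): JobDoc[] {
    return Object.keys(this.docs)
      .map((id) => this.docs[id])
      .filter((doc) => doc.status === "pending")
      .map((doc) => ({ ...doc }));
  }

  // Step 2: update only if the snapshot still matches the stored document.
  update(snapshot: JobDoc, status: JobStatus): UpdateResult {
    const current = this.docs[snapshot.id];
    if (!current) {
      throw new Error(`no such document: ${snapshot.id}`);
    }
    if (current.seqNo !== snapshot.seqNo || current.primaryTerm !== snapshot.primaryTerm) {
      return {
        statusCode: 409,
        reason:
          `version conflict, required seqNo [${snapshot.seqNo}], primary term [${snapshot.primaryTerm}]. ` +
          `current document has seqNo [${current.seqNo}] and primary term [${current.primaryTerm}]`,
      };
    }
    // A successful write bumps the sequence number, invalidating older snapshots.
    this.docs[snapshot.id] = { ...current, status, seqNo: current.seqNo + 1 };
    return { statusCode: 200 };
  }
}

// Returns true if this caller claimed the job, false if another caller got there first.
function claimJob(index: FakeIndex, snapshot: JobDoc): boolean {
  return index.update(snapshot, "processing").statusCode === 200;
}

const index = new FakeIndex();
index.put({ id: "SAMPLE_ID", status: "pending", seqNo: 1624, primaryTerm: 1 });

// Both instances run step 1 before either one runs step 2.
const seenByA = index.searchPending();
const seenByB = index.searchPending();

const claimedByA = seenByA.map((job) => claimJob(index, job)); // [true]
const claimedByB = seenByB.map((job) => claimJob(index, job)); // [false]
```

Instance B's snapshot still says seqNo 1624, but the stored document is already at 1625, so its update is rejected with a 409 and the job is processed exactly once.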

StatusCodeError: [version_conflict_engine_exception] [...]: version conflict, required seqNo [1624], primary term [1]. current document has seqNo [1625] and primary term [1], with { ... }
    at ... {
  status: 409,
  displayName: 'Conflict',
  path: '/.reporting-...',
  query: { if_seq_no: 1624, if_primary_term: 1 },
  body: {
    error: {
      root_cause: [Array],
      type: 'version_conflict_engine_exception',
      reason: '[...]: version conflict, required seqNo [1624], primary term [1]. current document has seqNo [1625] and primary term [1]',
      index_uuid: '...',
      shard: '0',
      index: '.reporting-...'
    },
    status: 409
  },
  statusCode: 409
}

When you see this error, it means that Elasticsearch rejected an update to a document because something else updated it first. You can ignore these errors. While one or more instances may log that they failed to claim a job, you'll find that another instance logged that it claimed that job:

"instance-a - Job marked as claimed: /.reporting-.../_doc/SAMPLE_ID"
"instance-b - _claimPendingJobs encountered a version conflict on updating pending job SAMPLE_ID:  [SAMPLE_ID]: version conflict, required seqNo [1618], primary term [1]. current document has seqNo [1619] and primary term [1]"
"instance-c - _claimPendingJobs encountered a version conflict on updating pending job SAMPLE_ID:  [SAMPLE_ID]: version conflict, required seqNo [1618], primary term [1]. current document has seqNo [1619] and primary term [1]"
"instance-d - _claimPendingJobs encountered a version conflict on updating pending job SAMPLE_ID:  [SAMPLE_ID]: version conflict, required seqNo [1618], primary term [1]. current document has seqNo [1619] and primary term [1]"
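To confirm that every conflict in a set of collected log lines is paired with a successful claim, a check along these lines might help. It is only a sketch: `unclaimedConflicts` and both regular expressions are hypothetical and assume the message wording shown in the sample lines above, which may differ between Kibana versions.

```typescript
// Hypothetical patterns, derived only from the sample log lines above.
const CLAIMED = /Job marked as claimed: \S*\/_doc\/([^\s"]+)/;
const CONFLICT = /version conflict on updating pending job ([^\s:]+):/;

// Returns the IDs of jobs that hit a version conflict but were never
// logged as claimed by any instance. An empty result means every
// conflict was benign.
function unclaimedConflicts(lines: string[]): string[] {
  const claimed: Record<string, boolean> = {};
  const conflicted: string[] = [];

  for (const line of lines) {
    const claimMatch = CLAIMED.exec(line);
    if (claimMatch) {
      claimed[claimMatch[1]] = true;
      continue;
    }
    const conflictMatch = CONFLICT.exec(line);
    if (conflictMatch && conflicted.indexOf(conflictMatch[1]) === -1) {
      conflicted.push(conflictMatch[1]);
    }
  }

  // Filter after scanning everything, so line order does not matter.
  return conflicted.filter((id) => !claimed[id]);
}

const sampleLines = [
  "instance-a - Job marked as claimed: /.reporting-.../_doc/SAMPLE_ID",
  "instance-b - _claimPendingJobs encountered a version conflict on updating pending job SAMPLE_ID:  [SAMPLE_ID]: version conflict, required seqNo [1618], primary term [1]. current document has seqNo [1619] and primary term [1]",
];

const suspicious = unclaimedConflicts(sampleLines); // []
```

A non-empty result would point at a job that no instance claimed, which is worth investigating rather than ignoring.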

Labels: docs, enhancement, impact:low, loe:medium, zDeprecated Feature:Reporting
