Skip to content

Log warning message during API key update on version mismatch#143127

Merged
ebarlas merged 5 commits intoelastic:mainfrom
ebarlas:log-warning-on-api-key-version-mismatch
Feb 27, 2026
Merged

Log warning message during API key update on version mismatch#143127
ebarlas merged 5 commits intoelastic:mainfrom
ebarlas:log-warning-on-api-key-version-mismatch

Conversation

@ebarlas
Copy link
Copy Markdown
Contributor

@ebarlas ebarlas commented Feb 26, 2026

Summary

Replace the assert in ApiKeyService.maybeBuildIndexRequest with a warning log to prevent node crashes during rolling upgrades when an older node updates an API key created by a newer node.

Problem

During a rolling upgrade, an API key created on an upgraded node is stamped with the newer ApiKey.CURRENT_API_KEY_VERSION. If an update for that key is routed to an older node, the older node's maybeBuildIndexRequest encounters a doc version greater than its own target version.

The existing assert guard treats this as an impossible state:

assert currentDocVersion.onOrBefore(targetDocVersion)

In CI and test environments where assertions are enabled (-ea), this fires an AssertionError and terminates the JVM, killing the node.

In deployed environments where assertions are not enabled, the assert is a no-op and these updates proceed unchecked. The doc is silently rewritten with the older node's target version, downgrading the version metadata. This has been the production behavior since #104156 decoupled API key versions from node versions.

This was the root cause of the cascading test failures. The ApiKeyBackwardsCompatibilityIT.testCreatingAndUpdatingApiKeys test creates and updates API keys. The assertion crash on the old node then cascades to every subsequent test sharing the cluster, including the SearchableSnapshotsRollingUpgradeIT test named in the issue.

Fix

The assert is replaced with a WARN-level log message. The update proceeds as before: the doc is rewritten with the local node's target version.

This matches the existing deployed behavior, just with added observability via the warning log

@ebarlas ebarlas self-assigned this Feb 26, 2026
@ebarlas ebarlas added >test Issues or PRs that are addressing/adding tests :Security/Security Security issues without another label Team:Security Meta label for security team auto-backport Automatically create backport pull requests when merged v9.1.11 v8.19.13 v9.2.7 labels Feb 26, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-security (Team:Security)

@ebarlas ebarlas added v9.3.2 and removed v9.1.11 labels Feb 26, 2026
@jfreden
Copy link
Copy Markdown
Contributor

jfreden commented Feb 27, 2026

A side effect of this change (and the way it works in prod since assertions are note enabled) is that an upgrade from an ES version < 8_15_00_99 can hit this condition. We'll overwrite a key created with a new format (and probably already returned to the caller) with another format. The purpose of the format is to make sure we can conditionally deserialize a stored key, so hopefully any new format can support an old format for bwc reasons.

One thing to consider is that if we were to introduce a new format, writing it should be placed behind a node feature and all nodes should be upgraded before writing the new format, so in practice this is probably not a real concern and the warning log is probably enough. Another option could be to completely remove this logging? WDYT @ebarlas ?

Copy link
Copy Markdown
Contributor

@jfreden jfreden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left my thoughts in a comment. Regardless the outcome I think the assertion has to be removed.

@ebarlas
Copy link
Copy Markdown
Contributor Author

ebarlas commented Feb 27, 2026

so in practice this is probably not a real concern and the warning log is probably enough

This is exactly my thinking as well. The version isn't an appropriate instrument for gating changes at this point and a warning is appropriate if version drift is detected.

@ebarlas ebarlas merged commit 22ba074 into elastic:main Feb 27, 2026
41 checks passed
ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
9.3
8.19
9.2

szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 27, 2026
…cations

* upstream/main:
  Warn on API key version mismatch (elastic#143127)
  Fixed wrong malformed value ordering in synthetic source tests (elastic#143187)
  [ML] Fix: required_native_memory_bytes Calculated with Wrong Allocation Count (elastic#143077)
  Add configureBenchmarkLogging calls across the various benchmarks (elastic#143185)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_aggregate_metric_double_implicit_casting} elastic#143292
  Give system role permission to invoke shard refresh (elastic#143190)
  Mute testSyntheticSourceWithTranslogSnapshot (elastic#143260)
  Adds ResumeInfo Tests (elastic#142769)
  Use a static method to configure benchmark logging (elastic#143056)
  add connectors release notes (elastic#142884)
  Add CI triage guidance for AI agents (elastic#142994)
  ESQL: Data sources: ZSTD, BZIP2 (elastic#143228)
  [ES|QL] Channels issue when an agg is called with the same field (elastic#142180) (elastic#142269)
  Add support for project routing in reindex requests (elastic#142240)
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
neilbhavsar pushed a commit that referenced this pull request Feb 27, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
During a rolling upgrade, an older node could encounter an
API key stamped with a newer version than it recognizes.
This hasn't been an issue in practice since API key versions
haven't changed, but the assert in maybeBuildIndexRequest
treated it as impossible, firing an AssertionError that
killed the JVM in CI where assertions are enabled.

Replace the assert with a WARN-level log. The update
proceeds as before, rewriting the doc with the local
node's target version. This matches existing production
behavior (where assertions are disabled) but adds
observability via the warning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Security/Security Security issues without another label Team:Security Meta label for security team >test Issues or PRs that are addressing/adding tests v8.19.13 v9.2.7 v9.3.2 v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants