Log warning message during API key update on version mismatch#143127
Log warning message during API key update on version mismatch#143127ebarlas merged 5 commits intoelastic:mainfrom
Conversation
… is greater than target version
|
Pinging @elastic/es-security (Team:Security) |
|
A side effect of this change (and the way it works in prod since assertions are note enabled) is that an upgrade from an ES version < One thing to consider is that if we were to introduce a new format, writing it should be placed behind a node feature and all nodes should be upgraded before writing the new format, so in practice this is probably not a real concern and the warning log is probably enough. Another option could be to completely remove this logging? WDYT @ebarlas ? |
jfreden
left a comment
There was a problem hiding this comment.
LGTM! Left my thoughts in a comment. Regardless the outcome I think the assertion has to be removed.
This is exactly my thinking as well. The version isn't an appropriate instrument for gating changes at this point and a warning is appropriate if version drift is detected. |
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
…cations * upstream/main: Warn on API key version mismatch (elastic#143127) Fixed wrong malformed value ordering in synthetic source tests (elastic#143187) [ML] Fix: required_native_memory_bytes Calculated with Wrong Allocation Count (elastic#143077) Add configureBenchmarkLogging calls across the various benchmarks (elastic#143185) Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_aggregate_metric_double_implicit_casting} elastic#143292 Give system role permission to invoke shard refresh (elastic#143190) Mute testSyntheticSourceWithTranslogSnapshot (elastic#143260) Adds ResumeInfo Tests (elastic#142769) Use a static method to configure benchmark logging (elastic#143056) add connectors release notes (elastic#142884) Add CI triage guidance for AI agents (elastic#142994) ESQL: Data sources: ZSTD, BZIP2 (elastic#143228) [ES|QL] Channels issue when an agg is called with the same field (elastic#142180) (elastic#142269) Add support for project routing in reindex requests (elastic#142240)
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
During a rolling upgrade, an older node could encounter an API key stamped with a newer version than it recognizes. This hasn't been an issue in practice since API key versions haven't changed, but the assert in maybeBuildIndexRequest treated it as impossible, firing an AssertionError that killed the JVM in CI where assertions are enabled. Replace the assert with a WARN-level log. The update proceeds as before, rewriting the doc with the local node's target version. This matches existing production behavior (where assertions are disabled) but adds observability via the warning.
Summary
Replace the
assertinApiKeyService.maybeBuildIndexRequestwith a warning log to prevent node crashes during rolling upgrades when an older node updates an API key created by a newer node.Problem
During a rolling upgrade, an API key created on an upgraded node is stamped with the newer
ApiKey.CURRENT_API_KEY_VERSION. If an update for that key is routed to an older node, the older node'smaybeBuildIndexRequestencounters a doc version greater than its own target version.The existing
assertguard treats this as an impossible state:In CI and test environments where assertions are enabled (
-ea), this fires anAssertionErrorand terminates the JVM, killing the node.In deployed environments where assertions are not enabled, the
assertis a no-op and these updates proceed unchecked. The doc is silently rewritten with the older node's target version, downgrading the version metadata. This has been the production behavior since #104156 decoupled API key versions from node versions.This was the root cause of the cascading test failures. The
ApiKeyBackwardsCompatibilityIT.testCreatingAndUpdatingApiKeystest creates and updates API keys. The assertion crash on the old node then cascades to every subsequent test sharing the cluster, including theSearchableSnapshotsRollingUpgradeITtest named in the issue.Fix
The assert is replaced with a
WARN-level log message. The update proceeds as before: the doc is rewritten with the local node's target version.This matches the existing deployed behavior, just with added observability via the warning log