
Fix async fire-and-forget in SecurityFeatureResetTests that races with teardown's FixedBitSet check #145063

Merged
ebarlas merged 11 commits into elastic:main from ebarlas:flaky-security-feature-reset-tests on Mar 31, 2026


@ebarlas
Contributor

@ebarlas commented Mar 27, 2026

The tests in SecurityFeatureResetTests fired the feature reset request using an async ActionListener without waiting for the response. This meant:

  1. The test assertions inside the listener might never execute (the test passes regardless of the actual result).
  2. The reset operation could still be in-flight when teardown begins, racing with ensureEstimatedStats(), which checks the FixedBitSet cache size before assertRequestsFinished() drains pending requests.
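To make point 1 concrete, here is a minimal, self-contained sketch using only java.util.concurrent; it is not the Elasticsearch client. `resetAsync` is a hypothetical stand-in for the async reset call, and the `Consumer` callback plays the role of the ActionListener. An assertion that fails inside the callback is thrown on the executor thread and captured in a Future nobody reads, so the test body itself returns normally.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class LostAssertionDemo {
    /** Hypothetical async call: hands a response to the callback on a pool thread. */
    static Future<?> resetAsync(ExecutorService pool, Consumer<String> onResponse) {
        return pool.submit(() -> onResponse.accept("unexpected-response"));
    }

    /** Old test shape: the assertion lives in the callback and nothing waits for it. */
    static Future<?> oldStyleTestBody(ExecutorService pool) {
        // Returning here ends the "test" whether or not the callback has run yet.
        // (The Future is returned only so this demo can inspect it afterwards.)
        return resetAsync(pool, response -> {
            if (!response.equals("acknowledged")) {
                // Thrown on the pool thread; submit() stores it in a Future nobody inspects.
                throw new AssertionError("reset failed: " + response);
            }
        });
    }

    /** Returns true if the test body returned normally although its assertion failed. */
    static boolean failureOnlyVisibleAfterTheFact() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> callback = oldStyleTestBody(pool); // no exception here: the "test" passed
        try {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            callback.get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof AssertionError; // the assertion did fail, unobserved
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```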

The .security-7 index contains a nested field (realm_domain.realms) whose BitsetFilterCache warmer eagerly loads 16-byte FixedBitSet entries. When the async reset hasn't cleaned up the index by the time the teardown check runs, the non-zero bitset memory triggers the flaky assertion.
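The ordering problem can be modeled in plain Java as a rough sketch. Assumptions: an AtomicLong stands in for the bitset cache's memory stat, the 16-byte figure is taken from the description above, and a CountDownLatch forces the losing interleaving that the real test only hit occasionally. A stats check that runs before pending work is drained sees the warmed entry; draining first sees zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TeardownOrderDemo {
    // Assumed size of the single warmed FixedBitSet entry, taken from the PR description.
    static final long BITSET_ENTRY_BYTES = 16;

    /**
     * Models teardown with a reset still in flight and returns the bitset memory seen by the
     * stats check. drainFirst=false mimics "check, then drain"; true mimics "drain, then check".
     */
    static long bytesSeenByCheck(boolean drainFirst) {
        AtomicLong bitsetBytes = new AtomicLong(BITSET_ENTRY_BYTES); // warmer already loaded it
        CountDownLatch gate = new CountDownLatch(1); // holds the reset back deterministically
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The in-flight reset: once released, it deletes the index and frees the cache entry.
            Future<?> pendingReset = pool.submit(() -> {
                gate.await();
                bitsetBytes.set(0);
                return null;
            });
            long seen;
            if (drainFirst) {
                gate.countDown();
                pendingReset.get(5, TimeUnit.SECONDS); // drain pending work first
                seen = bitsetBytes.get();              // then check the stats
            } else {
                seen = bitsetBytes.get();              // check while the reset is still pending
                gate.countDown();
                pendingReset.get(5, TimeUnit.SECONDS); // drain afterwards
            }
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            gate.countDown();
            pool.shutdownNow();
        }
    }
}
```

The PR does not reorder teardown; blocking in the test until the reset completes has the same effect as the "drain, then check" branch here.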

Fix: use PlainActionFuture to block until the reset completes before the test method returns.
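PlainActionFuture is a listener that is also a future, so a test can pass it as the listener and then block on it. The sketch below reproduces that shape with CompletableFuture, assuming only onResponse/onFailure matter here; `Listener`, `ListenerFuture`, and `resetAsync` are hypothetical names, not Elasticsearch APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingListenerDemo {
    /** Hypothetical stand-in for an ActionListener-style callback. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Hypothetical analog of PlainActionFuture: a listener that is also a future. */
    static final class ListenerFuture<T> extends CompletableFuture<T> implements Listener<T> {
        @Override
        public void onResponse(T response) {
            complete(response);
        }

        @Override
        public void onFailure(Exception e) {
            completeExceptionally(e);
        }

        /** Blocks until completion and rethrows failures unchecked, in the spirit of actionGet(). */
        T actionGet(long timeout, TimeUnit unit) {
            try {
                return get(timeout, unit);
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                throw cause instanceof RuntimeException ? (RuntimeException) cause : new RuntimeException(cause);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (TimeoutException e) {
                throw new RuntimeException("reset did not complete in time", e);
            }
        }
    }

    /** Hypothetical async reset that answers on another thread after a short delay. */
    static void resetAsync(ExecutorService pool, Listener<String> listener) {
        pool.submit(() -> {
            try {
                Thread.sleep(50);
                listener.onResponse("acknowledged");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                listener.onFailure(e);
            }
        });
    }

    /** Fixed test shape: pass the future as the listener, then block before asserting. */
    static String resetAndWait() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ListenerFuture<String> future = new ListenerFuture<>();
            resetAsync(pool, future);
            // Assertions on this value run on the test thread, after the reset has finished.
            return future.actionGet(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the test thread blocks until the response arrives, a failed assertion now fails the test, and the reset is complete before teardown starts.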

@ebarlas self-assigned this Mar 27, 2026
@ebarlas added the >test, :Security/Security, Team:Security, and v9.4.0 labels Mar 27, 2026
@ebarlas marked this pull request as ready for review March 27, 2026 15:38
@elasticsearchmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@n1v0lg self-requested a review March 30, 2026 07:40
Contributor

@n1v0lg left a comment


LGTM. One optional nit.

public void testFeatureResetNoManageRole() {
final ResetFeatureStateRequest req = new ResetFeatureStateRequest(TEST_REQUEST_TIMEOUT);

PlainActionFuture<ResetFeatureStateResponse> future = new PlainActionFuture<>();
Contributor


Total nit but I think the client interface provides a sync method, i.e. we can do:

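Exception e = expectThrows(
    Exception.class,
    // expectThrows takes a ThrowingRunnable, so the call must be a lambda;
    // actionGet() blocks so the failure is thrown inside it.
    () -> client().filterWithHeader(Collections.singletonMap(BASIC_AUTH_HEADER, basicAuthHeaderValue("usr", SUPER_USER_PASSWD)))
        .admin()
        .cluster()
        .execute(TransportResetFeatureStateAction.TYPE, req)
        .actionGet()
);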

here.
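One detail worth noting about this pattern: expectThrows takes a ThrowingRunnable, so the call has to be wrapped in a lambda, and it has to block (for example via actionGet() on the returned future) so that the failure surfaces inside that lambda. The sketch below illustrates this with a minimal re-implementation of expectThrows and a CompletableFuture-based fake call; both are hypothetical and only mirror the shape of the real test helpers. The CompletionException wrapping is a CompletableFuture.join() detail.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExpectThrowsDemo {
    /** Same shape as the functional interface expectThrows accepts (a void body that may throw). */
    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /** Minimal re-implementation for illustration only; not the real test-framework helper. */
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t, t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " but nothing was thrown");
    }

    /** Hypothetical async call that fails on a pool thread, e.g. for a user lacking privileges. */
    static CompletableFuture<String> resetFeatureState(ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("simulated: unauthorized");
        }, pool);
    }

    /** Blocking inside the lambda (join() here) surfaces the failure where expectThrows sees it. */
    static String blockingCallSurfacesFailure() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletionException e = expectThrows(CompletionException.class, () -> resetFeatureState(pool).join());
            return e.getCause().getMessage();
        } finally {
            pool.shutdown();
        }
    }

    /** Without blocking, the lambda just returns a future, so expectThrows itself fails. */
    static boolean missingBlockFailsTheAssertion() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            expectThrows(CompletionException.class, () -> resetFeatureState(pool));
            return false;
        } catch (AssertionError expected) {
            return true;
        } finally {
            pool.shutdown();
        }
    }
}
```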

@ebarlas merged commit 0d96a2e into elastic:main Mar 31, 2026
41 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
…rics

* upstream/main: (21 commits)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
  [DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
  Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
  Use checked exceptions in entitlement constructor rules (elastic#145234)
  ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
  Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
  Make Settings.Builder.remove() fluent (elastic#145294)
  Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
  Fix flaky SecurityFeatureResetTests (elastic#145063)
  [DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
  Skip certain metric assertions on Windows (elastic#144933)
  [ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
  Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
  ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
  [Transform] Track Linked Projects (elastic#144399)
  Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
  Clean up TickerScheduleEngineTests (elastic#145303)
  [CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
  Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
  ...
seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 31, 2026
Tests fired feature reset requests fire-and-forget via an async
ActionListener, so assertions might never execute and the reset
could still be in-flight during teardown. This raced with
ensureEstimatedStats() checking FixedBitSet cache size before
pending requests were drained, causing a flaky assertion. Fix by
blocking on PlainActionFuture until the reset completes.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026


3 participants