Fix race in FileSettingsServiceIT.testSettingsAppliedOnStart #134368
Merged
elasticsearchmachine merged 2 commits into elastic:main on Sep 19, 2025
Collaborator
Pinging @elastic/es-core-infra (Team:Core/Infra)
…ngsAppliedOnStart
Member
@mosche should/can this be backported?
szybia added a commit to szybia/elasticsearch that referenced this pull request on Sep 19, 2025
* upstream/main:
  Turn NumericValues into functional interface (elastic#135068)
  Improve block loader for source only runtime fields of type keyword (elastic#135026)
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {csv-spec:stats.StdDeviationGroupedAllTypes} elastic#135103
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {csv-spec:stats.StdDeviationWithLongs} elastic#135102
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {csv-spec:inlinestats.StdDevFilter} elastic#135101
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {csv-spec:stats.StdDevFilter} elastic#135100
  Remove track_live_docs_in_memory_bytes feature flag (elastic#134900)
  Create SPI to allow prohibiting certain top-level mappings (elastic#132360)
  Only validate primary ids on release branches (elastic#135044)
  Added no-op support for project_routing query param to REST endpoints that will support cross-project search (elastic#134741)
  Fix race in FileSettingsServiceIT.testSettingsAppliedOnStart (elastic#134368)
mosche added a commit to mosche/elasticsearch that referenced this pull request on Sep 22, 2025
…#134368)

This was failing very very rarely due to unfortunate timing conditions. Cluster state changes are applied to all nodes prior to being published on the master node itself. However, the cluster state listener was previously attached to the data node, allowing for a very short time window where the state update wasn't visible on the master node itself when checking in `assertClusterStateSaveOK`.

This changes the test to attach the listener to the master node itself preventing above condition. I was initially worried it might be attached too late in cases, but I couldn't reproduce any more issues this way.

> According to the dashboard, this started to fail on Monday (13/07). It definitely does not look like a test failure, so I'm assigning a medium priority, which we could raise if we discover this is a new bug.

I couldn't find any related commit that might have caused this. Still wondering why this started failing around that time 🤔

Fixes elastic#131210

(cherry picked from commit a29392c)

# Conflicts:
#	muted-tests.yml
Contributor
Author
💔 Some backports could not be created
Manual backport
To create the backport manually run:
Questions?
Please refer to the Backport tool documentation
Contributor
Author
@rjernst I've backported to 9.1; older branches don't contain an earlier fix that this one is based on. In any case, this fails very rarely and was only ever observed on main.
elasticsearchmachine pushed a commit that referenced this pull request on Sep 22, 2025
#135196)

This was failing very very rarely due to unfortunate timing conditions. Cluster state changes are applied to all nodes prior to being published on the master node itself. However, the cluster state listener was previously attached to the data node, allowing for a very short time window where the state update wasn't visible on the master node itself when checking in `assertClusterStateSaveOK`.

This changes the test to attach the listener to the master node itself preventing above condition. I was initially worried it might be attached too late in cases, but I couldn't reproduce any more issues this way.

> According to the dashboard, this started to fail on Monday (13/07). It definitely does not look like a test failure, so I'm assigning a medium priority, which we could raise if we discover this is a new bug.

I couldn't find any related commit that might have caused this. Still wondering why this started failing around that time 🤔

Fixes #131210

(cherry picked from commit a29392c)

# Conflicts:
#	muted-tests.yml
Member
Can you figure out which change wasn't backported? We should keep the test as in sync across branches as possible, so that fixes for failures in older branches are easier to backport.
This was failing very very rarely due to unfortunate timing conditions.
Cluster state changes are applied to all nodes prior to being published on the master node itself.
However, the cluster state listener was previously attached to the data node, allowing for a very short time window where the state update wasn't visible on the master node itself when checking in `assertClusterStateSaveOK`.
This changes the test to attach the listener to the master node itself, preventing the above condition.
I was initially worried it might be attached too late in some cases, but I couldn't reproduce any more issues this way.
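The ordering issue described above can be illustrated with a small, self-contained sketch. It does not use any Elasticsearch classes: `ListenerPlacementSketch`, `Node`, and `observedMasterVersion` are hypothetical names invented for illustration, and the model simply assumes (as the description states) that a new state reaches the data node before the master node. To make the stale read deterministic, the sketch reads the master's state from inside the listener, whereas in the real test the check runs shortly after the listener fires, which is why the window is so small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class ListenerPlacementSketch {

    /** Hypothetical stand-in for a node: tracks the last applied state version and notifies listeners. */
    static final class Node {
        long appliedVersion = 0;
        final List<LongConsumer> listeners = new ArrayList<>();

        void apply(long version) {
            appliedVersion = version;
            // Listeners run only after this node has the new version.
            for (LongConsumer listener : listeners) {
                listener.accept(version);
            }
        }
    }

    /**
     * Applies version 1 on the data node first and on the master last (assumed ordering),
     * and returns the master's version as seen at the moment the listener fires.
     */
    static long observedMasterVersion(boolean listenOnMaster) {
        Node master = new Node();
        Node dataNode = new Node();
        long[] observed = { -1 };

        Node listenedNode = listenOnMaster ? master : dataNode;
        listenedNode.listeners.add(version -> observed[0] = master.appliedVersion);

        dataNode.apply(1); // data node sees the update first
        master.apply(1);   // master sees it last
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println("listener on data node sees master version " + observedMasterVersion(false));
        System.out.println("listener on master sees master version " + observedMasterVersion(true));
    }
}
```

With the listener on the data node, the check can run while the master still holds the old version; with the listener on the master, the master's own state is guaranteed to be current when the listener fires.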
I couldn't find any related commit that might have caused this. Still wondering why this started failing around that time 🤔
Fixes #131210