Replace health request with a state observer.#88641
Replace health request with a state observer.#88641idegtiarenko merged 6 commits intoelastic:masterfrom
Conversation
This was using health request assuming yellow index is one with all primaries initialized. This is not true as newly created index remain yellow when primaries allocation is trottled. This was discovered while working on a new shards allocator.
|
Hi @idegtiarenko, I've created a changelog YAML for you. |
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM but I'd like Tim to take a look too.
|
Actually I think we should have a test that exposes this change. It looks to me like |
Tim-Brooks
left a comment
There was a problem hiding this comment.
The changes looks fine to me. Agree with David's comment.
|
Pinging @elastic/es-distributed (Team:Distributed) |
| Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build() | ||
| ) | ||
| .get(); | ||
| } |
There was a problem hiding this comment.
Inline diff is totally confusing, consider reviewing in a split mode.
In short, this test:
- sets
cluster.routing.allocation.node_initial_primaries_recoveries=0so that every created index is throttled - creates an index
- executes
GetGlobalCheckpointsActionwith a timeout - asserts it is complaining about unavailable shards rather then anything else
- reverts the setting to a default value
There was a problem hiding this comment.
I think I could also rewrite it to revert the setting after executing GetGlobalCheckpointsAction and assert it could get result back but I am not sure it would make the test better
There was a problem hiding this comment.
Yes I think we want that. As it stands the test suite still passes on master today (i.e. reverting the production-code changes in this PR). I think we have to check the success case here to see the value in this change.
| .cluster() | ||
| .prepareUpdateSettings() | ||
| .setPersistentSettings( | ||
| Settings.builder().put(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey(), 0).build() |
There was a problem hiding this comment.
Hmm it seems like a bug that we even permit 0 here. I don't see a good reason to do this in production and it would be pretty harmful to do this accidentally. Ok for now, but if we fixed this bug we'd need to find some other way to delay allocation.
| assertEquals("Primary shards were not active [shards=1, active=0]", exception.getMessage()); | ||
| } | ||
|
|
||
| public void testWaitOnPrimaryShardsReady() throws Exception { |
There was a problem hiding this comment.
I think we don't want to remove this test. It's still valid isn't it?
I was expecting us to strengthen this test to verify that creating the index concurrently with running the action still works. It turns out we already have that test, so we can leave this one as-is IMO.
| Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build() | ||
| ) | ||
| .get(); | ||
| } |
There was a problem hiding this comment.
Yes I think we want that. As it stands the test suite still passes on master today (i.e. reverting the production-code changes in this PR). I think we have to check the success case here to see the value in this change.
...gin/fleet/src/main/java/org/elasticsearch/xpack/fleet/action/GetGlobalCheckpointsAction.java
Show resolved
Hide resolved
* upstream/master: (40 commits) Fix CI job naming [ML] disallow autoscaling downscaling in two trained model assignment scenarios (elastic#88623) Add "Vector Search" area to changelog schema [DOCS] Update API key API (elastic#88499) Enable the pipeline on the feature branch (elastic#88672) Adding the ability to register a PeerFinderListener to Coordinator (elastic#88626) [DOCS] Fix transform painless example syntax (elastic#88364) [ML] Muting InternalCategorizationAggregationTests testReduceRandom (elastic#88685) Fix double rounding errors for disk usage (elastic#88683) Replace health request with a state observer. (elastic#88641) [ML] Fail model deployment if all allocations cannot be provided (elastic#88656) Upgrade to OpenJDK 18.0.2+9 (elastic#88675) [ML] make bucket_correlation aggregation generally available (elastic#88655) Adding cardinality support for random_sampler agg (elastic#86838) Use custom task instead of generic AckedClusterStateUpdateTask (elastic#88643) Reinstate test cluster throttling behavior (elastic#88664) Mute testReadBlobWithPrematureConnectionClose Simplify plugin descriptor tests (elastic#88659) Add CI job for testing more job parallelism [ML] make deployment infer requests fully cancellable (elastic#88649) ...
This was using health request assuming yellow index is one with all primaries
initialized. This is not true as newly created index remain yellow when
primaries allocation is throttled. This was discovered while working on a new
shards allocator.