Add IT for num_reduced_phases with batched query execution#134312
Add IT for num_reduced_phases with batched query execution#134312benchaplin merged 12 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
| return searchRequest.indicesOptions(); | ||
| } | ||
|
|
||
| public List<ShardToQuery> shards() { |
There was a problem hiding this comment.
Consider limiting this to a package-protected scope or exposing only shard.size(), as the test doesn’t need the complete list.
There was a problem hiding this comment.
++ another way to do this would be to inspect the indices service in the test itself, and extract the info about the how many nodes and how many shards per node from there, as opposed to from the request.
There was a problem hiding this comment.
Another another way to go about this test is to run it in a more controlled scenario, along the same lines as Dimi suggested above: decide upfront how many shards and nodes, and make the execution more predictable that way.
There was a problem hiding this comment.
Good call, thanks @drempapis - reduced to package-private and just the size.
There was a problem hiding this comment.
And thanks @javanna, I agree this is cleaner without the request intercepting stuff. I've reworked the test a bit to deduce how many shards aren't batched from cluster state.
| @@ -1,7 +1,4 @@ | |||
| setup: | |||
| - skip: | |||
| awaits_fix: "TODO fix this test, the response with batched execution is not deterministic enough for the available matchers" | |||
There was a problem hiding this comment.
can we just get away with unskipping this ? I was under the impression that it's going to fail. Or does it run in a controlled scenario where the result is predictable?
There was a problem hiding this comment.
If you take a look at #121885 the line that would make this test fail was removed. Now it essentially just tests batched_reduce_size validation. I'm not sure if it's worth keeping, what do you think?
There was a problem hiding this comment.
I see, thanks. I'd keep it. Can we have some simpler check on num_reduce_phases, like greather than some threshold that's easier to predict?
There was a problem hiding this comment.
Don't think we can assume anything about num_reduce_phases now - if it's 1, it's left out of the response entirely. And it can be 1 if all shards are batched.
* upstream/main: Add additional logging to make spotting stats issues easier (elastic#133972) [ESQL] Clean up ESQL enrich landing page (elastic#134820) ES|QL: Make kibana docs for Query settings more consistent (elastic#134881) Add file extension metadata to cache miss counter from SharedBlobCacheService (elastic#134374) Add IT for num_reduced_phases with batched query execution (elastic#134312) Remove `SizeValue` (elastic#134871)
The 120_batch_reduce_size.yml test was skipped with a TODO when batched query execution was introduced in #121885. A key piece of the test was an assertion on the num_reduce_phases in the search response - the assertion was removed as num_reduce_phases now depends on how shards are laid out in the cluster, which may change across test runs.
To know how many reductions occurred, we need to understand the shard layout:
We can do this in an IT that captures the batched transport requests then derives the layout from them. I've added this IT, and removed the skip on the YAML test. I'd hear an argument for removing the YAML test completely, but it still covers some validation on the batched_reduce_size query parameter so I'm leaning towards keeping it.