Fix ShardSplittingQuery to respect nested documents.#27398
Fix ShardSplittingQuery to respect nested documents.#27398s1monw merged 4 commits intoelastic:masterfrom
ShardSplittingQuery to respect nested documents.#27398Conversation
Today if nested docs are used in an index that is split the operation will only work correctly if the index is not routing partitioned or unless routing is used. This change fixes the query that selectes the docs to delete to also select all parents nested docs as well. Closes elastic#27378
|
@elasticmachine test this please |
jimczi
left a comment
There was a problem hiding this comment.
I left some minor comments regarding naming and caching, otherwise LGTM
| }; | ||
| } | ||
|
|
||
| private void markChildDocs(BitSet parentDocs, BitSet deletedDocs) { |
There was a problem hiding this comment.
Why deletedDocs ? It's the opposite, no ? The bitset of the matching parent+children, allDocs ?
There was a problem hiding this comment.
naming issue, I will fix. In theory it will hold all docs that need to be deleted by the IndexWriter.
| /* | ||
| * this is used internally to obtain a bitset for parent documents. We don't cache this since we never access the same reader more | ||
| * than once | ||
| */ |
There was a problem hiding this comment.
We warm this query per segment in BitsetFilterCache#BitSetProducerWarmer so why not using the per-segment cache directly ?
There was a problem hiding this comment.
we use this only as a delete by query which is executed on a recovery-private index writer. There is no point in cacheing it and it won't have a cache hit either.
There was a problem hiding this comment.
Ok I missed the recovery-private thing. Thanks
| .numberOfShards(numShards) | ||
| .setRoutingNumShards(numShards * 1000000) | ||
| .numberOfReplicas(0).build(); | ||
| boolean hasNested = true || randomBoolean(); |
There was a problem hiding this comment.
s/true || randomBoolean();/randomBoolean();
| .numberOfShards(numShards) | ||
| .setRoutingNumShards(numShards * 1000000) | ||
| .numberOfReplicas(0).build(); | ||
| boolean hasNested = true;randomBoolean(); |
| findSplitDocs(IdFieldMapper.NAME, includeInShard, leafReader, bitSet::set); | ||
| } else { | ||
| final BitSet parentBitSet; | ||
| final IntPredicate includeDoc; |
There was a problem hiding this comment.
So just to double check, the includeDoc is to ensure that only root docs get selected and later we select the nested docs of the selected root docs in markChildDocs(...), right?
| * this is used internally to obtain a bitset for parent documents. We don't cache this since we never access the same reader more | ||
| * than once | ||
| */ | ||
| private static BitSetProducer newParentDocBitSetProducer() { |
| } | ||
|
|
||
| @Override | ||
| public boolean matches() throws IOException { |
There was a problem hiding this comment.
I was trying to understand why this works, because of the forward iteration here (with nested, we usually seek backwards (BitSet.prevSetBit(...))). So this works because all live doc ids (root docs and nested docs) are evaluated in order.
There was a problem hiding this comment.
correct, I will leave a comment
|
@jimczi @martijnvg I pushed new commits if you wanna take another look |
Today if nested docs are used in an index that is split the operation will only work correctly if the index is not routing partitioned or unless routing is used. This change fixes the query that selectes the docs to delete to also select all parents nested docs as well. Closes #27378
* master: Stop skipping REST test after backport of #27056 Fix default value of ignore_unavailable for snapshot REST API (#27056) Add composite aggregator (#26800) Fix `ShardSplittingQuery` to respect nested documents. (#27398) [Docs] Restore section about multi-level parent/child relation in parent-join (#27392) Add TcpChannel to unify Transport implementations (#27132) Add note on plugin distributions in plugins folder Remove implementations of `TransportChannel` (#27388) Update Google SDK to version 1.23 (#27381) Fix Gradle 4.3.1 compatibility for logging (#27382) [Test] Change Elasticsearch startup timeout to 120s in packaging tests Docs/windows installer (#27369)
Today if nested docs are used in an index that is split the operation
will only work correctly if the index is not routing partitioned or
unless routing is used. This change fixes the query that selectes the docs
to delete to also select all parents nested docs as well.
Closes #27378