Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index#24978
Conversation
…hild relation within documents of the same index This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping it uses an internal field to materialize parent/child relation within a single index. This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields. The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up. Relates elastic#20257
martijnvg
left a comment
There was a problem hiding this comment.
Nice 🎉! I left some comments.
| * A field mapper used internally by the {@link ParentJoinFieldMapper} to index | ||
| * the value that link documents in the index (parent _id or _id if the document is a parent). | ||
| */ | ||
| public final class ParentIDFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
maybe rename to ParentIdFieldMapper? That is more consistent with IdFieldMapper name.
| FIELD_TYPE.setTokenized(false); | ||
| FIELD_TYPE.setOmitNorms(true); | ||
| FIELD_TYPE.setHasDocValues(true); | ||
| FIELD_TYPE.setIndexOptions(IndexOptions.DOCS); |
There was a problem hiding this comment.
ParentFieldMapper sets this to IndexOptions.NONE. I wonder if we should that too here?
Upside of adding an indexed field is that somone doesn't need to use the parent_id query, but on the other hand it does increase the index size and I'm not sure how often one would search by this field. With _parent field the field was made a non indexed field with the idea in mind that only a few would ever use _parent field for plain querying.
There was a problem hiding this comment.
My main concern is inner_hits where we use a DocValueQuery to find the children of a parent. I agree that a plain parent_id query is an edge case but for inner_hits it is quite a common case so could be worth to index by default ?
| } | ||
| } | ||
|
|
||
| final ArrayList<ParentIDFieldMapper> newParentIDFields = new ArrayList<>(); |
There was a problem hiding this comment.
nit: s/final ArrayList/final List
| builder.addParent(parent, children); | ||
| iterator.remove(); | ||
| } | ||
| return builder; |
There was a problem hiding this comment.
I think additionally we should exposen eager_global_ordinals setting here to like in ParentFieldMapper.java?
There was a problem hiding this comment.
Right, I can do that in a follow up. This PR is big enough ;)
There was a problem hiding this comment.
+1 then we shouldn't forget about it :)
| * A field mapper used internally by the {@link ParentJoinFieldMapper} to index | ||
| * the value that link documents in the index (parent _id or _id if the document is a parent). | ||
| */ | ||
| public final class ParentIDFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
I wonder if this class can not extend from FieldMapper and be a regular class? Initially this confused me a bit.
This class just needs to have parent and children fields. Also for ParentJoinFieldMapper#iterator() it can have a field mapper field (I think KeywordFieldMapper will suffice as type). Then the ParentJoinFieldMapper#parse(...) can just iterate over parentIDFields and add the required join doc values fields.
There was a problem hiding this comment.
I started with this approach but this field is special and we'll need to add some helpers to handle has_child and has_parent rewriting so I opted for a full field mapper impl.
Also I think it's good to have a dedicated field type. For instance the keyword field uses sorted_set for the doc_values but for this field we want to enforce a single value for each document.
| @Override | ||
| public Mapper parse(ParseContext context) throws IOException { | ||
| // Only one join field per document | ||
| checkDuplicateJoinFields(context.doc()); |
There was a problem hiding this comment.
Maybe we should only allow one join field definition per index? (not sure what the best way to validate this)
Because all parent/child relations can be expressed in a single join field.
Also as is now one is free to add a join field mapper to an index, which I think is problematic when documents have already been indexed.
There was a problem hiding this comment.
Would be nice to restrict to a single join field per index but there is no clean way to do it yet.
We'll need to add some logic in the mapper service to allow this.
There was a problem hiding this comment.
yes, lets do this in a follow up change.
|
retest this please |
|
Thanks @martijnvg |
…hild relation within documents of the same index (elastic#24978) * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping it uses an internal field to materialize parent/child relation within a single index. This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields. The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up. Relates elastic#20257
This is a full backport of the typeless parent child feature (parent-join) introduced in master. It includes: * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index (#24978) * Disallow multiple parent-join fields per mapping (#25002) * Change `has_child`, `has_parent` queries and `childen` aggregation to work with the new join field type and at the same time maintaining support for the `_parent` meta field type. * Move parent_id query to the parent-join module (#25072) * Changed inner_hits to work with the new join field type and at the same time maintaining support for the `_parent` meta field type Relates #20257
This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with
has_parent,has_childqueries andchildrenagg will be added in a follow up.Relates #20257