Implement synthetic source support for range fields#107081
Implement synthetic source support for range fields#107081lkts merged 28 commits intoelastic:mainfrom
Conversation
This PR adds basic synthetic source support for range fields. There are following notable properties of synthetic source produced: * Ranges are always normalized to be inclusive on both ends (this is how they are stored). * Original order of ranges is not preserved. * Date ranges are always expressed in epoch millis, format is not preserved. * IP ranges are always expressed as a range of IPs while it could have been originally provided as a CIDR. This PR only implements retrieval of data for source reconstruction from doc values.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @lkts, I've created a changelog YAML for you. |
|
@elasticmachine update branch |
|
This is ready for review. |
| "lte": 299 | ||
| } | ||
| } | ||
| ---- |
There was a problem hiding this comment.
Maybe we should add this to the list of limitations for synthetic source? docs/reference/mapping/fields/synthetic-source.asciidoc
There was a problem hiding this comment.
This is specific to ranges and i followed existing docs that have specific examples on type page (for example ip).
There was a problem hiding this comment.
I am open to moving it though.
There was a problem hiding this comment.
What I mean is that in the docs about synthetic source, where we provide limitations we should mention also this one, not necessarily including an example but just explaining it so that users have it together with all other limitations.
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/range/20_synthetic_source.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/IpRangeFieldMapperTests.java
Outdated
Show resolved
Hide resolved
|
Thanks for your PR, this looks great...very comprehensive test coverage. I left a few comments. |
|
@elasticmachine update branch |
| configuration. Synthetic `_source` cannot be used together with | ||
| <<copy-to,`copy_to`>> or with <<doc-values,`doc_values`>> disabled. | ||
|
|
||
| Synthetic source always sorts values and removes duplicates for all `range` fields except `ip_range` . Ranges are sorted by their lower bound and then by upper bound. For example: |
There was a problem hiding this comment.
Maybe move these documented limitations to a paragraph in synthetic-source.asciidoc (at the end of the file)? I think all the limitations are mentioned there.
There was a problem hiding this comment.
Or maybe summarize the limitations in synthetic-source.asciidoc and keep the extensive examples here?
There was a problem hiding this comment.
We have such examples that are specific to a field type for other fields like ip, that's why i added it here.
There was a problem hiding this comment.
I think we already have a general statement that order is not preserved in synthetic-source.
server/src/main/java/org/elasticsearch/index/mapper/BinaryDocValuesSyntheticFieldLoader.java
Show resolved
Hide resolved
| ip_range: { "gte": "::", "lte": "10.10.10.10" } | ||
| - match: | ||
| hits.hits.7._source: | ||
| ip_range: { "gte": "10.10.10.10", "lte": "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" } |
There was a problem hiding this comment.
This mix of IPv6 and IPv4 is pretty unfortunate but i don't see a good way to address that. We could look at other end of range and try to format it in the same way but it's pretty brittle in my opinion.
There was a problem hiding this comment.
Actually our docs say that we support mixed ranges so maybe this is okay?
There was a problem hiding this comment.
I think when it come to this the semantic is something like:
- apply gte to all IPv4 fields as if you only had the gte defined
- apply the lte to all IPv6 fields as if you only had the lte defined
It is like saying for IPv4 fields the range is [10.10.10.10, null] while for IPv6 fields the range is [null, ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff]. Does it make sense?
There was a problem hiding this comment.
Yes that would be ideal, let me see if i can do that.
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
I left a couple of comments...everything else looks good. |
|
@elasticmachine update branch |
This PR adds basic synthetic source support for range fields. There are
following notable properties of synthetic source produced:
they are stored).
have been originally provided as a CIDR.
This PR only implements retrieval of data for source reconstruction from
doc values similar to fields already supporting synthetic source.
Contributes to #106460.