Histogram field data type#139457
Conversation
Conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/mapper/HistogramFieldBlockLoaderTests.java
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv
…ld-data-type' into histogram-field-data-type
nik9000
left a comment
There was a problem hiding this comment.
Requested one thing. The other "big" types don't have it, but should as well. I just hadn't noticed before.
| throw new IllegalArgumentException("Expected START_OBJECT but found: " + parser.currentToken()); | ||
| } | ||
| parser.nextToken(); | ||
| // TODO: This is striaght up copied from HistgramParser. There are even fewer good places to put that for resue than |
| boolean exponentialHistogramFieldSupported, | ||
| boolean tDigestFieldSupported | ||
| boolean tDigestFieldSupported, | ||
| boolean histogramFieldSupported |
There was a problem hiding this comment.
TODO for who knows when: this is too many booleans for one method. It needs something kinder to read.
There was a problem hiding this comment.
+1. I can try to get to it during the holiday lull, but no promises. I've already got a lot on my list for that, and I'm only working a couple of days.
There was a problem hiding this comment.
Do we really need this check here for the classic histogram field?
I had to add these checks for exponential_histogram because mixed-cluster / bwc tests would create clusters without exponential_histogram support and would fail on test-setup when trying to ingest the test-data.
I'd assume that the histogram ES field type (!= ESQL type) is old enough to be supported in all versions we test with?
Whether we actually support it as an ES|QL type will be guarded via the capabilities and doesn't need this check.
There was a problem hiding this comment.
That might be the case, but I couldn't prove it to my satisfaction in the time I had to work on this, so I added this check defensively. My opinion, we need to rethink how we're doing this in general. It's not sustainable for us to have to keep touching this, and it's functionally adding another poorly supported proxy for a version. I think ideally, this would tie to a transport version or index version or something like that.
| for (int i = 0; i < values.size(); i++) { | ||
| long count = counts.get(i); | ||
| assert count >= 0; | ||
| // we do not add elements with count == 0 |
There was a problem hiding this comment.
Honestly, this is bad copy pasta. There's no reason we should ever generate a zero in the random test data here, and I'll just update it to not do that. I believe the original code I copied this from, in the field mapper, does skip empty buckets as a space optimization, but I don't 100% remember the reasoning there.
| case NULL -> EvalOperator.CONSTANT_NULL_FACTORY; | ||
| case UNSUPPORTED, SHORT, BYTE, DATE_PERIOD, OBJECT, DOC_DATA_TYPE, SOURCE, TIME_DURATION, FLOAT, HALF_FLOAT, TSID_DATA_TYPE, | ||
| SCALED_FLOAT, AGGREGATE_METRIC_DOUBLE, TDIGEST, DENSE_VECTOR -> throw new UnsupportedOperationException( | ||
| SCALED_FLOAT, AGGREGATE_METRIC_DOUBLE, TDIGEST, HISTOGRAM, DENSE_VECTOR -> throw new UnsupportedOperationException( |
There was a problem hiding this comment.
We should get to stuff like this before long.
I wonder if there's a good way of "querying" the code base for the list of outstanding "stuff" for new data types. Like this.
There was a problem hiding this comment.
Anything like that would need a way to differentiate between "real" types (like Histogram) and types that should not exist by this point in the plan, like Half Float.
That said, I intend to implement this for Histogram (and probably T-Digest, unless @JonasKunz gets to it first) soon. Just, not yet.
|
|
||
| public static String histogramToString(BytesRef histogram) { | ||
| // TODO: reuse the nearly identical code from HistogramFieldMapper | ||
| try (XContentBuilder builder = JsonXContent.contentBuilder()) { |
There was a problem hiding this comment.
Would you kindly throw an IllegalArgumentException from this if the histogram is more than, like, 2mb worth of buckets? And add IllegalArgumentException to the list of warnExceptions in the @ConvertEvaluator in ToString. Just out of paranoia.
…ld-data-type' into histogram-field-data-type
…ld-data-type' into histogram-field-data-type
Conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
Add minimal support for a histogram field data type. This adds the type as "under construction" but not behind a feature flag; as with the previous PR, I don't see any need for that additional layer. As is tradition with new data types, most of this PR is test support. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Add minimal support for a histogram field data type. This adds the type as "under construction" but not behind a feature flag; as with the previous PR, I don't see any need for that additional layer. As is tradition with new data types, most of this PR is test support.