Skip to content

Histogram field data type#139457

Merged
not-napoleon merged 26 commits intoelastic:mainfrom
not-napoleon:histogram-field-data-type
Dec 15, 2025
Merged

Histogram field data type#139457
not-napoleon merged 26 commits intoelastic:mainfrom
not-napoleon:histogram-field-data-type

Conversation

@not-napoleon
Copy link
Copy Markdown
Member

Add minimal support for a histogram field data type. This adds the type as "under construction" but not behind a feature flag; as with the previous PR, I don't see any need for that additional layer. As is tradition with new data types, most of this PR is test support.

@not-napoleon not-napoleon added >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL v9.3.0 labels Dec 12, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested one thing. The other "big" types don't have it, but should as well. I just hadn't noticed before.

throw new IllegalArgumentException("Expected START_OBJECT but found: " + parser.currentToken());
}
parser.nextToken();
// TODO: This is striaght up copied from HistgramParser. There are even fewer good places to put that for resue than
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling

boolean exponentialHistogramFieldSupported,
boolean tDigestFieldSupported
boolean tDigestFieldSupported,
boolean histogramFieldSupported
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for who knows when: this is too many booleans for one method. It needs something kinder to read.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I can try to get to it during the holiday lull, but no promises. I've already got a lot on my list for that, and I'm only working a couple of days.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this check here for the classic histogram field?
I had to add these checks for exponential_histogram because mixed-cluster / bwc tests would create clusters without exponential_histogram support and would fail on test-setup when trying to ingest the test-data.
I'd assume that the histogram ES field type (!= ESQL type) is old enough to be supported in all versions we test with?

Whether we actually support it as an ES|QL type will be guarded via the capabilities and doesn't need this check.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be the case, but I couldn't prove it to my satisfaction in the time I had to work on this, so I added this check defensively. My opinion, we need to rethink how we're doing this in general. It's not sustainable for us to have to keep touching this, and it's functionally adding another poorly supported proxy for a version. I think ideally, this would tie to a transport version or index version or something like that.

for (int i = 0; i < values.size(); i++) {
long count = counts.get(i);
assert count >= 0;
// we do not add elements with count == 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just "compression"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, this is bad copy pasta. There's no reason we should ever generate a zero in the random test data here, and I'll just update it to not do that. I believe the original code I copied this from, in the field mapper, does skip empty buckets as a space optimization, but I don't 100% remember the reasoning there.

case NULL -> EvalOperator.CONSTANT_NULL_FACTORY;
case UNSUPPORTED, SHORT, BYTE, DATE_PERIOD, OBJECT, DOC_DATA_TYPE, SOURCE, TIME_DURATION, FLOAT, HALF_FLOAT, TSID_DATA_TYPE,
SCALED_FLOAT, AGGREGATE_METRIC_DOUBLE, TDIGEST, DENSE_VECTOR -> throw new UnsupportedOperationException(
SCALED_FLOAT, AGGREGATE_METRIC_DOUBLE, TDIGEST, HISTOGRAM, DENSE_VECTOR -> throw new UnsupportedOperationException(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get to stuff like this before long.

I wonder if there's a good way of "querying" the code base for the list of outstanding "stuff" for new data types. Like this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything like that would need a way to differentiate between "real" types (like Histogram) and types that should not exist by this point in the plan, like Half Float.

That said, I intend to implement this for Histogram (and probably T-Digest, unless @JonasKunz gets to it first) soon. Just, not yet.


public static String histogramToString(BytesRef histogram) {
// TODO: reuse the nearly identical code from HistogramFieldMapper
try (XContentBuilder builder = JsonXContent.contentBuilder()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you kindly throw an IllegalArgumentException from this if the histogram is more than, like, 2mb worth of buckets? And add IllegalArgumentException to the list of warnExceptions in the @ConvertEvaluator in ToString. Just out of paranoia.

@not-napoleon not-napoleon enabled auto-merge (squash) December 12, 2025 19:34
@not-napoleon not-napoleon enabled auto-merge (squash) December 15, 2025 16:13
 Conflicts:
	server/src/main/resources/transport/upper_bounds/9.3.csv
	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java
	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
@not-napoleon not-napoleon merged commit 27feccd into elastic:main Dec 15, 2025
35 checks passed
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request Dec 17, 2025
Add minimal support for a histogram field data type. This adds the type as "under construction" but not behind a feature flag; as with the previous PR, I don't see any need for that additional layer. As is tradition with new data types, most of this PR is test support.
---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants