ESQL: Union Types Support #107545
Conversation
Hi @craigtaverner, I've created a changelog YAML for you.
force-pushed from d740c88 to e76988c
force-pushed from 13c4186 to 20582c4
This rule tries to roll back something that either shouldn't have occurred in the first place or that is ignored by the verifier.
If there's no AbstractConvertFunction, MultiTypeFields can be considered unsupported in projections and unsupported inside expressions (see EsqlProject and the Verifier).
This rule does something that used to be done by the ResolveRefs code. We have a catch-22 situation where ResolveUnionTypes depends on ResolveRefs to run first, but also relies on InvalidMappedField (or currently MultiTypeEsField.UnresolvedField) to not be converted into an UnresolvedAttribute (which ResolveRefs also does). So that conversion needed to be removed from ResolveRefs and moved after ResolveUnionTypes. This is that new location. This code also did something much more complex before, also editing the contents of EsRelation and the contained EsIndex, but that was getting very messy, and only needed to handle regression tests around unsupported types. I had three choices:
- Continue to make UnresolveUnionTypes more complex (I got all but one regression test passing, so just needed to add nested resolution of Object fields).
- Find a way to get `MultiTypeEsField.UnresolvedField` to pass through Plan serialization (this is the route I took, but it looks hacky, and I see your comment on that already). This allowed me to reduce this class a lot (you see the simplified version here).
- Merge `MultiTypeEsField.UnresolvedField` with `InvalidMappedField` so we avoid the plan serialisation hack. This is my new preferred approach, and I see you suggest this approach too in another comment.
But in none of these cases does this class completely disappear. Somewhere we need to recognise InvalidMappedField as an unresolved type. That used to happen in ResolveRefs, and now happens here. Are you suggesting an alternative approach/location? Verifying EsqlProject? I can investigate that on Monday. I've looked through the Verifier a few times, so I can imagine possibilities there.
Thanks - it's a minor point, but you could incorporate the rule as part of the UnionType rules in a second pass.
That is, convert all InvalidMappedFields that are not wrapped in a conversion function to the unresolved attribute.
Again minor.
This might be causing more problems than it tries to fix. Allow AbstractConvertFunction to work on InvalidMappedField and replace them with MultiType/ConvertedEsField.
Agreed. It did actually fix a bunch of things, because UnresolveUnionTypes was becoming quite complex in order to get regressions to pass, and putting this here made it much simpler. However, if I instead combine the classes InvalidMappedField and MultiTypeEsField.UnresolvedField into one class, that same simplification occurs, without this hack. I noticed a lot of failures in CI that I could not reproduce locally, but this particular hack feels like a likely source of the failures.
Also, in case it was not clear, this exists in the plan only when there is no AbstractConvertFunction, so the remaining InvalidMappedField needs to be serialised in the plan to support existing behaviour around unsupported types.
Why not use InvalidMappedField directly, since that is already picked up by the analyzer/verifier?
Since InvalidMappedField was in QL, I did not want to touch it, but I've been coming round more and more to the idea that I really should. I think it will make things much simpler, and avoid the awkward PlanNamedTypes hack.
This conversation has come full circle now, due to the recent reviews asking not to edit QL directly. After the above comment, I switched to editing QL directly. There were two QL classes we edited: FieldAttribute, where equals and hashCode were updated to include the underlying field so that query plan re-writing would work (this could be seen as an oversight in the QL code, so perhaps even a bug-fix), and the above-mentioned InvalidMappedField. An attempt to completely extricate these two from QL failed (dependencies increased the scope dramatically, turning a moderate PR into a monster). A second attempt at a compromise, making ESQL versions that kept the simple name and extended the QL versions, worked for both classes but was quite messy for FieldAttribute. After discussing with the team, we decided to keep the ESQL port of InvalidMappedField, but revert to the direct edit of FieldAttribute. The thinking here is that the upcoming split of QL and ESQL would be slightly facilitated by not having any edits in InvalidMappedField, while we would probably want to keep the edit to FieldAttribute anyway.
force-pushed from 4125129 to cc66455
Pinging @elastic/es-analytical-engine (Team:Analytics)
After some recent refinements, and the addition of the unit tests in
Hi Craig, I did a couple of tests, but I think I need some guidance to understand which types/functions this PR covers, because I couldn't manage to make it work...
I did the following:
- created two indexes, `test1` and `test2`, each with only one `@timestamp` field, of type `long` and `date` respectively
- tried a few queries with empty indexes:
  - `from test* | eval x = to_long(@timestamp) | keep x`: all good
  - `from test* | eval x = to_string(@timestamp) | keep x`: all good
  - `from test* | eval x = to_date(@timestamp) | keep x`: `verification_exception`, `Found 1 problem\nline 1:31: Cannot use field [@timestamp] due to ambiguities being mapped as [2] incompatible types: [date] in [test2], [long] in [test1]`
Then I added one record per index
{"index": {"_index":"test1"}}
{"@timestamp":10000000}
{"index": {"_index":"test2"}}
{"@timestamp":"2022-05-06T12:01:00.000Z"}
and tried the same queries again:
- `from test* | eval x = to_long(@timestamp) | keep x`: `null_pointer_exception`, `Cannot invoke \"Object.hashCode()\" because \"pk\" is null`
- `from test* | eval x = to_string(@timestamp) | keep x`: same as above
- `from test* | eval x = to_date(@timestamp) | keep x`: `verification_exception`, `Found 1 problem\nline 1:31: Cannot use field [@timestamp] due to ambiguities being mapped as [2] incompatible types: [date] in [test2], [long] in [test1]`
- `from test* | eval x = to_ip(@timestamp) | keep x`: `esql_illegal_argument_exception`, `illegal data type [long]` (suppressed `null_pointer_exception` as above)
I also tried IP vs long and I got some results with to_string() but not with other functions.
So a couple of questions:
- does this cover all the types? I didn't find any type-specific code in the PR, so I assumed it was the case.
- is it supposed to handle partially invalid conversions? Eg. if I have an IP and a date and I'm only interested in the date, can I do it?
[edit] it should be to_datetime() (to_date() does not exist); probably the validation order makes it trip on the incompatible types before it realizes that the function name is wrong. That makes the message a bit confusing, but it's probably a minor problem.
The above queries, with to_datetime(), return the same error as to_long().
force-pushed from 30047bf to 78824e7
The issues with @timestamp have been fixed in 6fb0622dc43070aa3c71c425d405ac273bf43d45, and more tests added in that and other commits to cover this case. The related issue regarding exactly which error message to return can be dealt with later, perhaps in this PR, or perhaps in another.
I feel bad for the github user
nik9000 left a comment
I left a few comments. I suggested reworking how row-by-row loading is modeled with the BlockBuilders inside of the value loaders themselves. This feels like it's much cleaner to think about. I think we can make it compatible with a shared loader if we need to. But for now the copying seems fine because this is never the hottest path.
Could we keep the BlockBuilder and converter inside the block loader itself? That'd rework the row-stride block loader somewhat, but I think that'd make this more readable.
I remember originally trying to combine these, but seem to remember it causing hassles with the ColumnAtATimeReader. This was before I really got the row-stride reader working, so perhaps it is time for another attempt.
I'd be ok delaying it to a followup too.
I realise a key difference between what you suggested and what I tried (and was thinking of trying again). You said to put the builder and converter inside the loader. Instead I had tried to put the converter inside the builder. This is because we will always have both a builder and loader (the builder is made by the loader once additional information is known using code like field.loader.builder(loaderBlockFactory, docs.count())), so merging the loader and builder seems unworkable. Right now the converter is the only thing at play. It is currently implemented by the loader, since it does not need any of the information in the builder. But this does lead to lines like this:
```java
Block build() {
    return (Block) loader.convert(builder.build());
}
```
If I put the converter into the builder, we can hide it inside the build() method call, simplifying things slightly. That was what I was trying before, and could try again.
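Since we always have both a loader and a builder, the "converter inside the builder" variant can be sketched with a hypothetical mini-model (these types are illustrative stand-ins, not the real compute-engine API): the builder captures the converter, so `build()` hides the conversion that call sites currently perform themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative stand-ins for the real Block types.
interface Block {}

record LongBlock(List<Long> values) implements Block {}

record StringBlock(List<String> values) implements Block {}

// Hypothetical "converter inside the builder": the conversion happens in
// build(), so callers no longer write `(Block) loader.convert(builder.build())`.
final class ConvertingBuilder {
    private final List<Long> raw = new ArrayList<>();
    private final Function<Block, Block> converter;

    ConvertingBuilder(Function<Block, Block> converter) {
        this.converter = converter;
    }

    ConvertingBuilder append(long v) {
        raw.add(v);
        return this;
    }

    Block build() {
        // The converter is applied here instead of at every call site.
        return converter.apply(new LongBlock(List.copyOf(raw)));
    }
}
```

In this sketch, a loader would hand out such a builder pre-wired with its converter, leaving the ColumnAtATimeReader path untouched.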
I'd like to replace the IT_tests_only suffix with a check against a feature similar to how we do version testing. We're already going to add features for BWC skipping, we can add a skip in the hand-rolled CSV testing infrastructure too. I did it for one example here: #108313
Would you be ok doing the same on this one?
OK. I've done this now, will push soon.
Could you include the evaluator too?
Modifying this one scares me a bit.
Instead of modifying this, we should create a copy in the esql project.
This may involve some yak shaving, but I'm happy to help with that!
force-pushed from e2c21ff to 13cc584
costin left a comment
I have minor comments around the logistics and code dependency - such as moving the QL classes like FieldAttribute, EsRelation and InvalidMappedField to ESQL.
An alternative would be to have some workaround code in ESQL that adds/maintains that information outside of QL (even though it's ugly) and then get rid of that during the QL migration to not postpone this PR any longer.
Unrelated to this PR - we need to come up with a naming strategy for the features so we can reason about when they were added without having to open this file which is already too long.
I propose YY_MM.esql.feature_name pattern with the date prefix used in the variable name as well:
NodeFeature 24_05_UNION_TYPES = new NodeFeature("24_05.esql.union_types")
This sounds like a solution to a problem we suspect we might get in future, but do not (yet) suffer from: the creation of too many NodeFeatures. I think there are many solutions to that problem, like regular removal of older NodeFeature instances and references to them, since they should only really matter for BWC and rollover of recent releases. Or some other solution... let's not make it part of this PR (or any feature PR).
I'm with @craigtaverner on this. I'd prefer not to adopt a new pattern here. I'm not really sure what the pattern should be either. Or if we need one.
And add the features here in reverse order, so the latest features are at the top (that's because in time, the early features become wildly spread).
Maybe just stick this one on top and leave a comment saying add new ones on top.
This is just for those reading - it's an unordered Set - but we humans do read it in order.
So far it seems everyone, including me, has been appending to the end. So we're talking about reversing the order of the complete list? I can do that, although I don't fully understand the reason for this. The term 'wildly spread' does not clarify things for me.
Nit: move these methods to the bottom since they're non-essential.
Clone this to ESQL and add the modifications there.
OK. I'll discuss this with @alex-spies and do it together with the FieldAttribute changes, since both are really part of the 'split from QL' project.
Done for InvalidMappedField, but we decided to keep the QL changes in FieldAttribute. Moving that was a much bigger job, and it was argued that these changes are an improvement anyway.
EsRelation was already ported to esql by Nhat - I think this fix should be applied to esql's copy of that, not here.
Curiously, in the QL version, it used attrs in the hashCode, but not in the equals method. My change adds it to equals. Nhat's version removed it from hashCode. I'm curious if his change fixed something else, that I will now break.
Turns out the changes I made were no longer necessary, and only needed for an earlier version of union-types. I kept just one clarifying comment.
force-pushed from b1c6dc7 to 35b0fc9
costin left a comment
I think there's room for improvement; however, this PR took long enough and GitHub doesn't like this commit history.
I'd wait for Andrei's comments, create a separate issue with feedback, merge this in and address the remaining issues in a separate PR.
Thanks for your patience Craig.
Note that one test, `multiIndexIpStringStatsInline` is muted due to failing with the error:
UnresolvedException: Invalid call to dataType on an unresolved object ?client_ip
astefan left a comment
Thank you for adding tests. I like them.
I've left a few very minor comments and one that is more serious.
I think there is value in the PR and it should be merged, but it needs follow-ups.
LGTM.
Leftover or follow-up item? If it's the latter, it would help to have a GH issue created.
Same here about a followup issue.
This method and the next 5-6 are borrowed from OperatorTestCase. Can you reuse them instead? (this test class is already big and it would help to have a more manageable size)
If I understand this well, you remove these attributes assuming they are only used in the "alias function", by which you mean an `eval x = field::type`. I don't think eval is the only place where you can place such a conversion function.
```
STATS c=count(*) BY client_ip::ip
FROM sample_data, sample_data_str | SORT client_ip::ip
```
Both of these have issues. After you merge this PR, I will create follow-ups for these two.
The integration tests do not fail if the capability does not even exist on the cluster nodes; instead the tests are ignored. The same behaviour should happen with CsvTests for consistency.
This way we don't have to add more features to the test framework in this PR, but we would probably want a mute feature (like a `skip` line).
Since the sub-fields are AbstractConvertFunction expressions, and Expression is not yet fully supported as a category class for NamedWritable, we need a few slight tweaks to this, notably registering this explicitly in the EsqlPlugin, as well as calling PlanStreamInput.readExpression() instead of StreamInput.readNamedWritable(Expression.class). These can be removed later once Expression is fully supported as a category class.
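To make the serialization shape concrete, here is a hedged, self-contained sketch of the size-prefixed key/value layout that a writeMap/readImmutableMap pair produces on the wire. `MapWire` and the string-valued "expressions" are stand-ins, not the real StreamInput/StreamOutput API.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapWire {
    // Mirrors the write side: size prefix, then key/value pairs. The real code
    // writes each value with a pluggable writer (e.g. an expression writer).
    static void writeMap(DataOutputStream out, Map<String, String> map) throws IOException {
        out.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Mirrors the read side: consume exactly what was written, in the same
    // order, and hand the result out as an immutable map.
    static Map<String, String> readImmutableMap(DataInputStream in) throws IOException {
        int size = in.readInt();
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(in.readUTF(), in.readUTF());
        }
        return Collections.unmodifiableMap(map);
    }
}
```

The symmetry is the point: as long as both sides agree on the per-value encoding, swapping one value reader for another changes only how each value is decoded, not the surrounding layout.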
We used required_capability to mute the tests, but this caused issues with CsvTests, which also uses this to catch typos in capability names. So we tried to use muted-tests.yml, but that only mutes tests in specific run configurations (i.e. we would need to mute each and every IT class separately). So now we just remove the tests entirely. We left a comment in the muted-tests.yml file for future reference about how to mute csv-spec tests.
```java
in.readString(),
DataType.readFrom(in),
in.readBoolean(),
in.readImmutableMap(StreamInput::readString, i -> ((PlanStreamInput) i).readExpression())
```
I tend to just do `readMap(i -> ((PlanStreamInput) i).readExpression())`
That's a little more convenient, and I just trust that we don't mutate the map.
Though I should probably do:
`readImmutableMap(i -> ((PlanStreamInput) i).readExpression())`
nik9000 left a comment
@craigtaverner told me that the concurrent serialization tests are taking a surprising amount of time on this PR so I'm going to check that.
```java
out.writeString(getName());
out.writeString(getDataType().typeName());
out.writeBoolean(isAggregatable());
out.writeMap(getIndexToConversionExpressions(), (o, v) -> out.writeNamedWriteable(v));
```
👍 This is what we need. It lines up perfectly with writeNamed from the old stuff.
Actually, I don't believe this is going to work yet. Expression isn't yet properly implemented as a NamedWriteable. You'd be better off doing
`out.writeMap(getIndexToConversionExpressions(), (o, v) -> ((PlanStreamOutput) o).writeExpression(v))`
I'm in the process of converting all remaining Expression subclasses to proper NamedWriteables. They don't all properly implement it yet.
And yet it does work. At least my multi-node integration tests pass, and they rely on this serialisation succeeding (both for write and read). I assumed it was because the on-the-wire serialization was the same in both cases.
It'll only work if we've implemented NamedWriteable for all the subclasses you want to serialize. I'm doing that with stuff like #109892 but not all subclasses are done. I suppose if you only allow certain stuff in there it'll be fine. I dunno.
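The "only works if the subclasses are implemented/registered" behaviour can be illustrated with a hypothetical registry sketch (these names are invented, not the real NamedWriteableRegistry API): reading dispatches by name, so a subclass that never registered a reader fails at read time rather than write time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for a named-writeable: each instance knows its wire name.
interface NamedThing {
    String getWriteableName();
}

// Stand-in for the registry: maps wire names to reader functions.
final class Registry {
    private final Map<String, Supplier<NamedThing>> readers = new HashMap<>();

    void register(String name, Supplier<NamedThing> reader) {
        readers.put(name, reader);
    }

    NamedThing read(String name) {
        Supplier<NamedThing> reader = readers.get(name);
        if (reader == null) {
            // The failure mode for unconverted subclasses: nothing on the
            // read side knows how to reconstruct them.
            throw new IllegalArgumentException("unknown named writeable [" + name + "]");
        }
        return reader.get();
    }
}
```

This also explains why multi-node tests can pass despite incomplete coverage: if a query only ever serializes the registered subset, the missing readers are never exercised.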
```java
public MultiTypeEsField(StreamInput in) throws IOException {
    // TODO: Change the conversion expression serialization to i.readNamedWriteable(Expression.class) once Expression is fully supported
```
That TODO will get done before too long. I'll remove readExpression entirely.
```diff
@@ -0,0 +1,203 @@
setup:
```
The word "nested" means the nested type to me. Could you rename this file to something like "sub_fields" or something? I forgot what we call those sub field things, but nested and object mean something specific to me.
Recreating the config on every test was very expensive.
force-pushed from 60d194c to bd08a43
@elasticmachine update branch
bpintea left a comment
Leaving some comments that could be considered for later.
```java
    return new Exact(false, "Field [" + getName() + "] is invalid, cannot access it");
}

public Map<String, Set<String>> getTypesToIndices() {
```
Nit: just to preserve the getters style.
```diff
- public Map<String, Set<String>> getTypesToIndices() {
+ public Map<String, Set<String>> typesToIndices() {
```
```java
}

/**
 * Constructor supporting union types, used in ES|QL.
```
Nit: since it's no longer in QL.
```diff
- * Constructor supporting union types, used in ES|QL.
+ * Constructor supporting union types.
```
```java
/**
 * Representation of field mapped differently across indices.
 * Used during mapping discovery only.
 * Note that the field <code>typesToIndices</code> is not serialized because that information is
```
Wondering if there's an actual valid case when such an object does get serialised (outside tests, that is).
There really should not be. This class, InvalidMappedField, is used in the Analyzer to produce either UnsupportedField or MultiTypeEsField (the non-union-types and union-types cases), long before any serialization. The typesToIndices is used to produce the thing that those two fields need: either an error message, or the union-types converters (which are serialized as part of MultiTypeEsField).
What would make sense in future, once the port to NamedWritable is complete, would be for this class to throw an exception!
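As a hedged sketch of the flow described above (the record and method names here are illustrative, not the real Analyzer types): a field with conflicting mappings resolves to a multi-type field only when conversion expressions were found for it, and otherwise stays on the existing unsupported-field path.

```java
import java.util.Map;

// Illustrative result of analyzing an InvalidMappedField-like input.
record ResolvedField(String name, boolean supported, Map<String, String> convertersPerType) {}

final class FieldResolution {
    // convertersPerType maps a mapped type name to the conversion applied to
    // it, mirroring how union-types converters are collected per type.
    static ResolvedField resolve(String name, Map<String, String> convertersPerType) {
        if (convertersPerType.isEmpty()) {
            // No conversion functions in the query: the field stays unsupported.
            return new ResolvedField(name, false, Map.of());
        }
        // Conversions found: the field resolves to a single usable type.
        return new ResolvedField(name, true, convertersPerType);
    }
}
```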
```java
    return plan;
}

private Expression resolveConvertFunction(AbstractConvertFunction convert, List<FieldAttribute> unionFieldAttributes) {
```
All the private methods in this class can be made static.
```java
    return unsupported(name, fc);
}
```
```diff
- typesToIndices.computeIfAbsent(type.esType(), _key -> new TreeSet<>()).add(ir.getIndexName());
+ typesToIndices.computeIfAbsent(type.typeName(), _key -> new TreeSet<>()).add(ir.getIndexName());
```
Why not use the type as key? That'll make the repeated type resolution in ResolveUnionTypes#resolveConvertFunction() redundant.
Edit: and #resolvedMultiTypeEsField().
Yeah, I think I just minimized the differences to the original code, but your idea makes sense.
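For readers following along, the bookkeeping in the quoted line boils down to grouping index names by mapped type. A minimal, self-contained version (class and method names invented for illustration) uses computeIfAbsent with a TreeSet, which keeps each per-type index list sorted so error messages stay deterministic:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TypesToIndices {
    // indexToType: index name -> the type that index maps the field to.
    // Result: type name -> sorted set of index names using that type.
    static Map<String, Set<String>> group(Map<String, String> indexToType) {
        Map<String, Set<String>> typesToIndices = new HashMap<>();
        for (Map.Entry<String, String> e : indexToType.entrySet()) {
            typesToIndices.computeIfAbsent(e.getValue(), key -> new TreeSet<>()).add(e.getKey());
        }
        return typesToIndices;
    }
}
```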
```java
TypeResolutionKey key = new TypeResolutionKey(fa.name(), type);
var concreteConvert = typeSpecificConvert(convert, fa.source(), type, imf);
typeResolutions.put(key, concreteConvert);
}
```
Shouldn't the alternative branch to this `if` return `convert` already? And the check below be skipped? If in one index the field is mapped as a type that the conversion function doesn't support, there shouldn't be any MultiTypeEsField created anyway (and the analysis would converge sooner).
```java
    return MultiTypeEsField.resolveFrom(imf, typesToConversionExpressions);
}
```
```diff
- private Expression typeSpecificConvert(AbstractConvertFunction convert, Source source, DataType type, InvalidMappedField mtf) {
+ private Expression typeSpecificConvert(AbstractConvertFunction convert, Source source, DataType type, InvalidMappedField imf) {
```
bpintea left a comment
And a last batch of (post-merge) notes to potentially be later considered.
Very nice.
```java
imf.getTypesToIndices().forEach((typeName, indexNames) -> {
    DataType type = DataType.fromTypeName(typeName);
    TypeResolutionKey key = new TypeResolutionKey(fa.name(), type);
    if (typeResolutions.containsKey(key)) {
```
Can there be a case this check is false?
Yes, if there aren't convert functions for all the types found in the index mappings. Perhaps more than one convert function is needed, but only one exists in the query. I'm pretty sure I have integration tests (yml tests) that cover this case.
```java
}

private MultiTypeEsField resolvedMultiTypeEsField(FieldAttribute fa, HashMap<TypeResolutionKey, Expression> typeResolutions) {
    Map<String, Expression> typesToConversionExpressions = new HashMap<>();
```
```diff
- Map<String, Expression> typesToConversionExpressions = new HashMap<>();
+ Map<String, Expression> typesToConversionExpressions = new HashMap<>(typeResolutions.size());
```
```java
/**
 * During IndexResolution it could occur that the same field is mapped to different types in different indices.
 * The class MultiTypeEfField.UnresolvedField holds that information and allows for later resolution of the field
```
UnresolvedField doesn't exist (anymore?).
Good point, this comment reflects an earlier version of the class. That is now merged into InvalidMappedField, and is not its own class, since we moved QL to ESQL.CORE.
```java
    return ENTRY.name;
}

public Map<String, Expression> getIndexToConversionExpressions() {
```
Nit: same style comment as in the previous batch about the getter, which we usually have without the get prefix when it matches the field name. But not sure if it's that widespread.
```java
    Map<String, Expression> typesToConversionExpressions
) {
    Map<String, Set<String>> typesToIndices = invalidMappedField.getTypesToIndices();
    DataType resolvedDataType = DataType.UNSUPPORTED;
```
Nit:
```diff
- DataType resolvedDataType = DataType.UNSUPPORTED;
+ DataType resolvedDataType = null;
```
I guess init'ing to DataType.UNSUPPORTED is safe and maybe caught later down the path (if caused by a defect), but it should be an error if resolvedDataType isn't updated.
If the query sources multiple indexes, and the same field exists in multiple indexes with different types, this would normally fail the query. However, if the query includes a conversion function to resolve the field to a single type before it is used in other functions or aggregations, then this should work.
The following query works in this third prototype:
The client_ip field is an `IP` in the `sample_data` index, but a `keyword` in the `sample_data_str` index.
The first prototype did stuff to the drivers to create an index-specific DriverContext to use during field evaluator construction, so that the conversion function would be index/type aware. However, that abuses the idea of multi-threaded drivers. So the second prototype took a new approach: re-plan the logical plan to extract the converter from the EVAL expressions, setting them as resolved (claiming the input type is already the converted type), and store the converter in the EsRelation for later use in physical planning. This third prototype takes this further by replacing the conversion function with a new FieldAttribute. And both old and new FieldAttributes exist in parallel, so that the logic around handling unsupported fields is not changed.
Fixes #100603
Tasks to do:
- `LoadFromMany` and `loadFromSingleLeafDate` are not working
- with `KEEP` we get multiple columns with the same name