ESQL: Union Types Support (take2) by craigtaverner · Pull Request #107255 · elastic/elasticsearch

craigtaverner · 2024-04-09T10:18:05Z

If the query sources multiple indexes, and the same field exists in multiple indexes with different types, this would normally fail the query. However, if the query includes a conversion function to resolve the field to a single type before it is used in other functions or aggregations, then this should work.

The following query works in this second prototype:

FROM sample_data* METADATA _index
| EVAL client_ip = TO_IP(client_ip)
| KEEP _index, @timestamp, client_ip, event_duration, message
| SORT _index ASC, @timestamp DESC

The client_ip field is an IP in the sample_data index, but a keyword in the sample_data_str index.
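To make the intended behaviour concrete, here is a minimal Python sketch (illustrative only, not the Elasticsearch Java code). The row shapes, the sample address, and the helper names `to_ip` and `eval_client_ip` are assumptions made for this example. It only shows the effect of `TO_IP`: values that arrive with different types from the two indices end up with one type.

```python
import ipaddress

# Hypothetical rows from the two indices; shapes and values are illustrative only.
ROWS = [
    {"_index": "sample_data", "client_ip": ipaddress.ip_address("172.21.3.15")},
    {"_index": "sample_data_str", "client_ip": "172.21.3.15"},
]


def to_ip(value):
    """Convert a keyword (str) or an existing IP value to a single IP type."""
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return value
    return ipaddress.ip_address(value)


def eval_client_ip(rows):
    # Mirrors `EVAL client_ip = TO_IP(client_ip)`: every row gets the same type.
    return [{**row, "client_ip": to_ip(row["client_ip"])} for row in rows]


converted = eval_client_ip(ROWS)
```

After the conversion, later steps such as `SORT` or aggregations only ever see one type for `client_ip`.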

The first prototype modified the drivers to create an index-specific DriverContext to use during field evaluator construction, so that the conversion function would be index/type aware. However, that abuses the idea of multi-threaded drivers. So this new approach instead re-plans the logical plan to extract the converter from the EVAL expressions, setting them as resolved (claiming the input type is already the converted type), and stores the converter in the EsRelation for later use in physical planning. We no longer need an aggregation to force the conversion function onto the data node, as the 'real' conversion is now done at field extraction time using the converter function previously saved in the EsRelation and replanned into the EsQueryExec.
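The re-planning idea can be sketched roughly as follows, in illustrative Python rather than the actual Java planner. The classes `Relation` and `Eval` and the functions `replan` and `extract` are hypothetical stand-ins invented for this sketch. They are not the real EsRelation or EVAL node APIs, and the real plan nodes carry far more information.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Relation:
    # Hypothetical stand-in: field name -> converter applied at extraction time.
    converters: Dict[str, Callable] = field(default_factory=dict)


@dataclass
class Eval:
    # Each entry is (target name, conversion function or None, source field).
    expressions: List[Tuple[str, Optional[Callable], str]]
    child: Relation


def replan(plan: Eval) -> Eval:
    """Move conversion functions out of the EVAL and into the relation."""
    converters = dict(plan.child.converters)
    rewritten = []
    for target, convert, source in plan.expressions:
        if convert is not None:
            converters[source] = convert
        # The EVAL now treats the field as already having the converted type.
        rewritten.append((target, None, source))
    return Eval(rewritten, Relation(converters))


def extract(relation: Relation, name: str, raw_value):
    """Apply the stored converter, if any, while reading a field value."""
    convert = relation.converters.get(name)
    return raw_value if convert is None else convert(raw_value)


plan = Eval([("client_ip", ipaddress.ip_address, "client_ip")], Relation())
new_plan = replan(plan)
```

In this sketch the conversion runs inside `extract`, which loosely corresponds to doing the 'real' conversion at field extraction time instead of inside the EVAL.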

Fixes #100603

Tasks to do:

elasticsearchmachine · 2024-04-09T10:18:49Z

Hi @craigtaverner, I've created a changelog YAML for you.

Landing elastic#107018 broke running ESQL unit tests in IntelliJ. It has *something* to do with turning on the stringtemplate plugin in the esql project but I don't really know what. After that PR we'd often get errors about trying to regenerate evaluators twice. I dunno. This fixes it. But I don't really know why. The way this fixes it is by making the `esql` project more like the `compute` project. It makes sense that that would help - they both have the same code generation configuration. Anyway, the operative change is landing the generated files in the same place as the `compute` project. Thus all of the file moves. Again, I have no idea why this works. It's build black magic and I just shook it until it worked. Most of the credit goes to git-bisect for finding the commit that broke this.

Specifying `?master_timeout=-1` on an API which performs a cluster state update means that the cluster state update task will never time out while waiting in the pending tasks queue. However, this parameter is also re-used in a few places where a timeout of `-1` means something else, typically to time out immediately. This commit fixes those places so that `?master_timeout=-1` consistently means to wait forever.
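A small Python sketch of the pitfall described here, under stated assumptions: the function names `effective_wait_seconds` and `has_timed_out` are hypothetical and this is not the Elasticsearch implementation. It only illustrates why a `-1` sentinel has to be handled before any numeric comparison.

```python
import math


def effective_wait_seconds(master_timeout: float) -> float:
    """Return how long to wait; -1 is the sentinel for 'wait forever'."""
    if master_timeout == -1:
        return math.inf
    if master_timeout < 0:
        raise ValueError("timeout must be -1 or non-negative")
    return master_timeout


def has_timed_out(elapsed_seconds: float, master_timeout: float) -> bool:
    # Resolving the sentinel first avoids the bug where `elapsed >= -1` is
    # always true and the operation appears to time out immediately.
    return elapsed_seconds >= effective_wait_seconds(master_timeout)
```

Comparing the elapsed time against the raw value `-1` would report a timeout on the very first check, which is the "something else" behaviour the commit message refers to.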

👋 @dakrone mentioned a baby-update for docs. Users sometimes believe they should point client-side writers (e.g. the `index` setting of Logstash's Elasticsearch output) at the bootstrapped index. This highlights that they should point ingest at the alias, as we expect.

This PR adds the ability to modify the failure store indices on a data stream using the modify data stream API. These options are available in the event that we need to pull indices out of a failure store or add them back to the failure store for any reason. The operations are done using the existing modify data stream actions with a new flag on the action body to denote if the action should be done on the failure stores or not.


We've run into heap dumps that had instances of this class consume tens and in one case more than a hundred MB of heap. It seems reasonable to use a thread-local for the `tmp` long array and trade the cost of looking up the thread-local for the memory savings and cycles saved for allocating and assigning instances.
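The trade-off can be sketched in Python as below. This is illustrative only: the name `tmp_buffer` and the default size are assumptions, and the real change is in Java. Each thread keeps one reusable scratch array instead of every instance carrying its own.

```python
import threading

# One scratch buffer per thread, shared by all callers running on that thread.
_scratch = threading.local()


def tmp_buffer(size: int = 8) -> list:
    """Return this thread's reusable scratch list, creating it on first use."""
    buf = getattr(_scratch, "tmp", None)
    if buf is None or len(buf) != size:
        buf = [0] * size
        _scratch.tmp = buf
    return buf
```

The cost is one thread-local lookup per use. The saving is that memory scales with the number of threads rather than with the number of live instances.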


The first prototype modified the drivers to create an index-specific DriverContext to use during field evaluator construction, so that the conversion function would be index/type aware. However, that abuses the idea of multi-threaded drivers. So this new approach instead re-plans the logical plan to extract the converter from the EVAL expressions, setting them as resolved (claiming the input type is already the converted type), and stores the converter in the EsRelation for later use in physical planning.

The following query works in this second prototype:

```
multiIndexIpString
FROM sample_data* METADATA _index
| EVAL client_ip = TO_IP(client_ip)
| KEEP _index, @timestamp, client_ip, event_duration, message
| SORT _index ASC, @timestamp DESC
```

We no longer need an aggregation to force the conversion function onto the data node, as the 'real' conversion is now done at field extraction time using the converter function previously saved in the EsRelation and replanned into the EsQueryExec.


The resolution is almost identical, with two changes. First, we move it to the analyser, so failed resolutions will still generate errors. Second, instead of replacing all previously unresolved MultiTypeField instances with EsField, we replace them with resolved MultiTypeField instances. As a result, we no longer need a special EsUnionTypesQueryExec for finding the union types from the FieldExtractExec, because the attributes already in that object contain the necessary information for type resolution.

When multiple conversion functions are required, for multiple fields with union types, we had a bug in the field replacement code. These tests also assert on the correct error messages when type conversion does not occur. For now we use the same error message we had before union types were introduced.


craigtaverner · 2024-04-16T17:26:06Z

Replaced by #107545

craigtaverner added the >feature, Team:Analytics, and :Analytics/ES|QL labels Apr 9, 2024

elasticsearchmachine added the v8.14.0 label Apr 9, 2024

craigtaverner force-pushed the union_types_take2 branch from 591a05e to e7a635f April 9, 2024 17:03

craigtaverner mentioned this pull request Apr 9, 2024

ESQL: Union Types Support (take1) #106885

Closed

nik9000 and others added 23 commits April 10, 2024 10:59

this

ddac7ee

uStop it

9abec4c

Style

aae1319

Merge branch 'main' into esql_fix_tests_in_intellij

3b2a4a3

[DOCS] Fixes a typo in the HugggingFace tutorial. (elastic#107321)

60f5b71

[8.13.2] Update release notes with missing JDK downgrade PR

ee0f9ea

Don't escape . in ESQL wildcard tests (elastic#107283)

b947a08

`.` is not a wildcard character.... Closes elastic#106791

mute VectorSearchIT.testQuantizedVectorSearch (elastic#107336)

a7152ed

mute EsqlActionTaskIT.testTaskContentsForLimitQuery (elastic#107337)

0d1ee77

Handle x86_64 os.arch for native libraries (elastic#107289)

4d62546

On some systems Java appears to return amd64 (even if not an amd processor), but on others it returns x86_64. This commit handles the latter case to correctly associate the arch with the appropriate platform dir.
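A minimal Python sketch of this kind of normalization. The two aliases `amd64` and `x86_64` come from the commit message. The canonical name `x64`, the `aarch64` entry, the directory naming, and the function `platform_dir` are assumptions made for illustration and are not taken from the Elasticsearch code.

```python
# Hypothetical alias table: "amd64" and "x86_64" come from the commit message;
# the canonical name "x64" and the "aarch64" entry are assumptions.
ARCH_ALIASES = {"amd64": "x64", "x86_64": "x64", "aarch64": "aarch64"}


def platform_dir(os_name: str, os_arch: str) -> str:
    """Map an os.arch-style value to one platform directory name."""
    arch = ARCH_ALIASES.get(os_arch.lower())
    if arch is None:
        raise ValueError(f"unsupported architecture: {os_arch}")
    return f"{os_name.lower()}-{arch}"
```

With a table like this, both reported names resolve to the same directory, so the lookup no longer depends on which spelling the JVM returns.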

[ML] Rerank response format change (elastic#107288)

45e4775

* Change rerank result types, add named writables, new transportVersion * Add description for namedWriteables * fix RankedDocsResults toString --------- Co-authored-by: David Kyle <david.kyle@elastic.co>

Allow force stopping server process (elastic#107170)

34379e9

This commit allows the cli access to sending SIGKILL to the underlying jvm process.

[ES|QL] String literal implicit casting (elastic#106932)

3747d96

* string literal casting for scalar functions and arithmetic operations.

Allow additional JSON log fields via SPI (elastic#106980)

a7366df

This adds a new SPI based `LoggingDataProvider` service that can be implemented in order to add new fields to the main JSON log

Deprecate Telemetry / APM legacy settings in favor of the new telemet…

f7693a0

…ry.* settings (elastic#104908)

ES|QL: Fix adjustment of warnings for multi-cluster tests (elastic#10…

346a699

…7322)

AwaitsFix for elastic#107347

5863126

ESQL: Send version in spec tests (elastic#107268)

c8c7530

For now this just sends a random version; later, we will want to specify applicable versions in the csv tests themselves.

[TEST] Trace logging in testDownsampleTwiceSameInterval (elastic#107346)

54a9ece

This is no longer needed in testRollupIndex, no test failures over a week. Fixes elastic#105437 Related to elastic#105485

jedrazb and others added 8 commits April 11, 2024 18:20

[Connector API] Support cleaning up sync jobs when deleting a connect…

a983a1d

…or (elastic#107253)

Refactored to new MultiTypeeEsField which supports serialization

d13fa08

So the plan can be communicated across the network to data nodes.

Start enabling multiple indices in CsvTests

e5376fa

Start working on errors

7fe54b0

Review code

a167790

craigtaverner force-pushed the union_types_take2 branch from e7a635f to a167790 April 11, 2024 17:17

craigtaverner added 7 commits April 11, 2024 19:28

Get resolution to work without Evals

07dfc30

Get verifier to produce unresolved error messages

c32e687

Disable union-types tests on older versions

367a4ee

Remove negative test

1a3a099

This was for debugging only, and now we have much better negative test coverage in the yaml tests 140_union_types.yml

Make sure rewriting away unresolved multitype fields works

d8f57f2

If we have unresolved attributes consider this an error condition eve…

5ece3a3

…n in EsRelation

craigtaverner mentioned this pull request Apr 16, 2024

ESQL: Union Types Support #107545

Merged

21 tasks

craigtaverner closed this Apr 16, 2024

craigtaverner wants to merge 38 commits into elastic:main from craigtaverner:union_types_take2


18 participants
