ESQL: Union Types Support (take2)#107255
Closed
craigtaverner wants to merge 38 commits intoelastic:mainfrom
Closed
ESQL: Union Types Support (take2)#107255craigtaverner wants to merge 38 commits intoelastic:mainfrom
craigtaverner wants to merge 38 commits intoelastic:mainfrom
Conversation
Collaborator
|
Hi @craigtaverner, I've created a changelog YAML for you. |
591a05e to
e7a635f
Compare
Landing elastic#107018 broke running ESQL unit tests in IntelliJ. It has *something* to do with turning on the stringtemplate plugin in the esql project but I don't really know what. After that PR we'd often get errors about trying to regenerate evaluators twice. I dunno. This fixes it. But I don't really know why. The way this fixes it is by making the `esql` project more like the `copmute` project. It makes sense that that would help - they both have the same code generation configuration. Anyway, the operative change is landing the generated files in the same place as the `compute` project. Thus all of the file moves. Again, I have no idea why this works. It's build black magic and I just shook it until it worked. Most of the credit goes to git-bisect for finding the commit that broke this.
Specifying `?master_timeout=-1` on an API which performs a cluster state update means that the cluster state update task will never time out while waiting in the pending tasks queue. However this parameter is also re-used in a few places where a timeout of `-1` means something else, typically to timeout immediately. This commit fixes those places so that `?master_timeout=-1` consistently means to wait forever.
👋 @dakrone mentioned baby-update for docs. Sometimes users believe they should point client-side (e.g. Logstash's Elasticsearch output `index`) to the bootstrapped index. This highlights they point ingest towards the alias as we're expecting.
This PR adds the ability to modify the failure store indices on a data stream using the modify data stream API. These options are available in the event that we need to pull indices out of a failure store or add them back to the failure store for any reason. The operations are done using the existing modify data stream actions with a new flag on the action body to denote if the action should be done on the failure stores or not.
`.` is not a wildcard character.... Closes elastic#106791
On some systems Java appears to return amd64 (even if not an amd processor), but on others it returns x86_64. This commit handles the latter case to correctly associate the arch with the appropriate platform dir.
* Change rerank result types, add named writables, new transportVersion * Add description for namedWriteables * fix RankedDocsResults toString --------- Co-authored-by: David Kyle <david.kyle@elastic.co>
This commit allows the cli access to sending SIGKILL to the underlying jvm process.
* string literal casting for scalar functions and arithmetic operations.
This adds a new SPI based `LoggingDataProvider` service that can be implemented in order to add new fields to the main JSON log
For now this just sends a random version; later, we will want to specify applicable versions in the csv tests themselves.
This is no longer needed in testRollupIndex, no test failures over a week. Fixes elastic#105437 Related to elastic#105485
We've run into heap dumps that had instances of this class consume tens and in one case more than a hundred MB of heap. It seems reasonable to use a thread-local for the `tmp` long array and trade the cost of looking up the thread-local for the memory savings and cycles saved for allocating and assigning instances.
The first prototype did stuff to the drivers to create an index specific DriverContext to use during field evaluator construction so that the conversion function would be index/type aware. However, that abuses the idea of multi-threaded drivers. So this new approach instead re-plans the logical plan to extract the converter from the EVAL expressions, setting them as resolved (claiming the input type is already the converted type), and stores the converter in the EsRelation for later use in Physical planning. The following query works in this second prototype: ``` multiIndexIpString FROM sample_data* METADATA _index | EVAL client_ip = TO_IP(client_ip) | KEEP _index, @timestamp, client_ip, event_duration, message | SORT _index ASC, @timestamp DESC ``` We no longer need an aggregation to force the conversion function onto the data node, as the 'real' conversion is now done at field extraction time using the converter function previously saved in the EsRelation and replanned into the EsQueryExec.
So the plan can be communicated across the network to data nodes.
The resolution is almost identical, but not only do we move it to the analyser, so failed resolutions will still generate errors, but instead of replacing all previous unresolved MultiTypeField with EsField, we replace them with resolved MultiTypeField, so we no longer need a special EsUnionTypesQueryExec for finding the union types from the FieldExractExec, because the attributes already in that object contain the necessary information for type resolution.
e7a635f to
a167790
Compare
When multiple conversions functions are required, for multiple fields with union types, we had a bug in the fields replacement code. These tests also assert on the correct error messages when type conversion does not occur. For now we use the same error message was had before union types was introduced.
This was for debugging only, and now we have much better negative test coverage in the yaml tests 140_union_types.yml
21 tasks
Contributor
Author
|
Replaced by #107545 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
If the query sources multiple indexes, and the same field exists in multiple indexes with different types, this would normally fail the query. However, if the query includes a conversion function to resolve the field to a single type before it is used in other functions or aggregations, then this should work.
The following query works in this second prototype:
The client_ip field is an
IPin thesample_dataindex, but akeywordin thesample_data_strindex.The first prototype did stuff to the drivers to create an index specific DriverContext to use during field evaluator construction so that the conversion function would be index/type aware. However, that abuses the idea of multi-threaded drivers. So this new approach instead re-plans the logical plan to extract the converter from the EVAL expressions, setting them as resolved (claiming the input type is already the converted type), and stores the converter in the EsRelation for later use in Physical planning. We no longer need an aggregation to force the conversion function onto the data node, as the 'real' conversion is now done at field extraction time using the converter function previously saved in the EsRelation and replanned into the EsQueryExec.
Fixes #100603
Tasks to do: