Skip to content

ESQL: Serialize Blocks in plan#108334

Merged
nik9000 merged 8 commits intoelastic:mainfrom
nik9000:esql_serialize_blocks_in_plan
May 7, 2024
Merged

ESQL: Serialize Blocks in plan#108334
nik9000 merged 8 commits intoelastic:mainfrom
nik9000:esql_serialize_blocks_in_plan

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented May 6, 2024

This adds support for serializing Blocks as part of the plan. Unlike serializing blocks in the normal stream, we attempt to prevent sending duplicate blocks. This is'll make it easier to serialize blocks in the plan without worrying about duplicates.

This adds support for serializing `Block`s as part of the plan. Unlike
serializing blocks in the normal stream, we attempt to prevent sending
duplicate blocks. This is'll make it easier to serialize blocks in the
plan without worrying about duplicates.
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 6, 2024
Copy link
Copy Markdown
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, but feel free to merge this. Thanks Nik!


@Override
public int readArraySize() throws IOException {
return super.readArraySize();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta make it public! I'll leave a comment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked it so I don't need to make this public.

*/
public final class PlanStreamOutput extends StreamOutput {

private final Map<Block, BytesReference> cachedBlocks = new IdentityHashMap<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about using Block as the key since we'll need to read all values in hashCode and equals. However, if you're okay with that, I'm fine too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an IdentityHashMap - just the object itself.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave a comment about how that's important. It's be expensive to use the Block as a regular key.

}
Block[] blocks = new Block[count];
for (int i = 0; i < blocks.length; i++) {
blocks[i] = in.readBlock();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we don't track memory here, but releasing blocks in case of exceptions would make our life easier in the future.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++


private static final Supplier<LongFunction<NameId>> DEFAULT_NAME_ID_FUNC = NameIdMapper::new;

private final Map<Integer, Block> cachedBlocks = new HashMap<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's decouple this from the stream, similar to NameIdMapper - this way we could support both streams that don't do any caching and followed by those who do.
How common is it that we'll have duplicate blocks? Comparing their content will take CPU and I wonder if being opportunistic about reusing them (when we know it's the same data) isn't an alternative for at least some cases.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the plan I think we're fairly likely to have duplicates - at least, that's how I want to build the serialization. Let the streaming deal with the duplicates.

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor comments.

Comment on lines +58 to 60
static LocalSupplier of(Block[] blocks) {
return new ImmediateLocalSupplier(blocks);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ This is nice

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 7, 2024
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented May 7, 2024

elasticsearch-ci/eql-correctness Pending

Test passed.....

@nik9000 nik9000 merged commit ef12e39 into elastic:main May 7, 2024
@nik9000 nik9000 deleted the esql_serialize_blocks_in_plan branch May 7, 2024 17:39
elasticsearchmachine pushed a commit that referenced this pull request Jun 7, 2024
This adds support for `LOOKUP`, a command that implements a sort of
inline `ENRICH`, using data that is passed in the request:

```
$ curl -uelastic:password -HContent-Type:application/json -XPOST \
    'localhost:9200/_query?error_trace&pretty&format=txt' \
-d'{
    "query": "ROW a=1::LONG | LOOKUP t ON a",
    "tables": {
        "t": {
            "a:long":     [    1,     4,     2],
            "v1:integer": [   10,    11,    12],
            "v2:keyword": ["cat", "dog", "wow"]
        }
    },
    "version": "2024.04.01"
}'
      v1       |      v2       |       a       
---------------+---------------+---------------
10             |cat            |1
```

This required these PRs: * #107624 * #107634 * #107701 * #107762 *
#107923 * #107894 * #107982 * #108012 * #108020 * #108169 * #108191 *
#108334 * #108482 * #108696 * #109040 * #109045

Closes #107306
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants