Abstract RowInTable logic #108696

Merged
nik9000 merged 5 commits into elastic:main from nik9000:block_lookup_class
May 17, 2024

@nik9000 commented May 15, 2024

This moves the logic for finding the offset in a table that we will use in LOOKUP out of a method on BlockHash and some complex building logic in HashLookupOperator. Now it lives in a RowInTable interface, which has a static builder method and some implementations.

There are three implementations:

  1. One that talks to BlockHash just like HashLookupOperator used to. Right now it talks to PackedValuesBlockHash because it's the only one whose lookup method returns the offset in the original row, but we'll fix it eventually.
  2. A RowInTable that works with increasing sequences of integers, say, 1, 2, 3, 4, 5. This one is fairly simple: it just checks that the input is between 1 and 5 and, if it is, subtracts 1. Easy, obvious, and very fast, which makes it a good simple example.
  3. A RowInTable that handles empty tables. This just makes writing the rest of the code simpler. It always returns null.
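
To make the three shapes concrete, here is a minimal sketch in plain Java. It is not the code from this PR: `RowLookupSketch`, `Lookup`, `Empty`, `AscendingSequence` and `Hashed` are hypothetical names, it looks up one `int` at a time instead of whole `Page`s, a `HashMap` stands in for `BlockHash`, and the way `build` chooses an implementation is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the lookup abstraction; not the Elasticsearch classes. */
final class RowLookupSketch {
    /** Returns the offset of the matching table row, or null when no row matches. */
    sealed interface Lookup permits Empty, AscendingSequence, Hashed {
        Integer rowOf(int key);
    }

    /** Empty table: nothing can ever match. */
    record Empty() implements Lookup {
        @Override
        public Integer rowOf(int key) {
            return null;
        }
    }

    /** Keys are min, min + 1, ..., max, so the offset is a single subtraction. */
    record AscendingSequence(int min, int max) implements Lookup {
        @Override
        public Integer rowOf(int key) {
            return key >= min && key <= max ? key - min : null;
        }
    }

    /** General case: a hash map from key to the first row holding it (assumed policy). */
    record Hashed(Map<Integer, Integer> rows) implements Lookup {
        @Override
        public Integer rowOf(int key) {
            return rows.get(key);
        }
    }

    /** Static builder that picks the cheapest implementation for the given keys. */
    static Lookup build(List<Integer> keys) {
        if (keys.isEmpty()) {
            return new Empty();
        }
        boolean ascending = true;
        for (int i = 1; i < keys.size() && ascending; i++) {
            ascending = keys.get(i) == keys.get(i - 1) + 1;
        }
        if (ascending) {
            return new AscendingSequence(keys.get(0), keys.get(keys.size() - 1));
        }
        Map<Integer, Integer> rows = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            rows.putIfAbsent(keys.get(i), i);
        }
        return new Hashed(rows);
    }
}
```

With keys `1, 2, 3, 4, 5`, `build` returns the `AscendingSequence` variant, `rowOf(3)` is `2`, and `rowOf(6)` is `null`.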

@elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) May 15, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Member

@dnhatn left a comment

Nice, thanks Nik! I have some optional comments, but feel free to merge as is.

"keys must have the same number of positions but [" + positions + "] != [" + keys[k].getPositionCount() + "]"
);
}
for (int p = 0; p < keys[k].getPositionCount(); p++) {
Member

Maybe a quick check with Block#mayHaveMultivaluedFields(), then double-check every position.

Member Author

👍. no need to check if it can't have it.
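
The suggestion in this thread amounts to a cheap block-level test before the per-position loop. A rough sketch, assuming the loop exists to reject multivalued keys: `KeyColumn` and `MultivalueCheck` are hypothetical stand-ins rather than the real `Block` interface, and the error message is made up.

```java
/** Hypothetical stand-in for a block of keys; not the Elasticsearch Block interface. */
interface KeyColumn {
    int getPositionCount();

    /** Number of values stored at the given position. */
    int getValueCount(int position);

    /** Cheap hint: false guarantees that no position holds more than one value. */
    boolean mayHaveMultivaluedFields();
}

final class MultivalueCheck {
    /** Throws if any position holds more than one value. */
    static void requireSingleValued(KeyColumn keys) {
        if (keys.mayHaveMultivaluedFields() == false) {
            return; // Nothing can be multivalued, so skip the scan entirely.
        }
        // The hint may be a false positive, so confirm position by position.
        for (int p = 0; p < keys.getPositionCount(); p++) {
            if (keys.getValueCount(p) > 1) {
                throw new IllegalArgumentException("position [" + p + "] has more than one value");
            }
        }
    }
}
```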

);
boolean success = false;
try {
final int[] lastOrd = new int[] { -1 };
Member

nit: maybe move lastOrd inside the AddInput and change it to an int?
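
For readers unfamiliar with the idiom: a one-element `int[]` is a common way to let a lambda or anonymous class mutate captured state, and the nit proposes keeping that state as a plain `int` field on the consumer instead. A small sketch of both variants follows. What the real `AddInput` does with `lastOrd` is not shown in this thread, so tracking a maximum is only a placeholder, and `LastOrdSketch` and `Tracker` are hypothetical names.

```java
import java.util.function.IntConsumer;

final class LastOrdSketch {
    /** Variant 1: a one-element array lets the lambda mutate captured state. */
    static int maxWithArray(int[] ords) {
        final int[] lastOrd = new int[] { -1 };
        IntConsumer add = ord -> lastOrd[0] = Math.max(lastOrd[0], ord);
        for (int ord : ords) {
            add.accept(ord);
        }
        return lastOrd[0];
    }

    /** Variant 2: the state lives in a plain int field of the consumer itself. */
    static final class Tracker implements IntConsumer {
        int lastOrd = -1;

        @Override
        public void accept(int ord) {
            lastOrd = Math.max(lastOrd, ord);
        }
    }

    static int maxWithField(int[] ords) {
        Tracker tracker = new Tracker();
        for (int ord : ords) {
            tracker.accept(ord);
        }
        return tracker.lastOrd;
    }
}
```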

}

private final List<String> keys;
private final RowInTable lookup;
Member

should we call this rowInTable or table instead?

Member Author

++ - old names didn't get changed.

Contributor

@alex-spies left a comment

LGTM, nice abstraction!

* Consumes {@link Page}s and looks up each row in a pre-built table, and returns the
* offsets of each row in the table.
*/
public abstract sealed class RowInTable implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {
Contributor

nit: the name suggests this models a row (in a table), but it really represents looking up a row.

Suggested change
- public abstract sealed class RowInTable implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {
+ public abstract sealed class RowInTableLookup implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {

Member Author

Oh yeah, that's a better name!

Member Author

I've renamed this thing like 3 times already.

Comment on lines +53 to +62
private IntVector lookupVector(IntVector vector) {
    try (IntVector.Builder builder = blockFactory.newIntVectorFixedBuilder(vector.getPositionCount())) {
        for (int i = 0; i < vector.getPositionCount(); i++) {
            builder.appendInt(vector.getInt(i) - min);
        }
        return builder.build();
    }
}

private IntBlock lookupBlock(IntVector vector) {
Contributor

nit: names are a bit confusing.

Suggested change
- private IntVector lookupVector(IntVector vector) {
+ private IntVector lookupVectorInRange(IntVector vector) {
      try (IntVector.Builder builder = blockFactory.newIntVectorFixedBuilder(vector.getPositionCount())) {
          for (int i = 0; i < vector.getPositionCount(); i++) {
              builder.appendInt(vector.getInt(i) - min);
          }
          return builder.build();
      }
  }
- private IntBlock lookupBlock(IntVector vector) {
+ private IntBlock lookupVector(IntVector vector) {


@Override
public String toString() {
return "DirectLookup[" + min + "-" + max + "]";
Contributor

Shouldn't the toString match the class name? That could be confusing during debugging.

Applies in general to the classes added in this PR.

Member Author

Boo. Yeah. Old toString.

Member Author

It doesn't have to be the name of the class - like here I'll call it AscendingSequence. But, yeah, I'll double check them. It's what I get when I rename a bunch of stuff as I go.

Comment on lines +232 to +234
if (v != null) {
values.add(v);
}
Contributor

I'm a bit confused why a null value for v doesn't translate into a null added to the builder - won't the builders get misaligned? Could it be that currently nulls don't occur in the keys?

Member Author

Let me go poke the tests some more. null is a valid key and should get mapped to whatever row has the null. And you can look it up. That's how aggs work because that's how PostgreSQL and friends work. Let me double check it.

Member Author

What I've got is actually correct, but it's quite tricky. Tricky in ways that certainly deserve a block comment. Adding one.
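
The semantics described in this thread, where null is an ordinary key that matches the row holding null while an absent key yields null, can be illustrated with a plain `HashMap`, which permits a null key. This is only a sketch of the intended behavior, not the builder logic being discussed. `NullKeySketch` and `index` are hypothetical names, and keeping the first row per key is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeySketch {
    /** Maps each key, including null, to the offset of the first row holding it. */
    static Map<Long, Integer> index(Long[] keys) {
        Map<Long, Integer> rows = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            rows.putIfAbsent(keys[i], i);
        }
        return rows;
    }
}
```

For the keys `{1L, null, 2L}`, looking up `null` gives row `1`, while looking up `7L`, which is not in the table, gives `null`.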

@nik9000 merged commit dff3bd2 into elastic:main May 17, 2024
elasticsearchmachine pushed a commit that referenced this pull request Jun 7, 2024
This adds support for `LOOKUP`, a command that implements a sort of
inline `ENRICH`, using data that is passed in the request:

```
$ curl -uelastic:password -HContent-Type:application/json -XPOST \
    'localhost:9200/_query?error_trace&pretty&format=txt' \
-d'{
    "query": "ROW a=1::LONG | LOOKUP t ON a",
    "tables": {
        "t": {
            "a:long":     [    1,     4,     2],
            "v1:integer": [   10,    11,    12],
            "v2:keyword": ["cat", "dog", "wow"]
        }
    },
    "version": "2024.04.01"
}'
      v1       |      v2       |       a       
---------------+---------------+---------------
10             |cat            |1
```

This required these PRs:
* #107624
* #107634
* #107701
* #107762
* #107923
* #107894
* #107982
* #108012
* #108020
* #108169
* #108191
* #108334
* #108482
* #108696
* #109040
* #109045

Closes #107306
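
As a rough illustration of what the request in this commit message computes (not of how ES|QL implements it), the same lookup can be written by hand over the three columns of table `t`. `LookupExampleSketch` and `lookup` are hypothetical names. The linear `indexOf` scan is chosen for brevity and is not how the engine works.

```java
import java.util.List;

final class LookupExampleSketch {
    // Columns of table "t" from the request in the commit message.
    static final List<Long> A = List.of(1L, 4L, 2L);
    static final List<Integer> V1 = List.of(10, 11, 12);
    static final List<String> V2 = List.of("cat", "dog", "wow");

    /** Returns "v1|v2|a" for the first row whose key equals a, or null if none does. */
    static String lookup(long a) {
        int row = A.indexOf(a);
        if (row < 0) {
            return null;
        }
        return V1.get(row) + "|" + V2.get(row) + "|" + a;
    }
}
```

`lookup(1)` returns `"10|cat|1"`, matching the row printed by the query, and `lookup(3)` returns `null` because no row of `t` has `a = 3`.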

Labels

:Analytics/ES|QL (AKA ESQL), >non-issue, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v8.15.0

4 participants