-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Efficient read of subcolumns from tables. #14196
Description
Use case
Scenario 1:
User has column of type Tuple in table, example:
CREATE TABLE t (x Tuple(Int8, String)) ENGINE = MergeTree ...
If they write
SELECT x.2 FROM t
now both subcolumns of tuple will be read (we will read whole x column).
We want to read only required subcolumn.
Scenario 2 (done, #17310):
Users can create a table with Nested fields.
They already can query subcolumns:
SELECT nested.x, nested.y FROM table
But it is implemented in very special way.
And users cannot query all nested column as an array of tuples:
SELECT nested FROM table
Neither they can do arrayJoin on nested as a whole:
SELECT arrayJoin(nested) FROM table
We want to rectify the code around support for Nested fields.
Scenario 3 (done, #17310):
Allow to have Nested fields with arbitrary nesting:
CREATE TABLE t (x Nested(a Int8, b Nested(u String, v Float32))) ENGINE = MergeTree ...
SELECT x.a, x.b.u, x.b.v FROM t;
Scenario 4:
Allow to write nested.* expressions.
Scenario 5 (done, #24406):
When user is only using array length in query, we don't have to read array contents - we can only read subcolumn with length:
SELECT length(arr) FROM table
Can be rewritten to
SELECT arr.size0 FROM table
And size0 is the name of subcolumn that we already have on disk.
Scenario 6 (done, #17310):
We already support named tuples.
Allow to write x.name for named tuples. Now only the syntax tupleElement(x, 'name') is possible.
Scenario 7:
Support the case when the existence and the types of subcolumns are determined at query time, not at table creation time.
It can be represented by special data type for semistructured data.