
rowexec: disk spilling hash joiner doesn't use `(EncDatum).Fingerprint` #48130

@rohany

Repro:

Run the following logictest:

# LogicTest: fakedist-disk

statement ok
CREATE TABLE t1 (x INT[]);
CREATE TABLE t2 (x INT[]);
INSERT INTO t1 VALUES (ARRAY[1, 2, NULL, 3]);
INSERT INTO t2 VALUES (ARRAY[1, 2, NULL, 3]);

query T
SELECT * FROM t1 INNER HASH JOIN t2 ON t1.x = t2.x
----
{1,2,NULL,3}

The query fails with an error saying that it is trying to encode the array as a table key. This happens because the DiskRowContainer uses Encode to create a key for the rows it contains. Simply switching to Fingerprint does not look like a solution, however: the container assumes it can decode the key back into the row that was placed in it. For some fingerprinted types (JSON, and for now arrays, although arrays should soon stop being an exception) this is not true.
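To make the constraint concrete, here is a minimal, self-contained Go sketch of a container keyed by a non-decodable fingerprint. It is not CockroachDB code and not a fix proposed in this issue: `row`, `fingerprint`, `rowsEqual`, and `fingerprintContainer` are hypothetical names, the FNV hash stands in for whatever a real fingerprint would be, and treating two NULL elements as equal is an assumption chosen to match the expected output of the repro. The point is only that once the key is a lossy hash, the row itself has to be stored alongside it, because it cannot be recovered from the key.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// row is a hypothetical stand-in for a single INT[] column value;
// a nil element represents SQL NULL.
type row []*int64

// fingerprint returns a lossy 64-bit hash of the row. Unlike a key
// encoding, it cannot be decoded back into the original elements.
func fingerprint(r row) uint64 {
	h := fnv.New64a()
	for _, v := range r {
		if v == nil {
			h.Write([]byte{0}) // tag byte for NULL
			continue
		}
		var buf [9]byte
		buf[0] = 1 // tag byte for a non-NULL element
		binary.BigEndian.PutUint64(buf[1:], uint64(*v))
		h.Write(buf[:])
	}
	return h.Sum64()
}

// rowsEqual compares element-wise. Treating NULL as equal to NULL is an
// assumption made for this sketch.
func rowsEqual(a, b row) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if (a[i] == nil) != (b[i] == nil) {
			return false
		}
		if a[i] != nil && *a[i] != *b[i] {
			return false
		}
	}
	return true
}

// fingerprintContainer keys rows by fingerprint and keeps the full row as
// the value, since the key alone cannot reproduce the row.
type fingerprintContainer struct {
	buckets map[uint64][]row
}

func newFingerprintContainer() *fingerprintContainer {
	return &fingerprintContainer{buckets: make(map[uint64][]row)}
}

func (c *fingerprintContainer) add(r row) {
	fp := fingerprint(r)
	c.buckets[fp] = append(c.buckets[fp], r)
}

// matches returns stored rows equal to probe, re-checking equality to
// guard against fingerprint collisions.
func (c *fingerprintContainer) matches(probe row) []row {
	var out []row
	for _, r := range c.buckets[fingerprint(probe)] {
		if rowsEqual(r, probe) {
			out = append(out, r)
		}
	}
	return out
}

// i64 is a small helper for building non-NULL elements.
func i64(v int64) *int64 { return &v }

func main() {
	c := newFingerprintContainer()
	c.add(row{i64(1), i64(2), nil, i64(3)})
	probe := row{i64(1), i64(2), nil, i64(3)}
	fmt.Println(len(c.matches(probe))) // prints 1
}
```

A container that relies on decoding its keys has no such value to fall back on, which is why swapping Encode for Fingerprint alone would not be enough.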

Jira issue: CRDB-4354

Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-sql-queries (SQL Queries Team)
