Skip to content

DISTINCT doesn't work with jsonb_build_object #46709

@Bessonov

Description

@Bessonov

Describe the problem

Goal:
We have a relation from an user to postal addresses and phones. The goal is to get whole net for the user (1 or <=100) as json. We distinct between projection and filtering. If we filter by phone (filtering) or want to show phones (projection), left join clause is "automatically" added.

Because of join and projection we have a problem, if user has for example 2 phones and 1 postal address. We get the same postal two times. It's possible to get rid of this problem with postgres, but I don't found a solution with cockroachdb. See reproduction.

To Reproduce

Data snippet for postgres:

CREATE TABLE profile (id UUID NOT NULL, name TEXT NOT NULL, CONSTRAINT "primary1" PRIMARY KEY (id));
CREATE TABLE phone (id UUID NOT NULL, number TEXT NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary2" PRIMARY KEY (id), CONSTRAINT phone_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));
CREATE TABLE postal (id UUID NOT NULL, streetname TEXT NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary3" PRIMARY KEY (id), CONSTRAINT postal_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));

INSERT INTO profile VALUES ('00000000-0000-0000-0000-000000000000', 'Tester');
INSERT INTO phone VALUES ('11111111-1111-1111-1111-111111111111', '11111', '00000000-0000-0000-0000-000000000000');
INSERT INTO phone VALUES ('22222222-2222-2222-2222-222222222222', '22222', '00000000-0000-0000-0000-000000000000');
INSERT INTO postal VALUES ('33333333-3333-3333-3333-333333333333', '33333', '00000000-0000-0000-0000-000000000000');

Data snippet for crdb:

CREATE TABLE profile (id UUID NOT NULL, name STRING NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC));
CREATE TABLE phone (id UUID NOT NULL, number STRING NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC), CONSTRAINT phone_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));
CREATE TABLE postal (id UUID NOT NULL, streetname STRING NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC), CONSTRAINT postal_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));

INSERT INTO profile VALUES ('00000000-0000-0000-0000-000000000000', 'Tester');
INSERT INTO phone VALUES ('11111111-1111-1111-1111-111111111111', '11111', '00000000-0000-0000-0000-000000000000');
INSERT INTO phone VALUES ('22222222-2222-2222-2222-222222222222', '22222', '00000000-0000-0000-0000-000000000000');
INSERT INTO postal VALUES ('33333333-3333-3333-3333-333333333333', '33333', '00000000-0000-0000-0000-000000000000');

This query shows the problem on both dbs (filtering omitted, because not relevant for the output):

select "profile"."id", "profile"."name", COUNT(*) OVER() as "count", 
CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(jsonb_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
END as "contactpostal", 
CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(jsonb_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
END as "contactphone"
from "profile"
left join "postal" on "profile"."id" = "postal"."userid" -- and clause for postal's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid" -- and clause for phone's filtering/permissions
-- clause for profile's filtering/permissions
group by "profile"."id", "profile"."name";

Result:

id            | 00000000-0000-0000-0000-000000000000
name          | Tester
count         | 1
contactpostal | [{"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}]
contactphone  | [{"id": "11111111-1111-1111-1111-111111111111", "number": "11111", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "22222222-2222-2222-2222-222222222222", "number": "22222", "userid": "00000000-0000-0000-0000-000000000000"}]

The closest workaround/solution is to use DISTINCT before jsonb_build_object, and it works on postgres (see sqlfiddle to play with it):

select "profile"."id", "profile"."name", COUNT(*) OVER() as "count", 
CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(DISTINCT jsonb_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
END as "contactpostal", 
CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(DISTINCT jsonb_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
END as "contactphone"
from "profile"
left join "postal" on "profile"."id" = "postal"."userid" -- and clause for postal's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid" -- and clause for phone's filtering/permissions
-- clause for profile's filtering/permissions
group by "profile"."id", "profile"."name";

but not on crdb:

ERROR: unable to encode table key: *tree.DJSON

I've seen that something similar was fixed by @rohany , but even after upgrade from 19.2.0 to unstable:

version | CockroachDB CCL v20.1.0-beta.3 (x86_64-unknown-linux-gnu, built 2020/03/23 17:54:39, go1.13.5)

nothing changes.

Expected behavior

I'm not sure if there is an efficient way to workaround it. It can be achieved if instead of usage of postal/phone in the select, a subselect is fired:

select distinct "profile"."id", "profile"."name", COUNT(*) OVER() as "count",
(
    select CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(json_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
        END as "contactpostal" FROM "postal" WHERE "postal"."userid" = "profile"."id" -- and clause for postal's filtering/permissions
),
(
    select CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(json_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
        END as "contactphone" FROM "phone" WHERE "phone"."userid" = "profile"."id"  -- and clause for phone's filtering/permissions
)
from "profile"
left join "postal" on "profile"."id" = "postal"."userid"  -- and clause for postals's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid"  -- and clause for phone's filtering/permissions
group by "profile"."id", "profile"."name", "phone"."id";

It doesn't feel right and breaks count:

id            | 00000000-0000-0000-0000-000000000000
name          | Tester
count         | 2
contactpostal | [{"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}]
contactphone  | [{"id": "11111111-1111-1111-1111-111111111111", "number": "11111", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "22222222-2222-2222-2222-222222222222", "number": "22222", "userid": "00000000-0000-0000-0000-000000000000"}]

May be there is another solution. But it would be nice to have a little bit more alignment to postgres.

Additional data / screenshots

Environment:

  • CockroachDB version 19.2.0, v20.1.0-beta.3
  • Server OS: official and unstable docker image
  • Client app: cockroach sql

Thanks!

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions