sql: implement JSON{,B}_{OBJECT,}_AGG#48306
Conversation
|
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. Before a member of our team reviews your PR, I have a few suggestions for tidying it up for review:
I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
aa70b32 to
e5dc9c6
Compare
4fcbbf4 to
253e02b
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Nice work so far @C0rWin, thanks!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)
pkg/sql/logictest/testdata/logic_test/aggregate, line 2466 at r1 (raw file):
---- {"Alice": {"Facebook": "Alice_fb", "Instagram": "AliceInst", "Twitter": "@Alice"}} {"Bob": {"Facebook": "Bob_fb", "LinkedIn": "Bob The Builder"}}
Can you add some tests with the error cases you wrote in here? Checks for nulls, as an example. Also, please include some examples with JSON nulls, JSON arrays, and JSON numbers.
pkg/util/json/json.go, line 246 at r1 (raw file):
} // ObjectBuilderWithCounter builds JSON object which keeps track of total size
nit: this should be a complete sentence with a period at the end. How about:
ObjectBuilderWithCounter builds a JSON object a key/value pair at a time, keeping of the memory usage of the object.
pkg/util/json/json.go, line 252 at r1 (raw file):
} // NewObjectBuilderWithCounter creates and instantiate ObjectBuilder with memory counter
nit: add a period
pkg/util/json/json.go, line 266 at r1 (raw file):
func (b *ObjectBuilderWithCounter) Add(k string, v JSON) { b.ob.Add(k, v) // Size of added JSON + size of the map kay pair and length of the key
nit: "kay" should be key.
pkg/util/json/json_test.go, line 467 at r1 (raw file):
{ input: [][]interface{}{{"key1", []interface{}{1, 2, 3, 4}}}, expected: json(`{"key1": [1, 2, 3, 4]}`),
Please add some test cases with nulls.
Certainly will do, thanks. |
|
@jordanlewis thanks for review, will address your comments and update my PR. |
jordanlewis
left a comment
There was a problem hiding this comment.
Couple more comments, but this is nearly there, thank you Artem!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)
pkg/sql/logictest/testdata/logic_test/aggregate, line 2470 at r2 (raw file):
#empty result returned, since both key and value is nil, hence skipped statement ok select json_object_agg(null, null);
I think even this should return an error actually... any time the first arg is null you should return the error below.
pkg/sql/logictest/testdata/logic_test/aggregate, line 2528 at r2 (raw file):
{"Alice": {"Facebook": "Developer", "Google": "Full Stack", "IBM": "Research"}} {"Bob": {"Cockroach Labs": "DB Developer", "Google": "DevOps", "Twitter": "Frontend"}} {"Charlie": null}
Are there any tests for what happens when there's a duplicate object key? Please add tests for that, thanks!
pkg/sql/sem/builtins/aggregate_builtins.go, line 2685 at r2 (raw file):
// if only key datum equal to null, return error if datum == tree.DNull {
Bring this up above the first check, it should error if we ever have a null in the first position.
|
@jordanlewis thanks I will take care to address your comments |
I have some reservation with respect to duplicate keys, as I do not think this is defined or strictly articulated. While I understand that this has to be aligned with Postgres, it still not very well defined, I will add test to reflect current way to treat it. I actually was concerned about it, see here #48841. /cc @jordanlewis |
|
@jordanlewis the way CRDB currently treats JSON object formation, i.e. order of keys and order among duplicate keys depends on insertion order, hence not really deteministic for aggregate functions. (see: pkg/util/json/json.go:355) I think that JSON object creation has to be fixed in a way that will not depend on how keys/values are inserted, unless I'm missing something. PS. IMO sorting has to be done for keys, and then for values, rather than defer insertion order. |
yuzefovich
left a comment
There was a problem hiding this comment.
Thanks for working on this! I left a bunch of nits, but in general the code is almost ready to go.
I have some reservation with respect to duplicate keys, as I do not think this is defined or strictly articulated. While I understand that this has to be aligned with Postgres, it still not very well defined ...
Here is an excerpt from Postgres docs:
The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg,
string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully
different result values depending on the order of the input values. This ordering is unspecified by
default, but can be controlled by writing an ORDER BY clause within the aggregate call.
So when there is no ORDER BY clause in the aggregate function, we are free to choose any order. If we deviate from Postgres in such case, it's ok, meaning that we should not be concerned about this:
the way CRDB currently treats JSON object formation, i.e. order of keys and order among duplicate keys depends on insertion order, hence not really deteministic for aggregate functions.
In fact, in the logic tests you added we already deviate from Postgres, but it's acceptable.
On a related note, we should also add json_object_agg and jsonb_object_agg to isOrderingSensetive in optbuilder/groupby.go. This will allow us to support these new aggregates with ORDER BY clause (using window functions machinery behind the scenes), and we should add the tests for that.
Reviewed 1 of 11 files at r1, 2 of 7 files at r2, 8 of 8 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @C0rWin)
pkg/sql/logictest/testdata/logic_test/aggregate, line 2638 at r3 (raw file):
statement error pq: field name must not be null select json_object_agg(null, null);
super nit: unnecessary semicolon.
pkg/sql/opt/operator.go, line 306 at r3 (raw file):
ConstNotNullAggOp, CorrOp, FirstAggOp, JsonAggOp, JsonbAggOp, MaxOp, MinOp, SqrDiffOp, StdDevOp, StringAggOp, SumOp, SumIntOp, VarianceOp, XorAggOp, PercentileDiscOp, PercentileContOp, JsonObjectAggOp, JsonbObjectAggOp:
super nit: seems like it might be time to start a new line here and below.
pkg/sql/sem/builtins/aggregate_builtins.go, line 3051 at r3 (raw file):
} // Add accumulates the transformed json into the JSON array.
nit: probably s/JSON array/JSON object/.
pkg/sql/sem/builtins/aggregate_builtins.go, line 3059 at r3 (raw file):
} // if only key datum equal to null, return error
nit: if we have a comment that lives on a separate line, we usually start it with a capital letter and end it with a period.
pkg/util/json/json.go, line 246 at r3 (raw file):
} // ObjectBuilderWithCounter builds a JSON object a key/value pair at a time, keeping of the memory usage of the object.
nit: s/keeping of the/keeping the/ and we strive for lines no longer than 80 characters.
pkg/util/json/json.go, line 252 at r3 (raw file):
} // NewObjectBuilderWithCounter creates and instantiate ObjectBuilder with memory counter.
nit: s/instantiate/instantiates/.
pkg/util/json/json.go, line 263 at r3 (raw file):
// Add appends key value pair to the sequence and updates // amount of memory allocated for the overall keys and values
nit: missing period.
pkg/util/json/json.go, line 266 at r3 (raw file):
func (b *ObjectBuilderWithCounter) Add(k string, v JSON) { b.ob.Add(k, v) // Size of added JSON + size of the map key pair and length of the key
nit: something is off with "map key pair and length of the key", maybe "Size of added JSON + overhead of storing key/value pair + the size of the key"? (Also, a missing period.)
pkg/util/json/json.go, line 276 at r3 (raw file):
} // Size returns the size in bytes of the JSON Array the builder is going to build.
nit: probably s/Array/object/.
|
@yuzefovich thanks for your feedback, will address your comments and provide updates. |
|
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
6c8c49a to
80d306b
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
The code looks good to me, but seems like you accidentally deleted a bunch of logic test cases, we should keep those :)
Reviewed 8 of 8 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @C0rWin)
pkg/sql/logictest/testdata/logic_test/aggregate, line 2570 at r3 (raw file):
query TT SELECT min(NULL::STRING COLLATE en_us), max(NULL::STRING COLLATE en_us)
Seems like you accidentally deleted a bunch of test cases here.
Ups, this seems due to usage of ```TESTFLAGS='-rewrite=true'````, while I'm not really sure what am I doing wrong, but if I am trying to run specific subset of logic tests with this flag enabled it overwrites relevant part, but also wipes ones which was not executed. |
|
|
yep, thanks, realize that now, I have hit this several times in the past by was cautions enough to catch before pushed changes, unfortunately overlooked it this time. |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)
pkg/sql/logictest/testdata/logic_test/aggregate, line 2528 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Are there any tests for what happens when there's a duplicate object key? Please add tests for that, thanks!
Hm, currently we deviate from Postgres in how we handle duplicate keys. Postgres seems to keep all of them whereas we keep only the last one.
PG:
yuzefovich=# CREATE TABLE profiles (
userid int not null,
property text not null,
value text not null
);
CREATE TABLE
yuzefovich=# INSERT INTO profiles VALUES (1, 'email', 'foo'), (1, 'email', 'bar');
INSERT 0 2
yuzefovich=# SELECT json_object_agg(property, value) FROM profiles GROUP BY userid ORDER BY userid;
json_object_agg
--------------------------------------
{ "email" : "foo", "email" : "bar" }
(1 row)
CRDB:
root@127.0.0.1:50547/defaultdb> SELECT json_object_agg(property, value) FROM profiles GROUP BY userid ORDER BY userid;
json_object_agg
--------------------
{"email": "bar"}
(1 row)
I believe it is an existing behavior of jsonObjectBuilder, and this PR simply adds another case for when we use that object. @jordanlewis I think it should be ok to merge this, what do you think?
@yuzefovich please take a look on @jordanlewis answer here #48841
|
|
Yes, @yuzefovich we don't try to emulate the |
yuzefovich
left a comment
There was a problem hiding this comment.
I left last couple of nits, and once those are addressed, I'll merge this.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)
pkg/sql/opt/optbuilder/groupby.go, line 828 at r5 (raw file):
return b.factory.ConstructPercentileCont(args[0], args[1]) case "json_object_agg": return b.factory.ConstructJsonbObjectAgg(args[0], args[1])
nit: the calls are mismatched with the respective cases.
pkg/sql/sem/builtins/aggregate_builtins.go, line 3059 at r5 (raw file):
} // If only key datum equal to null, return error.
nit: there is something slightly off with the comment, I'd probably replace it with something like "If the key datum is NULL, return an error."
pkg/sql/sem/builtins/aggregate_builtins.go, line 3081 at r5 (raw file):
} // Result returns an DJSON from the array of JSON.
nit: s/an DJSON/a DJSON/.
pkg/util/json/json.go, line 246 at r3 (raw file):
Previously, yuzefovich wrote…
nit:
s/keeping of the/keeping the/and we strive for lines no longer than 80 characters.
Ping.
This commit adds implementation of json{,b}_object_agg built in
function, by adding a new construction which augments constructions
and allows to keep track on memory usage. Links json{,b}_object_agg
built in function to enable support of queries which aggregates
name/value pairs into JSON object.
Signed-off-by: Artem Barger <bartem@il.ibm.com>
Release note (sql change): enable aggregate queries which collect
key/value pairs into JSON object.
@yuzefovich thanks, done. |
yuzefovich
left a comment
There was a problem hiding this comment.
Thanks for your contribution!
bors r=yuzefovich
Reviewed 3 of 3 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @C0rWin and @jordanlewis)
Build succeeded |
Fixes #19978
This PR adds built in aggregate function allowing to augment query result set into json object.
Adds implementation of json{,b}_object_agg built in function, by adding a new construction which augments constructions and allows to keep track on memory usage. Finally, links json{,b}_object_agg built in function to enable support of queries which aggregates name/value pairs into JSON object.
Release note (sql change): enable aggregate queries which collect key/value pairs into JSON object.