Skip to content

sql: implement JSON{,B}_{OBJECT,}_AGG#48306

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
C0rWin:json_obj_agg
Jun 16, 2020
Merged

sql: implement JSON{,B}_{OBJECT,}_AGG#48306
craig[bot] merged 1 commit intocockroachdb:masterfrom
C0rWin:json_obj_agg

Conversation

@C0rWin
Copy link
Copy Markdown
Contributor

@C0rWin C0rWin commented May 1, 2020

Fixes #19978

This PR adds built in aggregate function allowing to augment query result set into json object.

Adds implementation of json{,b}_object_agg built in function, by adding a new construction which augments constructions and allows to keep track on memory usage. Finally, links json{,b}_object_agg built in function to enable support of queries which aggregates name/value pairs into JSON object.

Release note (sql change): enable aggregate queries which collect key/value pairs into JSON object.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 1, 2020

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have a few suggestions for tidying it up for review:

  • We generally try and keep pull requests to one commit. Please squash your commits, and re-push with --force.
  • Please ensure your pull request description contains a release note - this can be the same as the one in your commit message.
  • When CI has completed, please ensure no errors have appeared.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the X-blathers-triaged blathers was able to find an owner label May 1, 2020
@blathers-crl blathers-crl bot requested a review from jordanlewis May 1, 2020 23:29
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@C0rWin C0rWin marked this pull request as draft May 2, 2020 00:00
@C0rWin C0rWin force-pushed the json_obj_agg branch 6 times, most recently from aa70b32 to e5dc9c6 Compare May 3, 2020 14:18
@C0rWin C0rWin marked this pull request as ready for review May 3, 2020 14:27
@C0rWin C0rWin force-pushed the json_obj_agg branch 2 times, most recently from 4fcbbf4 to 253e02b Compare May 7, 2020 21:56
Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far @C0rWin, thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)


pkg/sql/logictest/testdata/logic_test/aggregate, line 2466 at r1 (raw file):

----
{"Alice": {"Facebook": "Alice_fb", "Instagram": "AliceInst", "Twitter": "@Alice"}}
{"Bob": {"Facebook": "Bob_fb", "LinkedIn": "Bob The Builder"}}

Can you add some tests with the error cases you wrote in here? Checks for nulls, as an example. Also, please include some examples with JSON nulls, JSON arrays, and JSON numbers.


pkg/util/json/json.go, line 246 at r1 (raw file):

}

// ObjectBuilderWithCounter builds JSON object which keeps track of total size

nit: this should be a complete sentence with a period at the end. How about:

ObjectBuilderWithCounter builds a JSON object a key/value pair at a time, keeping of the memory usage of the object.


pkg/util/json/json.go, line 252 at r1 (raw file):

}

// NewObjectBuilderWithCounter creates and instantiate ObjectBuilder with memory counter

nit: add a period


pkg/util/json/json.go, line 266 at r1 (raw file):

func (b *ObjectBuilderWithCounter) Add(k string, v JSON) {
	b.ob.Add(k, v)
	// Size of added JSON + size of the map kay pair and length of the key

nit: "kay" should be key.


pkg/util/json/json_test.go, line 467 at r1 (raw file):

		{
			input:    [][]interface{}{{"key1", []interface{}{1, 2, 3, 4}}},
			expected: json(`{"key1": [1, 2, 3, 4]}`),

Please add some test cases with nulls.

@C0rWin C0rWin requested a review from a team as a code owner May 10, 2020 12:29
@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented May 10, 2020

Can you add some tests with the error cases you wrote in here? Checks for nulls, as an example. Also, please include some examples with JSON nulls, JSON arrays, and JSON numbers.

Certainly will do, thanks.

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented May 10, 2020

@jordanlewis thanks for review, will address your comments and update my PR.

@C0rWin C0rWin requested review from jordanlewis and removed request for a team May 14, 2020 00:07
Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments, but this is nearly there, thank you Artem!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)


pkg/sql/logictest/testdata/logic_test/aggregate, line 2470 at r2 (raw file):

#empty result returned, since both key and value is nil, hence skipped
statement ok
select json_object_agg(null, null);

I think even this should return an error actually... any time the first arg is null you should return the error below.


pkg/sql/logictest/testdata/logic_test/aggregate, line 2528 at r2 (raw file):

{"Alice": {"Facebook": "Developer", "Google": "Full Stack", "IBM": "Research"}}
{"Bob": {"Cockroach Labs": "DB Developer", "Google": "DevOps", "Twitter": "Frontend"}}
{"Charlie": null}

Are there any tests for what happens when there's a duplicate object key? Please add tests for that, thanks!


pkg/sql/sem/builtins/aggregate_builtins.go, line 2685 at r2 (raw file):

	// if only key datum equal to null, return error
	if datum == tree.DNull {

Bring this up above the first check, it should error if we ever have a null in the first position.

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented May 22, 2020

@jordanlewis thanks I will take care to address your comments

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented May 23, 2020

Are there any tests for what happens when there's a duplicate object key? Please add tests for that, thanks!

I have some reservation with respect to duplicate keys, as I do not think this is defined or strictly articulated. While I understand that this has to be aligned with Postgres, it still not very well defined, I will add test to reflect current way to treat it. I actually was concerned about it, see here #48841. /cc @jordanlewis

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented May 23, 2020

@jordanlewis the way CRDB currently treats JSON object formation, i.e. order of keys and order among duplicate keys depends on insertion order, hence not really deteministic for aggregate functions. (see: pkg/util/json/json.go:355)

func (s *pairSorter) unique() {
	// If there are any duplicate keys, then in sorted order it will have
	// two pairs with rank i and i + 1 whose keys are same.
	// For sorting based on comparisons, if two unique elements (pair.k, order)
	// have rank i and i + 1, they have to compare once to figure out their
	// relative order in the final position i and i + 1. So if there are any
	// equal elements, then the sort must have compared them at some point.
	if !s.hasNonUnique {
		return
	}
	top := 0
	for i := 1; i < len(s.pairs); i++ {
		if s.pairs[top].k != s.pairs[i].k {
			top++
			if top != i {
				s.pairs[top] = s.pairs[i]
			}
		}
	}
	s.pairs = s.pairs[:top+1]
}

I think that JSON object creation has to be fixed in a way that will not depend on how keys/values are inserted, unless I'm missing something.

PS. IMO sorting has to be done for keys, and then for values, rather than defer insertion order.
PPS. Given non-deterministic outcome, would you still expect to add test case for duplicate keys?

@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jun 1, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jun 1, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jun 1, 2020
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I left a bunch of nits, but in general the code is almost ready to go.

I have some reservation with respect to duplicate keys, as I do not think this is defined or strictly articulated. While I understand that this has to be aligned with Postgres, it still not very well defined ...

Here is an excerpt from Postgres docs:

The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg,
string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully
different result values depending on the order of the input values. This ordering is unspecified by
default, but can be controlled by writing an ORDER BY clause within the aggregate call.

So when there is no ORDER BY clause in the aggregate function, we are free to choose any order. If we deviate from Postgres in such case, it's ok, meaning that we should not be concerned about this:

the way CRDB currently treats JSON object formation, i.e. order of keys and order among duplicate keys depends on insertion order, hence not really deteministic for aggregate functions.

In fact, in the logic tests you added we already deviate from Postgres, but it's acceptable.

On a related note, we should also add json_object_agg and jsonb_object_agg to isOrderingSensetive in optbuilder/groupby.go. This will allow us to support these new aggregates with ORDER BY clause (using window functions machinery behind the scenes), and we should add the tests for that.

Reviewed 1 of 11 files at r1, 2 of 7 files at r2, 8 of 8 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin)


pkg/sql/logictest/testdata/logic_test/aggregate, line 2638 at r3 (raw file):


statement error pq: field name must not be null
select json_object_agg(null, null);

super nit: unnecessary semicolon.


pkg/sql/opt/operator.go, line 306 at r3 (raw file):

		ConstNotNullAggOp, CorrOp, FirstAggOp, JsonAggOp, JsonbAggOp,
		MaxOp, MinOp, SqrDiffOp, StdDevOp, StringAggOp, SumOp, SumIntOp,
		VarianceOp, XorAggOp, PercentileDiscOp, PercentileContOp, JsonObjectAggOp, JsonbObjectAggOp:

super nit: seems like it might be time to start a new line here and below.


pkg/sql/sem/builtins/aggregate_builtins.go, line 3051 at r3 (raw file):

}

// Add accumulates the transformed json into the JSON array.

nit: probably s/JSON array/JSON object/.


pkg/sql/sem/builtins/aggregate_builtins.go, line 3059 at r3 (raw file):

	}

	// if only key datum equal to null, return error

nit: if we have a comment that lives on a separate line, we usually start it with a capital letter and end it with a period.


pkg/util/json/json.go, line 246 at r3 (raw file):

}

// ObjectBuilderWithCounter builds a JSON object a key/value pair at a time, keeping of the memory usage of the object.

nit: s/keeping of the/keeping the/ and we strive for lines no longer than 80 characters.


pkg/util/json/json.go, line 252 at r3 (raw file):

}

// NewObjectBuilderWithCounter creates and instantiate ObjectBuilder with memory counter.

nit: s/instantiate/instantiates/.


pkg/util/json/json.go, line 263 at r3 (raw file):

// Add appends key value pair to the sequence and updates
// amount of memory allocated for the overall keys and values

nit: missing period.


pkg/util/json/json.go, line 266 at r3 (raw file):

func (b *ObjectBuilderWithCounter) Add(k string, v JSON) {
	b.ob.Add(k, v)
	// Size of added JSON + size of the map key pair and length of the key

nit: something is off with "map key pair and length of the key", maybe "Size of added JSON + overhead of storing key/value pair + the size of the key"? (Also, a missing period.)


pkg/util/json/json.go, line 276 at r3 (raw file):

}

// Size returns the size in bytes of the JSON Array the builder is going to build.

nit: probably s/Array/object/.

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented Jun 9, 2020

@yuzefovich thanks for your feedback, will address your comments and provide updates.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jun 9, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Jun 9, 2020
@C0rWin C0rWin force-pushed the json_obj_agg branch 2 times, most recently from 6c8c49a to 80d306b Compare June 13, 2020 01:52
@C0rWin C0rWin requested review from jordanlewis and yuzefovich June 13, 2020 01:53
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me, but seems like you accidentally deleted a bunch of logic test cases, we should keep those :)

Reviewed 8 of 8 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin)


pkg/sql/logictest/testdata/logic_test/aggregate, line 2570 at r3 (raw file):


query TT
SELECT min(NULL::STRING COLLATE en_us), max(NULL::STRING COLLATE en_us)

Seems like you accidentally deleted a bunch of test cases here.

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented Jun 13, 2020

The code looks good to me, but seems like you accidentally deleted a bunch of logic test cases, we should keep those :)

Reviewed 8 of 8 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin)

pkg/sql/logictest/testdata/logic_test/aggregate, line 2570 at r3 (raw file):


query TT
SELECT min(NULL::STRING COLLATE en_us), max(NULL::STRING COLLATE en_us)

Seems like you accidentally deleted a bunch of test cases here.

Ups, this seems due to usage of ```TESTFLAGS='-rewrite=true'````, while I'm not really sure what am I doing wrong, but if I am trying to run specific subset of logic tests with this flag enabled it overwrites relevant part, but also wipes ones which was not executed.

@C0rWin C0rWin requested a review from yuzefovich June 13, 2020 20:11
@RaduBerinde
Copy link
Copy Markdown
Member

-rewrite doesn't work well when you run only some subtests in a file.

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented Jun 14, 2020

-rewrite doesn't work well when you run only some subtests in a file.

yep, thanks, realize that now, I have hit this several times in the past by was cautions enough to catch before pushed changes, unfortunately overlooked it this time.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)


pkg/sql/logictest/testdata/logic_test/aggregate, line 2528 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Are there any tests for what happens when there's a duplicate object key? Please add tests for that, thanks!

Hm, currently we deviate from Postgres in how we handle duplicate keys. Postgres seems to keep all of them whereas we keep only the last one.

PG:

yuzefovich=# CREATE TABLE profiles (
  userid int not null,
  property text not null,
  value text not null
);
CREATE TABLE
yuzefovich=# INSERT INTO profiles VALUES (1, 'email', 'foo'), (1, 'email', 'bar');
INSERT 0 2
yuzefovich=# SELECT json_object_agg(property, value) FROM profiles GROUP BY userid ORDER BY userid;
           json_object_agg            
--------------------------------------
 { "email" : "foo", "email" : "bar" }
(1 row)

CRDB:

root@127.0.0.1:50547/defaultdb> SELECT json_object_agg(property, value) FROM profiles GROUP BY userid ORDER BY userid;
  json_object_agg
--------------------
  {"email": "bar"}
(1 row)

I believe it is an existing behavior of jsonObjectBuilder, and this PR simply adds another case for when we use that object. @jordanlewis I think it should be ok to merge this, what do you think?

@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented Jun 14, 2020

I believe it is an existing behavior of jsonObjectBuilder, and this PR simply adds another case for when we use that object. @jordanlewis I think it should be ok to merge this, what do you think?

@yuzefovich please take a look on @jordanlewis answer here #48841

Use jsonb_build_object, in Postgres the json ones aren't semantic json, just a text format. We emulate the jsonb ones as the json type is not really that useful in Postgres.

@jordanlewis
Copy link
Copy Markdown
Member

Yes, @yuzefovich we don't try to emulate the json_xyz builtins, just the jsonb_xyz ones (though we alias our json_xyz to the jsonb_xyz before for convenience, perhaps that was a bad idea in retrospect).

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, cool, :lgtm:

I left last couple of nits, and once those are addressed, I'll merge this.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @C0rWin and @jordanlewis)


pkg/sql/opt/optbuilder/groupby.go, line 828 at r5 (raw file):

		return b.factory.ConstructPercentileCont(args[0], args[1])
	case "json_object_agg":
		return b.factory.ConstructJsonbObjectAgg(args[0], args[1])

nit: the calls are mismatched with the respective cases.


pkg/sql/sem/builtins/aggregate_builtins.go, line 3059 at r5 (raw file):

	}

	// If only key datum equal to null, return error.

nit: there is something slightly off with the comment, I'd probably replace it with something like "If the key datum is NULL, return an error."


pkg/sql/sem/builtins/aggregate_builtins.go, line 3081 at r5 (raw file):

}

// Result returns an DJSON from the array of JSON.

nit: s/an DJSON/a DJSON/.


pkg/util/json/json.go, line 246 at r3 (raw file):

Previously, yuzefovich wrote…

nit: s/keeping of the/keeping the/ and we strive for lines no longer than 80 characters.

Ping.

This commit adds implementation of json{,b}_object_agg built in
function, by adding a new construction which augments constructions
and allows to keep track on memory usage. Links json{,b}_object_agg
built in function to enable support of queries which aggregates
name/value pairs into JSON object.

Signed-off-by: Artem Barger <bartem@il.ibm.com>

Release note (sql change): enable aggregate queries which collect
key/value pairs into JSON object.
@C0rWin C0rWin requested a review from yuzefovich June 15, 2020 20:43
@C0rWin
Copy link
Copy Markdown
Contributor Author

C0rWin commented Jun 15, 2020

I left last couple of nits, and once those are addressed, I'll merge this.

@yuzefovich thanks, done.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

bors r=yuzefovich

Reviewed 3 of 3 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @C0rWin and @jordanlewis)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 16, 2020

Build succeeded

@craig craig bot merged commit 96feb13 into cockroachdb:master Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: implement JSON{,B}_{OBJECT,}_AGG

5 participants