Skip to content

sql: fix the pg text representation of tuples/arrays#28151

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20180801-pp-array-tuple
Aug 2, 2018
Merged

sql: fix the pg text representation of tuples/arrays#28151
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20180801-pp-array-tuple

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Aug 1, 2018

Forked off #28143.
Fixes #25522.

This patch fixes the conversion of arrays and tuples to text for the
purpose of emitting the arrays/tuples back to a client through pgwire.
This now properly supports nested arrays, arrays containing tuples,
tuples containing arrays, etc. Labels in tuples are never emitted
through pgwire.

Release note (bug fix): CockroachDB supports a wider range of tuple
and array values in query results.

@knz knz requested review from a team, jordanlewis and justinj August 1, 2018 16:01
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20180801-pp-array-tuple branch 2 times, most recently from d9cef74 to ca2ec25 Compare August 1, 2018 22:17
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 1, 2018

Now with tests. I checked that all the combinations format the same as postgres.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 1, 2018

welp I spoke too fast. Fixing.

@knz knz force-pushed the 20180801-pp-array-tuple branch 2 times, most recently from 7c2ecbb to 211795d Compare August 1, 2018 23:36
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 1, 2018

There we go. Actually once I understood how this works it's rather regular/simple.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 1, 2018

RFAL

Copy link
Copy Markdown
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if i > 0 {
sep := ""
// TODO(justin): add a test for nested arrays.
for _, d := range v.Array {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well just do if i > 0 { WriteByte(' ') } ?

// Used internally in combination with FmtArrays defined below.
fmtWithinArray
// Used internally in combination with FmtPgwireText defined below.
pgwireFormat
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not especially convinced that this shouldn't be its own method or function (on Datum) at this point, rather than a flag on Format, but I don't feel particularly strongly.

Copy link
Copy Markdown
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but some comments.

Reviewed 13 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/logictest/testdata/logic_test/tuple, line 712 at r1 (raw file):

----
r
("(""(1,""""2"""",3)"",""(4,""""5"""")"",""(6)"")","(7,8)","(""9"")")

This is so ugly. :(


pkg/sql/sem/tree/datum.go, line 413 at r1 (raw file):

func (d *DBool) Format(ctx *FmtCtx) {
	if ctx.HasFlags(pgwireFormat) {
		v := byte('f')

Why store this value?

if bool(*d) {
  ctx.WriteByte('t')
} else {
  ctx.WriteByte('f')
}
return

pkg/sql/sem/tree/datum.go, line 2734 at r1 (raw file):

	ctx.WriteByte('(')
	comma := ""
	for _, v := range d.D {

Not sure if this is better, but it reads a bit nicer I think. Also this way you don't set the comma variable each time.

for i, v := range d.D {
  if i > 1 {
    ctx.WriteString(", ")
  }
  ctx.FormatNode(v)
}

I really want a bool for first and last and I'm kinda surprised that they were not included in golang's range operator.


pkg/sql/sem/tree/datum.go, line 3026 at r1 (raw file):

	}

	ctx.WriteString("ARRAY[")

Here too,


pkg/sql/sem/tree/pgwire_encode.go, line 37 at r1 (raw file):

	// string printer called pgwireFormatStringInTuple().
	ctx.WriteByte('(')
	comma := ""

See my other comments, I'd rather a check on the iteration count than setting comma on each pass. For all of the commas in this file.


pkg/sql/sem/tree/pgwire_encode.go, line 57 at r1 (raw file):

func pgwireFormatStringInTuple(buf *bytes.Buffer, in string) {
	// TODO(knz): to be fully pg-compliant, this function should avoid

For all of these, this seems pretty easy to do. Is there a reason to not just use a different buffer then copy from one to the other with or without the double quotes?


pkg/sql/sem/tree/pgwire_encode.go, line 111 at r1 (raw file):

func pgwireFormatStringInArray(buf *bytes.Buffer, in string) {
	// TODO(knz): to be fully pg-compliant, this function should avoid

same here


pkg/util/stringencoding/string_encoding.go, line 86 at r1 (raw file):

// EncodeChar is used internally to write out a character from
// a larger string to a buffer.
func EncodeChar(buf *bytes.Buffer, entireString string, currentRune rune, currentIdx int) {

This function looks like it could use a unit test.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/logictest/testdata/logic_test/tuple, line 712 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

This is so ugly. :(

It is ugly but it is also correct. See my other comment below.


pkg/sql/pgwire/types.go, line 165 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

might as well just do if i > 0 { WriteByte(' ') } ?

This way is actually faster / more efficient. Vivek taught me this trick. I'm using it as a general-purpose pattern now.


pkg/sql/sem/tree/datum.go, line 413 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Why store this value?

if bool(*d) {
  ctx.WriteByte('t')
} else {
  ctx.WriteByte('f')
}
return

I don't care either way. Changed.


pkg/sql/sem/tree/datum.go, line 2734 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Not sure if this is better, but it reads a bit nicer I think. Also this way you don't set the comma variable each time.

for i, v := range d.D {
  if i > 1 {
    ctx.WriteString(", ")
  }
  ctx.FormatNode(v)
}

I really want a bool for first and last and I'm kinda surprised that they were not included in golang's range operator.

This way is actually faster / more efficient. Vivek taught me this trick. I'm using it as a general-purpose pattern now.


pkg/sql/sem/tree/datum.go, line 3026 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Here too,

This way is actually faster / more efficient. Vivek taught me this trick. I'm using it as a general-purpose pattern now.


pkg/sql/sem/tree/format.go, line 87 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I'm not especially convinced that this shouldn't be its own method or function (on Datum) at this point, rather than a flag on Format, but I don't feel particularly strongly.

This combination still allows up to combine other flags (e.g. anonymize).


pkg/sql/sem/tree/pgwire_encode.go, line 37 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

See my other comments, I'd rather a check on the iteration count than setting comma on each pass. For all of the commas in this file.

ditto my other comment


pkg/sql/sem/tree/pgwire_encode.go, line 57 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

For all of these, this seems pretty easy to do. Is there a reason to not just use a different buffer then copy from one to the other with or without the double quotes?

Because performance.

@justinj has decided to do it this way originally and reasoned as follows: it is correct to add the double quotes when they are not required. Removing them is only "niceness" to the human user. I'm not revisiting this here -- we can make a separate PR for that.


pkg/sql/sem/tree/pgwire_encode.go, line 111 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

same here

ditto


pkg/util/stringencoding/string_encoding.go, line 86 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

This function looks like it could use a unit test.

None of these functions are tested directly in this package. Their unit tests live in format_test.go in pkg/sql/sem/tree and pkg/sql/lex.

@knz knz force-pushed the 20180801-pp-array-tuple branch from 211795d to 7071d03 Compare August 2, 2018 16:02
This patch fixes the conversion of arrays, tuples and bools to text
for the purpose of emitting the values back to a client through pgwire
with the text protocol. This now properly supports nested arrays,
arrays containing tuples, tuples containing arrays, etc. The boolean
values get serialized properly to 't' and 'f' like in PostgreSQL.

Release note (bug fix): CockroachDB supports a wider range of tuple
and array values in query results.
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/logictest/testdata/logic_test/srfs, line 543 at r2 (raw file):

SELECT ((i.keys).*, 123) FROM (SELECT information_schema._pg_expandarray(ARRAY[3,2,1]) AS keys) AS i
----
((3, 1),123)

Jordan just noticed there is a bug here, but unrelated to this PR. Filed as #28203

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 2, 2018

PTAL with Justin's offline comments in mind?

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 2, 2018

I'll hold off this while Bram prepares his all-eng prez.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 2, 2018

Bram gave a 👍 offline!

thanks bram

@knz knz dismissed BramGruneir’s stale review August 2, 2018 18:19

green light

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 2, 2018

bors r+

craig bot pushed a commit that referenced this pull request Aug 2, 2018
28151: sql: fix the pg text representation of tuples/arrays r=knz a=knz

Forked off #28143.
Fixes #25522.

This patch fixes the conversion of arrays and tuples to text for the
purpose of emitting the arrays/tuples back to a client through pgwire.
This now properly supports nested arrays, arrays containing tuples,
tuples containing arrays, etc. Labels in tuples are never emitted
through pgwire.

Release note (bug fix): CockroachDB supports a wider range of tuple
and array values in query results.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 2, 2018

Build succeeded

@craig craig bot merged commit 200d2a6 into cockroachdb:master Aug 2, 2018
@knz knz deleted the 20180801-pp-array-tuple branch August 2, 2018 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: Pretty Printing of Tuples does not match postgres

4 participants