sql: fix substring(byte[]) to treat input as raw bytes without escaping by rafiss · Pull Request #58265 · cockroachdb/cockroach

rafiss · 2020-12-24T00:51:28Z

Release note (bug fix): The substring function on byte arrays would
treat its input as unicode code points, which would cause the wrong
bytes to be returned. Now it only operates on the raw bytes.
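
To illustrate the first note, here is a minimal sketch of how slicing by code point differs from slicing raw bytes. The helper names are hypothetical and are not the functions in builtins.go; offsets are 0-based for brevity, whereas SQL `substring` is 1-based.

```go
package main

import "fmt"

// substringRunes mimics the old behavior described above: it decodes the
// input as UTF-8 and slices by code point. Hypothetical helper.
func substringRunes(b []byte, start, length int) []byte {
	r := []rune(string(b))
	return []byte(string(r[start : start+length]))
}

// substringBytes slices the raw bytes directly. Hypothetical helper.
func substringBytes(b []byte, start, length int) []byte {
	return b[start : start+length]
}

func main() {
	// 0xc3 0xa9 is the UTF-8 encoding of one code point, followed by 'a'.
	in := []byte("\xc3\xa9a")
	fmt.Printf("%x\n", substringRunes(in, 0, 1)) // c3a9 (two bytes for a length of 1)
	fmt.Printf("%x\n", substringBytes(in, 0, 1)) // c3
}
```

Asking for one "character" of a byte array should yield one byte, which only the second helper does.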

Release note (bug fix): The substring(byte[]) functions were not able to
interpret bytes that had the \ character since it was treating it as
the beginning of an escape sequence. This is now fixed.

knz

Second commit LGTM. First commit unsure - I take it that you checked that pg does the same - but I'll let another reviewer approve.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)

rafiss · 2020-12-28T17:42:39Z

Yeah, PG does this here: text_substring versus bytea_substring

solongordon

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rafiss, and @solongordon)

pkg/sql/sem/builtins/builtins.go, line 4930 at r1 (raw file):

	end := start + length
	// Check for integer overflow.

These are kind of funky cases. If it's easy enough, please add tests and verify we match Postgres behavior.

rafiss

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)

pkg/sql/sem/builtins/builtins.go, line 4930 at r1 (raw file):
Sure. I have been adding these tests, but while comparing to Postgres, I think I discovered a bug/mistake in PG.

The PG source code (https://github.com/postgres/postgres/blob/472e518a44eacd9caac7d618f1b6451672ca4481/src/backend/utils/adt/varlena.c#L884-L891) has:

	int			E = S + length;
	/*
	 * A negative value for L is the only way for the end position to
	 * be before the start. SQL99 says to throw an error.
	 */
	if (E < S)
		ereport(ERROR,
				(errcode(ERRCODE_SUBSTRING_ERROR),
				 errmsg("negative substring length not allowed")));

But that comment is wrong -- the end position could be before the start if L is not negative and the addition overflows.

I'm gonna go ahead and intentionally diverge from PG in that case to avoid this confusing behavior.

> select substr('string', 2147483646, 2147483646);
2021-01-04 12:23:00.298 EST [85734] ERROR:  negative substring length not allowed

I submitted a bug report to PG as well.
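
As a rough illustration of why the query above trips that check, the sketch below redoes the addition in 32-bit signed arithmetic. It assumes the sum wraps around on overflow; `endWraps` is a hypothetical helper and is not code from PG or CockroachDB.

```go
package main

import "fmt"

// endWraps reports whether start+length wraps around in 32-bit signed
// arithmetic even though length is non-negative. Hypothetical helper.
func endWraps(start, length int32) bool {
	end := start + length // Go defines signed integer overflow as wraparound
	return length >= 0 && end < start
}

func main() {
	fmt.Println(endWraps(2, 100))                 // false
	fmt.Println(endWraps(2147483646, 2147483646)) // true: the sum wraps to -4
}
```

So the end position can land before the start without any negative length being involved.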

solongordon · 2021-01-04T17:58:29Z

Nice catch. Do you think we should consider just erroring out in the overflow case? But with a more accurate error than Postgres?

rafiss · 2021-01-04T18:06:42Z

The PG devs seem to be pretty responsive on the bug tracker, so I'll see how they want to handle it and decide if we should do the same. I would lean towards keeping our current behavior of just going to the end of the string.

e.g. these should be the same.

> select substring('string', 2, 100);
tring

> select substring('string', 2, 9223372036854775807);
tring
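
A minimal sketch of that "go to the end of the string" behavior, under stated assumptions: `byteSubstring` and `errNegativeLength` are hypothetical names, this is not the code in builtins.go, and `start` is 1-based as in SQL.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errNegativeLength is a hypothetical error value for this sketch.
var errNegativeLength = errors.New("negative substring length not allowed")

// byteSubstring returns the bytes at 1-based positions [start, start+length).
// If start+length overflows, the end is treated as "past the end of input".
func byteSubstring(b []byte, start, length int64) ([]byte, error) {
	if length < 0 {
		return nil, errNegativeLength
	}
	end := start + length
	// Check for integer overflow: with length >= 0, the sum can only be
	// smaller than start if it wrapped around.
	if end < start {
		end = math.MaxInt64
	}
	// Clamp the requested window to the positions that actually exist.
	if start < 1 {
		start = 1
	}
	if n := int64(len(b)); end > n+1 {
		end = n + 1
	}
	if start >= end {
		return []byte{}, nil
	}
	return b[start-1 : end-1], nil
}

func main() {
	a, _ := byteSubstring([]byte("string"), 2, 100)
	c, _ := byteSubstring([]byte("string"), 2, math.MaxInt64)
	fmt.Println(string(a), string(c)) // tring tring
}
```

Both calls clamp to the end of the input, so the overflowing length behaves the same as any other length that runs past the end.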

rafiss · 2021-01-04T21:47:59Z

It's being discussed in the PG list here: https://www.postgresql.org/message-id/16804-f4eeeb6c11ba71d4%40postgresql.org

rafiss · 2021-01-19T20:55:36Z

The PG devs landed on the same behavior we already implement, so I'm merging this: https://www.postgresql.org/message-id/3219376.1609795681%40sss.pgh.pa.us

bors r=solongordon

craig · 2021-01-19T22:03:17Z

Build succeeded:

GitHub CI (Cockroach)

rafiss added 2 commits December 23, 2020 19:35

sql: fix substring(byte[]) to treat input as raw bytes

0afa2b9

Release note (bug fix): The substring function on byte arrays would treat its input as unicode code points, which would cause the wrong bytes to be returned. Now it only operates on the raw bytes.

sql: fix substring(byte[]) to stop trying to escape raw bytes

c464506

Release note (bug fix): The substring(byte[]) functions were not able to interpret bytes that had the `\` character since it was treating it as the beginning of an escape sequence. This is now fixed.

rafiss requested review from a team and jordanlewis December 24, 2020 00:51

rafiss mentioned this pull request Dec 24, 2020

could not parse "\\000x\\0ffyz" as type bytes: invalid bytea escape sequence #57367

Closed

knz reviewed Dec 28, 2020

solongordon approved these changes Jan 3, 2021

rafiss commented Jan 4, 2021

craig bot merged commit 47eb9f3 into cockroachdb:master Jan 19, 2021

rafiss mentioned this pull request Jan 20, 2021

release-20.2: sql: fix substring(byte[]) to treat input as raw bytes without escaping #59170

Merged

rafiss deleted the fix-byte-substring branch February 5, 2021 03:24
