Skip to content

pgrepl: implement the parser for the PostgreSQL replication protocol#104938

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
otan-cockroach:sigh
Jun 19, 2023
Merged

pgrepl: implement the parser for the PostgreSQL replication protocol#104938
craig[bot] merged 2 commits intocockroachdb:masterfrom
otan-cockroach:sigh

Conversation

@otan
Copy link
Copy Markdown
Contributor

@otan otan commented Jun 15, 2023

This commit implements parsing the PG replication protocol syntax which is derived from the .y file from PG itself. We had to invent some new lexing protocols as the lexing mechanism is different to standard plpgsql/pgsql.

This is in preparation for supporting pglogical within CRDB.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-26486
Release note: None

@otan otan requested a review from ZhouXing19 June 15, 2023 06:26
@otan otan requested a review from a team as a code owner June 15, 2023 06:26
@otan otan requested a review from a team June 15, 2023 06:26
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@otan otan force-pushed the sigh branch 9 times, most recently from 626bcb9 to 15bc9af Compare June 16, 2023 03:33
@otan otan requested a review from a team as a code owner June 16, 2023 03:33
@otan otan requested review from renatolabs and srosenberg and removed request for a team June 16, 2023 03:33
@otan otan force-pushed the sigh branch 5 times, most recently from 13b0298 to bd28b57 Compare June 16, 2023 07:35
Copy link
Copy Markdown
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Mostly nit comments and had 2 questions

Reviewed 2 of 14 files at r3, 38 of 38 files at r5, 38 of 38 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)


pkg/sql/pgrepl/pgreplparser/lexer.go line 84 at r6 (raw file):

	case ch == doubleQuote:
		l.scanUntilEndQuote(lval, doubleQuote, lexbase.NormalizeString, IDENT)
	case unicode.IsDigit(ch):

nit: This case overlaps a lot with the isID(ch, true) case for the LSN check. I wonder if they can share that part of code.


pkg/sql/pgrepl/pgreplparser/lexer.go line 91 at r6 (raw file):

			switch {
			case unicode.IsDigit(nextCh):
			case isHexDigit(nextCh) || nextCh == '/':

if unicode.IsDigit(nextCh) == true, is isHexDigit(nextCh) always true? If so, shouldn't we remove the case unicode.IsDigit(nextCh)? Otherwise, these digits will never be parsed as an LSN.


pkg/sql/pgrepl/pgreplparser/lexer.go line 150 at r6 (raw file):

}

func isID(ch rune, firstCh bool) bool {

I had some trouble understanding what ID means here. Could you add some comments to this func, and maybe have 200 and 377 as const var?


pkg/sql/pgrepl/pgreplparser/lexer.go line 161 at r6 (raw file):

}

func (l *lexer) checkLSN(lval *pgreplSymType, startPos int) bool {

nit: it would be helpful to have comments explaining the expected format of LSN -- two hexadecimal numbers of up to 8 digits each, separated by a slash.


pkg/sql/pgrepl/pgreplparser/pgrepl.y line 1 at r6 (raw file):

%{

nit: for the whole yacc file formatting, tabs -> spaces


pkg/sql/pgrepl/pgreplparser/pgrepl.y line 218 at r6 (raw file):

/*
 * SHOW setting
 * NOTE(otan): we omit K_SHOW as this should be handled by the existing protocol.

Could you say more about this? Does it mean replication config params can only be queried via the pg protocol? Why?


pkg/sql/pgrepl/pgreplparser/pgrepl.y line 376 at r6 (raw file):

			K_TIMELINE_HISTORY UCONST
				{
					$$.val = &pgrepltree.TimelineHistory{Timeline: $2.numVal()}

nit: pg has a check for positive argument here. Do we want to have one as well?


pkg/sql/pgrepl/pgreplparser/pgrepl.y line 400 at r6 (raw file):

			K_TIMELINE UCONST
				{
					$$.val = $2.numVal();

nit: ditto, positive arg check


pkg/sql/pgrepl/pgreplparser/testdata/lexer/lsn.ddt line 39 at r6 (raw file):

bA101001010101010/f
----
ERROR: at or near "bA101001010101010/f": syntax error: error decoding LSN: strconv.ParseUint: parsing "bA101001010101010": value out of range

nit: maybe add a case that ends with '/'?


pkg/sql/pgrepl/pgreplparser/testdata/parser/start_replication.ddt line 57 at r6 (raw file):

----
START_REPLICATION SLOT slot_1 LOGICAL A0010/1011 (opt 'opt', "xXx" 'a') -- normalized!
START_REPLICATION SLOT slot_1 LOGICAL A0010/1011 (opt '_', "xXx" '_') -- literals removed

nit: maybe adding a case with an invalid LSN?

This commit adds a basic utility for the PG LSN functionality.

Release note: None
@otan otan force-pushed the sigh branch 2 times, most recently from c1d1eaf to 628d336 Compare June 19, 2023 00:16
@otan otan requested a review from ZhouXing19 June 19, 2023 03:32
Copy link
Copy Markdown
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)


pkg/sql/pgrepl/pgreplparser/pgrepl.y line 220 at r8 (raw file):

/*
 * SHOW setting
 * NOTE(otan): we omit K_SHOW as the fallback from being being parsed by

nit: remove one of the "being"


pkg/sql/pgrepl/pgreplparser/testdata/lexer/lsn.ddt line 37 at r8 (raw file):

# ends with a /
lex
00ff/

This doesn't seem to be supported by pg:

postgres=# SELECT '00ff/'::pg_LSN;
ERROR:  22P02: invalid input syntax for type pg_lsn: "00ff/"
LINE 1: SELECT '00ff/'::pg_LSN;
               ^
LOCATION:  pg_lsn_in, pg_lsn.c:74

postgres=# SELECT '00ff/00'::pg_LSN;
 pg_lsn 
--------
 FF/0
(1 row)

This commit implements parsing the [PG replication protocol syntax](https://www.postgresql.org/docs/current/protocol-replication.html)
which is derived from the `.y` file from PG itself. We had to invent
some new lexing protocols as the lexing mechanism is different to
standard plpgsql/pgsql.

This is in preparation for supporting pglogical within CRDB.

Release note: None
Copy link
Copy Markdown
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: :

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)

@otan
Copy link
Copy Markdown
Contributor Author

otan commented Jun 19, 2023

bors r=ZhouXing19

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 19, 2023

Build succeeded:

@craig craig bot merged commit 9633594 into cockroachdb:master Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants