pgrepl: implement the parser for the PostgreSQL replication protocol#104938
pgrepl: implement the parser for the PostgreSQL replication protocol#104938craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
626bcb9 to
15bc9af
Compare
13b0298 to
bd28b57
Compare
ZhouXing19
left a comment
There was a problem hiding this comment.
Nice work! Mostly nit comments and had 2 questions
Reviewed 2 of 14 files at r3, 38 of 38 files at r5, 38 of 38 files at r6, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)
pkg/sql/pgrepl/pgreplparser/lexer.go line 84 at r6 (raw file):
case ch == doubleQuote: l.scanUntilEndQuote(lval, doubleQuote, lexbase.NormalizeString, IDENT) case unicode.IsDigit(ch):
nit: This case overlaps a lot with the isID(ch, true) case for the LSN check. I wonder if they can share that part of code.
pkg/sql/pgrepl/pgreplparser/lexer.go line 91 at r6 (raw file):
switch { case unicode.IsDigit(nextCh): case isHexDigit(nextCh) || nextCh == '/':
if unicode.IsDigit(nextCh) == true, is isHexDigit(nextCh) always true? If so, shouldn't we remove the case unicode.IsDigit(nextCh)? Otherwise, these digits will never be parsed as an LSN.
pkg/sql/pgrepl/pgreplparser/lexer.go line 150 at r6 (raw file):
} func isID(ch rune, firstCh bool) bool {
I had some trouble understanding what ID means here. Could you add some comments to this func, and maybe have 200 and 377 as const var?
pkg/sql/pgrepl/pgreplparser/lexer.go line 161 at r6 (raw file):
} func (l *lexer) checkLSN(lval *pgreplSymType, startPos int) bool {
nit: it would be helpful to have comments explaining the expected format of LSN -- two hexadecimal numbers of up to 8 digits each, separated by a slash.
pkg/sql/pgrepl/pgreplparser/pgrepl.y line 1 at r6 (raw file):
%{
nit: for the whole yacc file formatting, tabs -> spaces
pkg/sql/pgrepl/pgreplparser/pgrepl.y line 218 at r6 (raw file):
/* * SHOW setting * NOTE(otan): we omit K_SHOW as this should be handled by the existing protocol.
Could you say more about this? Does it mean replication config params can only be queried via the pg protocol? Why?
pkg/sql/pgrepl/pgreplparser/pgrepl.y line 376 at r6 (raw file):
K_TIMELINE_HISTORY UCONST { $$.val = &pgrepltree.TimelineHistory{Timeline: $2.numVal()}
nit: pg has a check for positive argument here. Do we want to have one as well?
pkg/sql/pgrepl/pgreplparser/pgrepl.y line 400 at r6 (raw file):
K_TIMELINE UCONST { $$.val = $2.numVal();
nit: ditto, positive arg check
pkg/sql/pgrepl/pgreplparser/testdata/lexer/lsn.ddt line 39 at r6 (raw file):
bA101001010101010/f ---- ERROR: at or near "bA101001010101010/f": syntax error: error decoding LSN: strconv.ParseUint: parsing "bA101001010101010": value out of range
nit: maybe add a case that ends with '/'?
pkg/sql/pgrepl/pgreplparser/testdata/parser/start_replication.ddt line 57 at r6 (raw file):
---- START_REPLICATION SLOT slot_1 LOGICAL A0010/1011 (opt 'opt', "xXx" 'a') -- normalized! START_REPLICATION SLOT slot_1 LOGICAL A0010/1011 (opt '_', "xXx" '_') -- literals removed
nit: maybe adding a case with an invalid LSN?
This commit adds a basic utility for the PG LSN functionality. Release note: None
c1d1eaf to
628d336
Compare
ZhouXing19
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)
pkg/sql/pgrepl/pgreplparser/pgrepl.y line 220 at r8 (raw file):
/* * SHOW setting * NOTE(otan): we omit K_SHOW as the fallback from being being parsed by
nit: remove one of the "being"
pkg/sql/pgrepl/pgreplparser/testdata/lexer/lsn.ddt line 37 at r8 (raw file):
# ends with a / lex 00ff/
This doesn't seem to be supported by pg:
postgres=# SELECT '00ff/'::pg_LSN;
ERROR: 22P02: invalid input syntax for type pg_lsn: "00ff/"
LINE 1: SELECT '00ff/'::pg_LSN;
^
LOCATION: pg_lsn_in, pg_lsn.c:74
postgres=# SELECT '00ff/00'::pg_LSN;
pg_lsn
--------
FF/0
(1 row)
This commit implements parsing the [PG replication protocol syntax](https://www.postgresql.org/docs/current/protocol-replication.html) which is derived from the `.y` file from PG itself. We had to invent some new lexing protocols as the lexing mechanism is different to standard plpgsql/pgsql. This is in preparation for supporting pglogical within CRDB. Release note: None
ZhouXing19
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @otan, @renatolabs, and @srosenberg)
|
bors r=ZhouXing19 |
|
Build succeeded: |
This commit implements parsing the PG replication protocol syntax which is derived from the
.yfile from PG itself. We had to invent some new lexing protocols as the lexing mechanism is different to standard plpgsql/pgsql.This is in preparation for supporting pglogical within CRDB.
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-26486
Release note: None