Switch parser from pingcap/tidb to vitess sqlparser#15
Merged
Conversation
Motivation surfaced when verifying `make schema` against the sample DBs: sakila uses GEOMETRY columns and SPATIAL indexes that pingcap's parser cannot parse — its grammar reserves the type code (`mysql.TypeGeometry = 0xff`) but has no production that creates a column with the type. The deeper reason is design intent: pingcap parses what TiDB *executes* (a subset of MySQL by design); vitess sits between application and MySQL, so it must parse anything MySQL accepts. Side effects of the swap: - Stripped binary 21M → 15M (~26% smaller). pingcap pulls TiDB's full types/charset/timezone/decimal stack; vitess sqlparser is lighter even with its grpc/protobuf transitive deps (most stay dead-code- eliminated). - Same Apache 2.0 license. - API differences absorbed inside parser/: SplitStatementToPieces + per-statement Parse, Walk(visit, node) instead of Accept(visitor), CREATE INDEX shows up as *AlterTable + AddIndexDefinition. Reshape: - parser/parser.go rewritten on vitess types. columnTypeString builds the canonical form by hand from typed fields (vitess's sqlparser.String(ColumnType) inlines NOT NULL / AUTO_INCREMENT and isn't usable for diff). - parser/view.go rewritten with vitess Walk visitors: stripQualifiers (drop db./table. prefixes from ColName and TableName), stripRedundantAliases (clear AsName on every SelectField), unwrapParens (no-op on vitess but kept as a hook). - parser.normalizeDefaultExpr collapses vitess's `current_timestamp()` back to `CURRENT_TIMESTAMP` so it matches the catalog form. - catalog.normalizeColumnDefault wraps bareword defaults of ENUM / SET / CHAR / temporal columns in single quotes (the catalog returns `G` not `'G'` for enum defaults; vitess can't parse the former). Sample-DB round-trip status: - Chinook: dump → plan = no drift ✓ - employees: dump → plan = no drift ✓ (with view-on-view topo sort) - sakila: dump → plan = no drift ✓ (was previously blocked by GEOMETRY) YAML fixtures updated to vitess's restored SELECT format (`select id, ` + "`name`" + ` from users` instead of pingcap's `select ` + "`id`,`name`" + ` from ` + "`db`.`users`"). AGENTS.md replaces the "Why tidb/pkg/parser" section with a "Why vitess sqlparser" rationale; TODO.md drops the geometry / WITH CASCADED entries since both are resolved or no longer apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15 +/- ##
==========================================
+ Coverage 37.79% 38.27% +0.48%
==========================================
Files 24 24
Lines 1593 1586 -7
==========================================
+ Hits 602 607 +5
+ Misses 875 858 -17
- Partials 116 121 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
winebarrel
added a commit
that referenced
this pull request
May 2, 2026
Three inline comments — one is a real ordering bug, two are leftover
edge cases for the line-scan comment skipper.
diff/tables.go + diff_all.go (BUG fix)
- TableDiffResult gains a RenameStmts bucket. Table renames
(`ALTER TABLE … RENAME TO …`) now go there instead of into
Stmts, and diff_all.go schedules them between view drops and
FK drops. Without this, a migration that both renames a table
and drops an FK on it would emit:
ALTER TABLE new_name DROP FOREIGN KEY fk; -- (FK drops first)
ALTER TABLE old_name RENAME TO new_name;
The FK drop runs against new_name which doesn't exist yet, and
apply errors out. Renames must precede FK drops so subsequent
ALTERs target an existing name. (Copilot #3.)
parser/directive.go
- stripLeadingBlockComment helper handles the `/* note */ <sql>`
shape: a line that starts with a single-line block comment but
has SQL after `*/`. Both ExtractStmtRenameFrom and
ExtractInlineRenames now call it. For stmt extraction, hitting
SQL after the comment stops the leading-block scan (the `*/`
line is the first SQL line). For inline extraction, the SQL
remainder is reprocessed as the target line so a pending
directive can attach to it (or trip sawSQL on the opener).
(Copilot #1, #2.)
Tests
- Existing tests using res.Stmts for table renames updated to use
res.RenameStmts.
- YAML apply fixtures (rename_table, rename_column, rename_index)
continue to pass — diff_all.go emits the rename in the expected
spot in the merged plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Surfaced when verifying
make schema: sakila usesGEOMETRYcolumns andSPATIAL INDEXwhich pingcap's parser cannot parse — its grammar reserves the type code (mysql.TypeGeometry = 0xff) but has no production that creates a column with the type. The deeper reason is design intent: pingcap parses what TiDB executes (a subset of MySQL by design); vitess sits between application and MySQL so it must parse anything MySQL accepts.Side effects of the swap
parser/:SplitStatementToPieces+ per-statementParse,Walk(visit, node)instead ofAccept(visitor),CREATE INDEXshows up as*AlterTable+AddIndexDefinition.Reshape
parser/parser.gorewritten on vitess types.columnTypeStringbuilds the canonical form by hand becausesqlparser.String(ColumnType)inlines NOT NULL / AUTO_INCREMENT and isn't usable for diff.parser/view.gorewritten with vitessWalkvisitors:stripQualifiers,stripRedundantAliases,unwrapParens.parser.normalizeDefaultExprcollapses vitess'scurrent_timestamp()back toCURRENT_TIMESTAMP.catalog.normalizeColumnDefaultwraps bareword defaults (Gfor ENUM) in single quotes since vitess rejects the bareword form.Sample-DB round-trip status
YAML fixtures updated to match vitess's restored SELECT format.
Test plan
make test(parser + diff + catalog round-trip + YAML harness)make test-scenariogolangci-lint🤖 Generated with Claude Code