Skip to content

Switch parser from pingcap/tidb to vitess sqlparser#15

Merged
winebarrel merged 1 commit into
mainfrom
vitess-parser
May 2, 2026
Merged

Switch parser from pingcap/tidb to vitess sqlparser#15
winebarrel merged 1 commit into
mainfrom
vitess-parser

Conversation

@winebarrel

Copy link
Copy Markdown
Owner

Summary

Surfaced when verifying make schema: sakila uses GEOMETRY columns and SPATIAL INDEX which pingcap's parser cannot parse — its grammar reserves the type code (mysql.TypeGeometry = 0xff) but has no production that creates a column with the type. The deeper reason is design intent: pingcap parses what TiDB executes (a subset of MySQL by design); vitess sits between application and MySQL so it must parse anything MySQL accepts.

Side effects of the swap

  • Stripped binary 21 MB → 15 MB (~26% smaller). Pingcap pulls TiDB's full types/charset/timezone/decimal stack; vitess sqlparser is lighter even with its grpc/protobuf transitive deps.
  • Same Apache 2.0 license.
  • API differences absorbed inside parser/: SplitStatementToPieces + per-statement Parse, Walk(visit, node) instead of Accept(visitor), CREATE INDEX shows up as *AlterTable + AddIndexDefinition.

Reshape

  • parser/parser.go rewritten on vitess types. columnTypeString builds the canonical form by hand because sqlparser.String(ColumnType) inlines NOT NULL / AUTO_INCREMENT and isn't usable for diff.
  • parser/view.go rewritten with vitess Walk visitors: stripQualifiers, stripRedundantAliases, unwrapParens.
  • parser.normalizeDefaultExpr collapses vitess's current_timestamp() back to CURRENT_TIMESTAMP.
  • catalog.normalizeColumnDefault wraps bareword defaults (G for ENUM) in single quotes since vitess rejects the bareword form.

Sample-DB round-trip status

DB dump → plan
Chinook ✓ no drift
employees ✓ no drift (view-on-view topo sort exercised)
sakila ✓ no drift (previously blocked by GEOMETRY)

YAML fixtures updated to match vitess's restored SELECT format.

Test plan

  • make test (parser + diff + catalog round-trip + YAML harness)
  • make test-scenario
  • golangci-lint
  • All three sample DBs round-trip without drift
  • Stripped binary measured at 15 MB
  • CI green on this PR

🤖 Generated with Claude Code

Motivation surfaced when verifying `make schema` against the sample DBs:
sakila uses GEOMETRY columns and SPATIAL indexes that pingcap's parser
cannot parse — its grammar reserves the type code (`mysql.TypeGeometry =
0xff`) but has no production that creates a column with the type. The
deeper reason is design intent: pingcap parses what TiDB *executes*
(a subset of MySQL by design); vitess sits between application and
MySQL, so it must parse anything MySQL accepts.

Side effects of the swap:
- Stripped binary 21M → 15M (~26% smaller). pingcap pulls TiDB's full
  types/charset/timezone/decimal stack; vitess sqlparser is lighter
  even with its grpc/protobuf transitive deps (most stay dead-code-
  eliminated).
- Same Apache 2.0 license.
- API differences absorbed inside parser/: SplitStatementToPieces +
  per-statement Parse, Walk(visit, node) instead of Accept(visitor),
  CREATE INDEX shows up as *AlterTable + AddIndexDefinition.

Reshape:
- parser/parser.go rewritten on vitess types. columnTypeString builds
  the canonical form by hand from typed fields (vitess's
  sqlparser.String(ColumnType) inlines NOT NULL / AUTO_INCREMENT and
  isn't usable for diff).
- parser/view.go rewritten with vitess Walk visitors:
  stripQualifiers (drop db./table. prefixes from ColName and TableName),
  stripRedundantAliases (clear AsName on every SelectField),
  unwrapParens (no-op on vitess but kept as a hook).
- parser.normalizeDefaultExpr collapses vitess's `current_timestamp()`
  back to `CURRENT_TIMESTAMP` so it matches the catalog form.
- catalog.normalizeColumnDefault wraps bareword defaults of ENUM /
  SET / CHAR / temporal columns in single quotes (the catalog returns
  `G` not `'G'` for enum defaults; vitess can't parse the former).

Sample-DB round-trip status:
- Chinook: dump → plan = no drift ✓
- employees: dump → plan = no drift ✓ (with view-on-view topo sort)
- sakila: dump → plan = no drift ✓ (was previously blocked by
  GEOMETRY)

YAML fixtures updated to vitess's restored SELECT format
(`select id, ` + "`name`" + ` from users` instead of pingcap's
`select ` + "`id`,`name`" + ` from ` + "`db`.`users`").

AGENTS.md replaces the "Why tidb/pkg/parser" section with a
"Why vitess sqlparser" rationale; TODO.md drops the geometry / WITH
CASCADED entries since both are resolved or no longer apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@winebarrel winebarrel enabled auto-merge May 2, 2026 04:53
@codecov

codecov Bot commented May 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 39.88920% with 217 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.27%. Comparing base (0dede1e) to head (038819c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
parser/parser.go 47.12% 112 Missing and 26 partials ⚠️
parser/view.go 25.92% 59 Missing and 1 partial ⚠️
catalog/tables.go 0.00% 19 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   37.79%   38.27%   +0.48%     
==========================================
  Files          24       24              
  Lines        1593     1586       -7     
==========================================
+ Hits          602      607       +5     
+ Misses        875      858      -17     
- Partials      116      121       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@winebarrel winebarrel merged commit de97247 into main May 2, 2026
4 checks passed
@winebarrel winebarrel deleted the vitess-parser branch May 2, 2026 04:55
winebarrel added a commit that referenced this pull request May 2, 2026
Three inline comments — one is a real ordering bug, two are leftover
edge cases for the line-scan comment skipper.

diff/tables.go + diff_all.go (BUG fix)
  - TableDiffResult gains a RenameStmts bucket. Table renames
    (`ALTER TABLE … RENAME TO …`) now go there instead of into
    Stmts, and diff_all.go schedules them between view drops and
    FK drops. Without this, a migration that both renames a table
    and drops an FK on it would emit:
        ALTER TABLE new_name DROP FOREIGN KEY fk;   -- (FK drops first)
        ALTER TABLE old_name RENAME TO new_name;
    The FK drop runs against new_name which doesn't exist yet, and
    apply errors out. Renames must precede FK drops so subsequent
    ALTERs target an existing name. (Copilot #3.)

parser/directive.go
  - stripLeadingBlockComment helper handles the `/* note */ <sql>`
    shape: a line that starts with a single-line block comment but
    has SQL after `*/`. Both ExtractStmtRenameFrom and
    ExtractInlineRenames now call it. For stmt extraction, hitting
    SQL after the comment stops the leading-block scan (the `*/`
    line is the first SQL line). For inline extraction, the SQL
    remainder is reprocessed as the target line so a pending
    directive can attach to it (or trip sawSQL on the opener).
    (Copilot #1, #2.)

Tests
  - Existing tests using res.Stmts for table renames updated to use
    res.RenameStmts.
  - YAML apply fixtures (rename_table, rename_column, rename_index)
    continue to pass — diff_all.go emits the rename in the expected
    spot in the merged plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant