Skip to content

Refactor SQL string parsing to utilize Vitess formatting#16

Merged
winebarrel merged 2 commits into
mainfrom
vitess-parser
May 2, 2026
Merged

Refactor SQL string parsing to utilize Vitess formatting#16
winebarrel merged 2 commits into
mainfrom
vitess-parser

Conversation

@winebarrel

Copy link
Copy Markdown
Owner

This pull request refactors how default values, comments, and type strings are parsed and normalized in the catalog and parser packages. The main goal is to improve compatibility with the Vitess SQL parser, ensure correct round-tripping of schema information, and simplify string handling. The most important changes are grouped below:

Parser compatibility and normalization improvements:

  • Refactored normalizeColumnDefault in catalog/tables.go to use Vitess's parser for classifying default values, quoting only when necessary (e.g., unquoted string literals that Vitess would otherwise parse as column references). This approach is more robust and future-proof, with a note about possible future type-specific handling.
  • Updated import statements to remove unused strconv from catalog/tables.go and add it to parser/parser.go where now needed. [1] [2]

String handling simplification:

  • Changed comment parsing in parseColumnDef and table option parsing in applyTableOption to use the unquoted, escape-resolved value from Vitess's Literal.Val instead of manually unquoting or using sqlparser.String. [1] [2]
  • Removed the now-unnecessary unquote helper function from parser/parser.go.

Type string formatting:

  • Simplified columnTypeString to leverage Vitess's formatting, stripping out options and charset/collation fields before formatting to match the catalog's output. Also normalizes enum/set value lists to remove extra spaces for consistency with MySQL's information_schema.

winebarrel and others added 2 commits May 2, 2026 13:56
Rather than reconstructing each column type manually from
ColumnType.{Type, Length, Scale, Unsigned, Zerofill, EnumValues},
temporarily blank out Options (NOT NULL / AUTO_INCREMENT / DEFAULT …)
and Charset (CHARACTER SET / COLLATE) on the node and call
sqlparser.String. The vitess formatter handles every column type and
modifier we care about; we only need to squeeze the extra space inside
ENUM/SET value lists ("'a', 'b'" → "'a','b'") to match
information_schema's COLUMN_TYPE format.

Smaller, fewer cases, and any future ColumnType fields vitess decides
to support get picked up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… do it

Three follow-up cleanups in the same spirit as the columnTypeString
swap:

- parser.parseColumnDef: opts.Comment is *sqlparser.Literal, and
  Literal.Val is the unquoted, escape-resolved value. Drop the
  sqlparser.String + TrimPrefix/TrimSuffix dance and read .Val
  directly. Same for applyTableOption COMMENT.
- parser.applyTableOption AUTO_INCREMENT: replace fmt.Sscanf("%d")
  with strconv.ParseUint(..., 10, 64) — same range, explicit error
  handling.
- catalog.normalizeColumnDefault: instead of enumerating string-like
  column types and applying ad-hoc heuristics (already-quoted? has
  parens? parses as float? matches CURRENT_*/NULL/TRUE/FALSE?),
  parse `SELECT <def>` with vitess and check the resulting expression
  shape. If it's a *ColName, MySQL stored a quoteless string literal
  → wrap. Anything else (Literal, NullVal, BoolVal, CurTimeFuncExpr,
  FuncExpr, BinaryExpr, …) is already valid SQL and passes through.
  The typeName parameter is kept on the signature for forward-
  compatibility (BIT/BLOB defaults may need special handling later).

Removes the unquote helper that's no longer used. Net diff: -8 lines,
fewer special cases, more correct (handles e.g. embedded escaped
quotes that the previous unquote got wrong).

All tests / scenarios / sample-DB round-trips still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@winebarrel winebarrel enabled auto-merge May 2, 2026 05:04
@winebarrel winebarrel merged commit 6c4ffd0 into main May 2, 2026
2 checks passed
@winebarrel winebarrel deleted the vitess-parser branch May 2, 2026 05:06
@codecov

codecov Bot commented May 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 26.08696% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.66%. Comparing base (038819c) to head (d378397).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
catalog/tables.go 0.00% 11 Missing ⚠️
parser/parser.go 50.00% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   38.27%   38.66%   +0.39%     
==========================================
  Files          24       24              
  Lines        1586     1570      -16     
==========================================
  Hits          607      607              
+ Misses        858      843      -15     
+ Partials      121      120       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

winebarrel added a commit that referenced this pull request May 2, 2026
Three inline comments — two on the line-classification flow, one on
backtick-quoted identifier handling.

parser/directive.go
  - reduceLeadingBlocks() iteratively strips zero-or-more leading
    `/* … */` blocks from a line, returning either the remainder or
    a flag indicating the block didn't close on this line. Both
    ExtractStmtRenameFrom and ExtractInlineRenames now feed every
    line through it before classification, so:
      * `/* header */ -- myschema:renamed-from old` (block then
        directive on the same line) is recognised as a directive,
        not lost as "first SQL line".
      * `/* note */ /* note2 */ name VARCHAR(64)` (two leading
        blocks) reduces correctly.
    The old single-shot stripLeadingBlockComment helper that ran
    only after a dedicated `/* …` branch is removed in favour of
    this unified reduce-then-classify flow. (Copilot #1, #2.)

  - classifyInlineLine grew a backtick-aware fallback path. Index
    and constraint names that are backtick-quoted AND contain
    whitespace (e.g. `KEY \`weird name\` (col)`) get split by
    strings.Fields and were mis-classified, surfacing as
    "target index not found". Two new helpers run before the
    Fields-based path:
      * backtickedNameAfterPrefix(line, prefixes) tries to match
        any of the given keyword prefixes followed by a backticked
        identifier and returns the unquoted name.
      * stripUntilAfterBacktickedName lets the CONSTRAINT branch
        peek past the backticked name to detect a trailing `UNIQUE`
        and route to inlineKindIndex.
    These cover KEY / INDEX / UNIQUE [KEY|INDEX] /
    FULLTEXT [KEY|INDEX] / SPATIAL [KEY|INDEX] / CONSTRAINT shapes
    with backticked-with-whitespace names. (Copilot #3.)

Tests
  - parser/directive_test.go:
    * TestExtractStmtRenameFromBlockCommentBeforeDirectiveSameLine
    * TestExtractInlineRenamesBlockCommentBeforeDirectiveSameLine
    * TestExtractInlineRenamesBacktickedIndexNameWithSpace (KEY,
      UNIQUE KEY, and CONSTRAINT-with-CHECK forms)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant