Refactor SQL string parsing to utilize Vitess formatting#16
Merged
Conversation
Rather than reconstructing each column type manually from
ColumnType.{Type, Length, Scale, Unsigned, Zerofill, EnumValues},
temporarily blank out Options (NOT NULL / AUTO_INCREMENT / DEFAULT …)
and Charset (CHARACTER SET / COLLATE) on the node and call
sqlparser.String. The vitess formatter handles every column type and
modifier we care about; we only need to squeeze the extra space inside
ENUM/SET value lists ("'a', 'b'" → "'a','b'") to match
information_schema's COLUMN_TYPE format.
Smaller, fewer cases, and any future ColumnType fields vitess decides
to support get picked up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… do it
Three follow-up cleanups in the same spirit as the columnTypeString
swap:
- parser.parseColumnDef: opts.Comment is *sqlparser.Literal, and
Literal.Val is the unquoted, escape-resolved value. Drop the
sqlparser.String + TrimPrefix/TrimSuffix dance and read .Val
directly. Same for applyTableOption COMMENT.
- parser.applyTableOption AUTO_INCREMENT: replace fmt.Sscanf("%d")
with strconv.ParseUint(..., 10, 64) — same range, explicit error
handling.
- catalog.normalizeColumnDefault: instead of enumerating string-like
column types and applying ad-hoc heuristics (already-quoted? has
parens? parses as float? matches CURRENT_*/NULL/TRUE/FALSE?),
parse `SELECT <def>` with vitess and check the resulting expression
shape. If it's a *ColName, MySQL stored a quoteless string literal
→ wrap. Anything else (Literal, NullVal, BoolVal, CurTimeFuncExpr,
FuncExpr, BinaryExpr, …) is already valid SQL and passes through.
The typeName parameter is kept on the signature for forward-
compatibility (BIT/BLOB defaults may need special handling later).
Removes the unquote helper that's no longer used. Net diff: -8 lines,
fewer special cases, more correct (handles e.g. embedded escaped
quotes that the previous unquote got wrong).
All tests / scenarios / sample-DB round-trips still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16 +/- ##
==========================================
+ Coverage 38.27% 38.66% +0.39%
==========================================
Files 24 24
Lines 1586 1570 -16
==========================================
Hits 607 607
+ Misses 858 843 -15
+ Partials 121 120 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
winebarrel
added a commit
that referenced
this pull request
May 2, 2026
Three inline comments — two on the line-classification flow, one on
backtick-quoted identifier handling.
parser/directive.go
- reduceLeadingBlocks() iteratively strips zero-or-more leading
`/* … */` blocks from a line, returning either the remainder or
a flag indicating the block didn't close on this line. Both
ExtractStmtRenameFrom and ExtractInlineRenames now feed every
line through it before classification, so:
* `/* header */ -- myschema:renamed-from old` (block then
directive on the same line) is recognised as a directive,
not lost as "first SQL line".
* `/* note */ /* note2 */ name VARCHAR(64)` (two leading
blocks) reduces correctly.
The old single-shot stripLeadingBlockComment helper that ran
only after a dedicated `/* …` branch is removed in favour of
this unified reduce-then-classify flow. (Copilot #1, #2.)
- classifyInlineLine grew a backtick-aware fallback path. Index
and constraint names that are backtick-quoted AND contain
whitespace (e.g. `KEY \`weird name\` (col)`) get split by
strings.Fields and were mis-classified, surfacing as
"target index not found". Two new helpers run before the
Fields-based path:
* backtickedNameAfterPrefix(line, prefixes) tries to match
any of the given keyword prefixes followed by a backticked
identifier and returns the unquoted name.
* stripUntilAfterBacktickedName lets the CONSTRAINT branch
peek past the backticked name to detect a trailing `UNIQUE`
and route to inlineKindIndex.
These cover KEY / INDEX / UNIQUE [KEY|INDEX] /
FULLTEXT [KEY|INDEX] / SPATIAL [KEY|INDEX] / CONSTRAINT shapes
with backticked-with-whitespace names. (Copilot #3.)
Tests
- parser/directive_test.go:
* TestExtractStmtRenameFromBlockCommentBeforeDirectiveSameLine
* TestExtractInlineRenamesBlockCommentBeforeDirectiveSameLine
* TestExtractInlineRenamesBacktickedIndexNameWithSpace (KEY,
UNIQUE KEY, and CONSTRAINT-with-CHECK forms)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request refactors how default values, comments, and type strings are parsed and normalized in the catalog and parser packages. The main goal is to improve compatibility with the Vitess SQL parser, ensure correct round-tripping of schema information, and simplify string handling. The most important changes are grouped below:
Parser compatibility and normalization improvements:
normalizeColumnDefaultincatalog/tables.goto use Vitess's parser for classifying default values, quoting only when necessary (e.g., unquoted string literals that Vitess would otherwise parse as column references). This approach is more robust and future-proof, with a note about possible future type-specific handling.strconvfromcatalog/tables.goand add it toparser/parser.gowhere now needed. [1] [2]String handling simplification:
parseColumnDefand table option parsing inapplyTableOptionto use the unquoted, escape-resolved value from Vitess'sLiteral.Valinstead of manually unquoting or usingsqlparser.String. [1] [2]unquotehelper function fromparser/parser.go.Type string formatting:
columnTypeStringto leverage Vitess's formatting, stripping out options and charset/collation fields before formatting to match the catalog's output. Also normalizes enum/set value lists to remove extra spaces for consistency with MySQL's information_schema.