Refactor SQL string parsing to utilize Vitess formatting by winebarrel · Pull Request #16 · winebarrel/myschema

winebarrel · 2026-05-02T05:04:08Z

This pull request refactors how default values, comments, and type strings are parsed and normalized in the catalog and parser packages. The main goal is to improve compatibility with the Vitess SQL parser, ensure correct round-tripping of schema information, and simplify string handling. The most important changes are grouped below:

Parser compatibility and normalization improvements:

Refactored normalizeColumnDefault in catalog/tables.go to use Vitess's parser for classifying default values, quoting only when necessary (e.g., unquoted string literals that Vitess would otherwise parse as column references). This approach is more robust and future-proof, with a note about possible future type-specific handling.
Updated import statements to remove unused strconv from catalog/tables.go and add it to parser/parser.go where now needed. [1] [2]

String handling simplification:

Changed comment parsing in parseColumnDef and table option parsing in applyTableOption to use the unquoted, escape-resolved value from Vitess's Literal.Val instead of manually unquoting or using sqlparser.String. [1] [2]
Removed the now-unnecessary unquote helper function from parser/parser.go.

Type string formatting:

Simplified columnTypeString to leverage Vitess's formatting, stripping out options and charset/collation fields before formatting to match the catalog's output. Also normalizes enum/set value lists to remove extra spaces for consistency with MySQL's information_schema.

Rather than reconstructing each column type manually from ColumnType.{Type, Length, Scale, Unsigned, Zerofill, EnumValues}, temporarily blank out Options (NOT NULL / AUTO_INCREMENT / DEFAULT …) and Charset (CHARACTER SET / COLLATE) on the node and call sqlparser.String. The vitess formatter handles every column type and modifier we care about; we only need to squeeze the extra space inside ENUM/SET value lists ("'a', 'b'" → "'a','b'") to match information_schema's COLUMN_TYPE format. Smaller, fewer cases, and any future ColumnType fields vitess decides to support get picked up automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… do it Three follow-up cleanups in the same spirit as the columnTypeString swap: - parser.parseColumnDef: opts.Comment is *sqlparser.Literal, and Literal.Val is the unquoted, escape-resolved value. Drop the sqlparser.String + TrimPrefix/TrimSuffix dance and read .Val directly. Same for applyTableOption COMMENT. - parser.applyTableOption AUTO_INCREMENT: replace fmt.Sscanf("%d") with strconv.ParseUint(..., 10, 64) — same range, explicit error handling. - catalog.normalizeColumnDefault: instead of enumerating string-like column types and applying ad-hoc heuristics (already-quoted? has parens? parses as float? matches CURRENT_*/NULL/TRUE/FALSE?), parse `SELECT <def>` with vitess and check the resulting expression shape. If it's a *ColName, MySQL stored a quoteless string literal → wrap. Anything else (Literal, NullVal, BoolVal, CurTimeFuncExpr, FuncExpr, BinaryExpr, …) is already valid SQL and passes through. The typeName parameter is kept on the signature for forward- compatibility (BIT/BLOB defaults may need special handling later). Removes the unquote helper that's no longer used. Net diff: -8 lines, fewer special cases, more correct (handles e.g. embedded escaped quotes that the previous unquote got wrong). All tests / scenarios / sample-DB round-trips still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-02T05:07:33Z

Codecov Report

❌ Patch coverage is 26.08696% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.66%. Comparing base (038819c) to head (d378397).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
catalog/tables.go	0.00%	11 Missing ⚠️
parser/parser.go	50.00%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   38.27%   38.66%   +0.39%     
==========================================
  Files          24       24              
  Lines        1586     1570      -16     
==========================================
  Hits          607      607              
+ Misses        858      843      -15     
+ Partials      121      120       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Three inline comments — two on the line-classification flow, one on backtick-quoted identifier handling. parser/directive.go - reduceLeadingBlocks() iteratively strips zero-or-more leading `/* … */` blocks from a line, returning either the remainder or a flag indicating the block didn't close on this line. Both ExtractStmtRenameFrom and ExtractInlineRenames now feed every line through it before classification, so: * `/* header */ -- myschema:renamed-from old` (block then directive on the same line) is recognised as a directive, not lost as "first SQL line". * `/* note */ /* note2 */ name VARCHAR(64)` (two leading blocks) reduces correctly. The old single-shot stripLeadingBlockComment helper that ran only after a dedicated `/* …` branch is removed in favour of this unified reduce-then-classify flow. (Copilot #1, #2.) - classifyInlineLine grew a backtick-aware fallback path. Index and constraint names that are backtick-quoted AND contain whitespace (e.g. `KEY \`weird name\` (col)`) get split by strings.Fields and were mis-classified, surfacing as "target index not found". Two new helpers run before the Fields-based path: * backtickedNameAfterPrefix(line, prefixes) tries to match any of the given keyword prefixes followed by a backticked identifier and returns the unquoted name. * stripUntilAfterBacktickedName lets the CONSTRAINT branch peek past the backticked name to detect a trailing `UNIQUE` and route to inlineKindIndex. These cover KEY / INDEX / UNIQUE [KEY|INDEX] / FULLTEXT [KEY|INDEX] / SPATIAL [KEY|INDEX] / CONSTRAINT shapes with backticked-with-whitespace names. (Copilot #3.) Tests - parser/directive_test.go: * TestExtractStmtRenameFromBlockCommentBeforeDirectiveSameLine * TestExtractInlineRenamesBlockCommentBeforeDirectiveSameLine * TestExtractInlineRenamesBacktickedIndexNameWithSpace (KEY, UNIQUE KEY, and CONSTRAINT-with-CHECK forms) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

winebarrel and others added 2 commits May 2, 2026 13:56

winebarrel enabled auto-merge May 2, 2026 05:04

winebarrel merged commit 6c4ffd0 into main May 2, 2026
2 checks passed

winebarrel deleted the vitess-parser branch May 2, 2026 05:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor SQL string parsing to utilize Vitess formatting#16

Refactor SQL string parsing to utilize Vitess formatting#16
winebarrel merged 2 commits into
mainfrom
vitess-parser

winebarrel commented May 2, 2026

Uh oh!

Uh oh!

codecov Bot commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

winebarrel commented May 2, 2026

Uh oh!

Uh oh!

codecov Bot commented May 2, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant