catalog/diff: surface CHARACTER SET / COLLATE diffs at table and column level#40
Conversation
…mn level Closes the Medium-priority "charset diff at the table and column level" entry. Both layers were already loaded into model.Table / model.Column but the diff intentionally skipped them — silent drift on every CHARSET change. Diff - diff/tables.go now emits `ALTER TABLE … DEFAULT CHARSET=… [COLLATE=…]` when desired and current diverge. Engine and Comment stay un-diffed (separate scope). - columnEqual no longer skips CharacterSet / Collation. They go through the same normalisation on both sides so an "inherited" column reads as nil → equal to a parser-side column the user didn't spell out. Normalisation (parser ↔ catalog symmetry) - New model/charset.go owns the MySQL 8.0+ charset → default-collation table and the CollapseDefaultCollation helper. Both sides call it so `CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci` and bare `CHARSET=utf8mb4` describe the same MySQL state. - catalog joins information_schema.COLLATIONS to populate Table.Charset from TABLE_COLLATION (single source of truth, no fragile prefix splitting). Per-column CHARACTER_SET_NAME / COLLATION_NAME that match either the table default or the charset's default collation collapse to nil. - parser/parseColumnDef now reads cd.Type.Charset.Name (vitess holds it on a separate struct, not in opts) so column-level CHARACTER SET isn't dropped on the floor. Previously parser saw it as nil and the catalog as non-nil → endless MODIFY COLUMN drift. Documented limitation - AGENTS.md operational rules + new CAVEATS.md section pin the one rough edge: changing DEFAULT CHARSET on a table with pre-existing string columns converges in two applies. The first changes the table default; the second emits MODIFY COLUMN per string column to inherit the new default (MySQL freezes the old charset onto each COLUMN row when the table default flips). The change_table_charset fixture has verify_no_drift: false to honour this with the reasoning spelled out in the header comment. - A future `--convert-charset` flag / directive can opt into single- apply convergence via `ALTER TABLE … CONVERT TO CHARACTER SET …`; filed as a Medium TODO. Tests - catalog: existing tests survive thanks to the symmetric normalisation. - parser: TestParseTableOptions updated (utf8mb4_0900_ai_ci collapses to nil); TestParseTableOptionsExplicitNonDefaultCollation pins that a non-default collation does survive. - end-to-end: testdata/apply/change_column_charset.yml (single MODIFY COLUMN, verify_no_drift: true), testdata/apply/ change_table_charset.yml (single ALTER TABLE, verify_no_drift: false documents the two-apply convergence). - testdata/dump fixtures (4 of them) updated for the new shape: table-level DEFAULT CHARSET=utf8mb4 instead of per-column CHARACTER SET clauses + table-only COLLATE. Verified live on MySQL 8.0 + 9.4: round-trip plan / apply / dump, scenario suite, lint all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #40 +/- ##
==========================================
- Coverage 74.02% 73.58% -0.45%
==========================================
Files 27 28 +1
Lines 2264 2347 +83
==========================================
+ Hits 1676 1727 +51
- Misses 425 453 +28
- Partials 163 167 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PR #40's first cut tested the charset/collation behaviour end-to-end through the YAML harness, which left two pieces under-pinned: - model/charset.go (new file) had no direct unit test, so a future refactor of DefaultCollationOf / CollapseDefaultCollation could silently change the table or the collapse semantics. - parseColumnDef now reads cd.Type.Charset.Name from vitess. That's the exact code path that was broken before this PR (parser saw nil while catalog returned the column charset, leading to endless MODIFY COLUMN drift), so it deserves a direct regression test rather than relying on the YAML harness alone. Add: - TestDefaultCollationOf — covers the four most-used charsets, the utf8 ↔ utf8mb3 alias, and the "unknown charset → empty string" contract. - TestCollapseDefaultCollation — covers both-nil, one-nil, default- match (collapsed), non-default (kept), unrelated charset's default (kept), and unknown-charset (kept) shapes. - TestParseColumnCharacterSet — pins that explicit `CHARACTER SET` on a column lands on model.Column.CharacterSet, that the COLLATE half is captured when given, that a non-default collation survives, and that the charset's default collation is collapsed to nil on the parser side too. Comment block calls out why this path matters (the catalog-vs-parser drift it prevents). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Surfaces table- and column-level MySQL CHARACTER SET / COLLATE drift in myschema by loading/normalizing charset+collation consistently in parser and catalog, and emitting the appropriate ALTER TABLE / MODIFY COLUMN statements.
Changes:
- Add shared default-collation collapsing (
model.CollapseDefaultCollation) and apply it on both parser and catalog paths so implicit MySQL defaults don’t cause churn. - Emit table-level charset/collation diffs (
ALTER TABLE … DEFAULT CHARSET=… [COLLATE=…]) and include column charset/collation in column equality comparisons to surface real drift. - Update end-to-end fixtures and docs to reflect the new round-trip shape and the documented two-apply convergence behavior for table default charset changes.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
catalog/tables.go |
Load table charset via information_schema.COLLATIONS join and normalize column/table charset+collation inheritance/defaults. |
diff/tables.go |
Emit table-level charset DDL and compare column charset/collation in columnEqual. |
parser/parser.go |
Parse column CHARACTER SET from vitess AST and collapse default collations on parsed tables/columns. |
parser/parser_test.go |
Adjust table-option expectations for collapsed default collation and add explicit non-default collation test. |
model/charset.go |
Introduce default-collation lookup table and CollapseDefaultCollation helper shared across layers. |
testdata/apply/change_table_charset.yml |
Add fixture for table default charset change (documented 2-apply convergence). |
testdata/apply/change_column_charset.yml |
Add fixture for explicit per-column charset change (single-apply convergence). |
testdata/dump/basic.yml |
Update dump output expectations to prefer table-level DEFAULT CHARSET over redundant per-column clauses. |
testdata/dump/include_filter.yml |
Same as above for include-filter dump output. |
testdata/dump/view.yml |
Same as above for view-related dump output. |
testdata/dump/with_index_and_fk.yml |
Same as above for tables with indexes/FKs. |
TODO.md |
Replace previous charset/collation diff TODO with a follow-up “one-shot CONVERT” TODO. |
CAVEATS.md |
Document the two-apply convergence behavior when changing table default charset with existing string columns. |
AGENTS.md |
Add operational note pointing to the charset two-apply caveat. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ation + map-completeness test All four nits were valid (or near-valid): 1, 2. Parser preserved the user's original SQL casing, so `CHARSET=UTF8MB4` would never compare equal to the catalog's canonical lower-case `utf8mb4` and tables/columns would diff forever. Lower-case the column-level CHARACTER SET, the column- level COLLATE, and both halves of the table-level option in parser/parser.go so parser-side and catalog-side values are in the same form before any comparison. 3. `model.DefaultCollationOf` and `CollapseDefaultCollation` did a direct map lookup, so a mixed-case input from any future caller would silently bypass normalisation. Lower-case `charset` inside DefaultCollationOf and use `strings.EqualFold` for the coll comparison in CollapseDefaultCollation. Defensive — the parser already lower-cases above, but a pure helper that depends on call-site discipline is fragile. 4. Reviewer claimed `cp874` and `latin9` were missing from the default-collations map, which would cause endless drift on columns using them. Verified live against MySQL 8.0 + 9.4: those charsets do not exist on either server (the map already covers all 41 stock charsets reported by information_schema). Add a `TestCharsetDefaultCollationsCoverServer` integration test that walks `information_schema.CHARACTER_SETS` and asserts every entry resolves through `model.DefaultCollationOf`. The test pins the current completeness and will fail loudly if a future MySQL release ships a new charset (or the project bumps the baseline) so the map can be filled in before drift starts. Unit-side coverage: - TestDefaultCollationOf: add mixed-case rows (`UTF8MB4`, `Latin1`) to pin case-insensitive lookup. - TestCollapseDefaultCollation: append a case-insensitive collapse case (`UTF8MB4` + `UTF8MB4_0900_AI_CI` → nil) so the EqualFold branch in the helper has direct coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… side Two related convergence bugs surfaced in the second Copilot pass; both boil down to "the desired side spelled out only one half of the charset/collation pair, but the other side carries both". #1 parser/parser.go column-level COLLATE-only Per-column collation collapse only used `c.CharacterSet`, which is nil when the user wrote just `COLLATE …` on the column. The catalog side resolves the effective charset (column charset, falling back to the table default) before collapsing, so a redundantly-spelled default collation collapsed on the catalog side but survived on the parser side, drifting forever as MODIFY COLUMN. Fix: use the same effective-charset fallback in the parser pass (column.CharacterSet ?? table.Charset) before calling CollapseDefaultCollation. #2 diff/tables.go don't-care semantics for table-level diff `tableCharsetCollationSQL` did a strict ptrEq on both fields. When desired specifies only `COLLATE=…` (Charset=nil), the catalog's non-nil Charset never matched and the same `ALTER TABLE … COLLATE=…` got emitted on every plan. Symmetrically, a desired `DEFAULT CHARSET=…` with no COLLATE would have looped if we'd taken that path on the Collation field. Fix: treat a desired field set to nil as "the user didn't say, so anything matches". The result is convergent semantics for either side specified alone, with the existing both-non-nil and both-nil paths unchanged. Tests - TestParseColumnCollateOnlyInheritsTableCharset: COLLATE-only column with the table-default charset's default collation collapses to nil; a non-default COLLATE on a COLLATE-only column survives. Pins #1. - testdata/plan/charset_collate_only_no_changes.yml: desired SQL has `COLLATE=utf8mb4_unicode_ci` only (no DEFAULT CHARSET), matches the catalog state, plan is empty. Pins #2 end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two nits in this pass: #1 model/charset.go map "missing cp1252" — false alarm. Same shape as Copilot's earlier `cp874` / `latin9` claim: the identifier doesn't actually exist in MySQL 8.0+. Verified live on 8.0 + 9.4: information_schema.CHARACTER_SETS lists cp1250, cp1251, cp1256, cp1257 — there is no cp1252. The map already covers all 41 stock charsets (verified via TestCharsetDefaultCollationsCoverServer on both servers). No code change. #2 diff/tables.go docstring inaccuracy — fixed. `tableCharsetCollationSQL`'s docstring claimed the emitted DDL was always `ALTER TABLE … DEFAULT CHARSET=… COLLATE=…`, but the function only spells out the clauses the desired side actually set (either / both, depending on what differed). Reword the docstring to match: `… DEFAULT CHARSET=…` alone, `… COLLATE=…` alone, or both together — three shapes covered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
"reset to default" semantics Two more convergence holes the previous pass left open. Both stem from over-reaching the simplifications I made earlier. #1 parser table-level COLLATE-only Last pass added effective-charset fallback for column-level COLLATE- only, but the *table-level* equivalent was still left bare: CREATE TABLE t (...) COLLATE=utf8mb4_0900_ai_ci; (no DEFAULT CHARSET) → parser had t.Charset=nil so CollapseDefaultCollation couldn't fire, while catalog (which always knows the effective charset via information_schema.COLLATIONS) did collapse — endless `ALTER TABLE … COLLATE=…` loops. Add `model.CharsetOfCollation(coll)` (splits on the first `_`, with the `binary` self-named special case) and a parser-side `effectiveCharsetForCollation` helper that walks the chain (declared charset → table default → derived from collation name) before calling CollapseDefaultCollation. Used for both table-level and column-level normalisation. #2 diff don't-care semantics was too loose Last pass treated `desired.Collation == nil` as "don't care", which silently swallowed a real intent: dropping COLLATE from desired SQL while keeping DEFAULT CHARSET means "reset to the charset's default collation". With the catch-all don't-care, that case never emitted an ALTER even when the catalog carried an explicit non-default collation. Refine the comparison: - desired.Collation != nil → strict ptrEq (unchanged). - desired.Collation == nil && desired.Charset != nil → catalog must be at default (current.Collation == nil after collapse); otherwise emit `ALTER TABLE … DEFAULT CHARSET=…` so MySQL resets. - desired.Charset == nil treated as "no opinion on charset". - Both nil → handled by the existing early return. Tests - model: TestCharsetOfCollation pins prefix split, the `binary` special case, mixed-case input, and the no-underscore fallback. - parser: TestParseTableOptionsCollateOnly has two sub-tests — the default collation collapses to nil, a non-default one survives. - diff: testdata/plan/charset_drop_explicit_collation.yml — catalog has explicit COLLATE=utf8mb4_unicode_ci, desired drops it but keeps DEFAULT CHARSET=utf8mb4 → plan emits a single `ALTER TABLE … DEFAULT CHARSET=utf8mb4` (no COLLATE). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Closes the Medium-priority "Character set / collation diff at the table and column level" entry. Both layers were already loaded into
model.Table/model.Columnbut the diff intentionally skipped them — silent drift on every CHARSET change.What changes diff-side
diff/tables.gonow emitsALTER TABLE … DEFAULT CHARSET=… [COLLATE=…]when desired and current diverge. Engine and Comment stay un-diffed (separate scope).columnEqualno longer skipsCharacterSet/Collation. They go through the same normalisation on both sides so an "inherited" column reads asnil→ equal to a parser-side column the user didn't spell out.Normalisation (parser ↔ catalog symmetry)
model/charset.goowns the MySQL 8.0+ charset → default-collation table and theCollapseDefaultCollationhelper. Both sides call it soCHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ciand bareCHARSET=utf8mb4describe the same MySQL state.catalogjoinsinformation_schema.COLLATIONSto populateTable.CharsetfromTABLE_COLLATION(single source of truth, no fragile prefix splitting). Per-columnCHARACTER_SET_NAME/COLLATION_NAMEthat match either the table default or the charset's default collation collapse tonil.parser/parseColumnDefnow readscd.Type.Charset.Name(vitess holds it on a separate struct, not inopts) so column-levelCHARACTER SETisn't dropped on the floor. Previously parser saw it asniland the catalog as non-nil → endlessMODIFY COLUMNdrift.Documented limitation
AGENTS.mdoperational rules + a newCAVEATS.mdsection pin the one rough edge: changingDEFAULT CHARSETon a table with pre-existing string columns converges in two applies. The first changes the table default; the second emitsMODIFY COLUMNper string column to inherit the new default. This is the honest expression of MySQL's own behaviour —ALTER TABLE … DEFAULT CHARSET=…doesn't rewrite per-column charset metadata. Thechange_table_charsetfixture hasverify_no_drift: falseto honour this with the reasoning in the header comment.A future
--convert-charsetflag / directive can opt into single-apply convergence viaALTER TABLE … CONVERT TO CHARACTER SET …(rewrites stored bytes, heavyweight); filed as a Medium TODO.Tests
TestParseTableOptionsupdated (utf8mb4_0900_ai_cicollapses tonil); newTestParseTableOptionsExplicitNonDefaultCollationpins that a non-default collation does survive.testdata/apply/change_column_charset.yml— singleMODIFY COLUMN,verify_no_drift: truetestdata/apply/change_table_charset.yml— singleALTER TABLE,verify_no_drift: falsedocuments the two-apply convergenceDEFAULT CHARSET=utf8mb4instead of per-columnCHARACTER SETclauses + table-onlyCOLLATE.Verified live on MySQL 8.0 + 9.4: round-trip plan / apply / dump, scenario suite, lint all green.
Test plan
make test(8.0)make test-mysql9(9.4)make test-scenario(8.0)make lint🤖 Generated with Claude Code