Skip to content

Fix problems when altering a column from binary to varbinary#1628

Merged
meiji163 merged 4 commits intogithub:masterfrom
jorendorff:fix-binary-column-padding
Feb 10, 2026
Merged

Fix problems when altering a column from binary to varbinary#1628
meiji163 merged 4 commits intogithub:masterfrom
jorendorff:fix-binary-column-padding

Conversation

@jorendorff
Copy link
Contributor

@jorendorff jorendorff commented Feb 9, 2026

Description

Fixes #1627.

The issue is that binary values are incorrectly read from the binlog. This PR fixes it by changing a previously-implemented fix to apply to all binary columns, not only primary keys.

Two new tests are added: a unit test for the function being changed, and a localtest. (Oddly, the existing test in localtests/varbinary doesn't seem to actually involve any varbinary columns.)

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

I wasn't able to figure out how to run script/build or script/cibuild. They both try to download and use a vendored Go installation, but this doesn't work for me. Figured it out, it was trivial - my version of Go was too new.

MySQL's binlog strips trailing 0x00 bytes from binary(N) columns.
PR github#915 fixed this for unique key columns only, but the same issue
affects all binary columns in INSERT/UPDATE operations.

Remove the isUniqueKeyColumn condition so all binary(N) columns
are padded to their declared length.

Fixes a variation of github#909 where the affected column is not a primary key.
Copilot AI review requested due to automatic review settings February 9, 2026 23:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses incorrect handling of BINARY(N) values read from MySQL binlog events during migrations that alter a column to VARBINARY(M), where trailing 0x00 bytes can be stripped and thus written incorrectly to the ghost table. The fix generalizes the existing padding logic so it applies to all binary columns, not just unique-key columns.

Changes:

  • Apply trailing-zero padding for all BinaryColumnType columns during argument conversion (not only unique key columns).
  • Add unit tests covering binary padding behavior in convertArg.
  • Add a new local integration test (localtests/binary-to-varbinary) to reproduce the binary -> varbinary binlog truncation scenario.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
go/sql/types.go Expands binary padding logic to apply to all binary columns.
go/sql/types_test.go Adds unit tests verifying padding behavior for truncated/full binary values.
localtests/binary-to-varbinary/create.sql Creates a repro scenario with binlog-driven inserts/updates involving trailing-zero binary values.
localtests/binary-to-varbinary/extra_args Supplies an --alter that changes data from binary(20) to varbinary(32).
Comments suppressed due to low confidence (1)

go/sql/types.go:87

  • The padding logic builds a bytes.Buffer directly from arg2Bytes. Since bytes.NewBuffer can reuse the provided slice as its backing store, Write may mutate the input byte slice in-place (depending on capacity). Now that this code runs for all binary columns, it would be safer to avoid mutating the input by copying into a new buffer/slice before padding (and ideally keep the return type as []byte to avoid unnecessary string conversions).
				buf := bytes.NewBuffer(arg2Bytes)
				for i := uint(0); i < (this.BinaryOctetLength - uint(size)); i++ {
					buf.Write([]byte{0})
				}
				arg = buf.String()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

In this case, the input is binary, and the column type is `binary`. So the output should be binary,
not text.
@meiji163
Copy link
Contributor

meiji163 commented Feb 10, 2026

Thanks for the fix. I do wonder why the author of #915 chose to apply the padding only to the unique key 🤔

@meiji163 meiji163 merged commit aadbb79 into github:master Feb 10, 2026
9 checks passed
@jorendorff jorendorff deleted the fix-binary-column-padding branch February 10, 2026 18:16
@look
Copy link
Member

look commented Feb 10, 2026

Thanks for the fix. I do wonder why the author of #915 chose to apply the padding only to the unique key 🤔

I think this comment has a clue:

Filling a high number zero bytes may affect processing efficiency. after thinking about it, we can fix it more lighter as below, other scenarios will not cause data consistency problems. 40bc5ae

  • only handle the binary column in the unique key
  • only handle update/delete dml event

Guess that wasn't quite true...

meiji163 pushed a commit that referenced this pull request Feb 23, 2026
* Fix binary column trailing zero stripping for non-key columns

MySQL's binlog strips trailing 0x00 bytes from binary(N) columns.
PR #915 fixed this for unique key columns only, but the same issue
affects all binary columns in INSERT/UPDATE operations.

Remove the isUniqueKeyColumn condition so all binary(N) columns
are padded to their declared length.

Fixes a variation of #909 where the affected column is not a primary key.

* Simplify by removing isUniqueKeyColumn now that it's no longer used.

* In convertArg, don't convert binary data to strings.

In this case, the input is binary, and the column type is `binary`. So the output should be binary,
not text.

* fix a lint
meiji163 added a commit that referenced this pull request Mar 5, 2026
* Execute hook on every batch insert retry

Co-authored-by: Bastian Bartmann <bastian.bartmann@shopify.com>

* Expose the last error message to the onBatchCopyRetry hook

Co-authored-by: Bastian Bartmann <bastian.bartmann@shopify.com>

* Remove double retries

CalculateNextIterationRangeEndValues needs to be recomputed on every retry in case of configuration (e.g. chunk-size)
changes were made by onBatchCopyRetry hooks.

* include dev.yml (temp for Shopify)

* Update doc/hooks.md

* Remove dev.yml

* Fix retry issue where MigrationIterationRangeMinValues advances before insert completes

- extract MigrationContext.SetNextIterationRangeValues outside of applyCopyRowsFunc, so that it doesn't run on retries
- add an integration test for Migrator with retry hooks

Co-authored-by: Bastian Bartmann <bastian.bartmann@shopify.com>

* Add localtest that expects gh-ost to fail on exhausted retries

* Rename method

* fmt and lint

* gofmt

* Fix problems when altering a column from `binary` to `varbinary` (#1628)

* Fix binary column trailing zero stripping for non-key columns

MySQL's binlog strips trailing 0x00 bytes from binary(N) columns.
PR #915 fixed this for unique key columns only, but the same issue
affects all binary columns in INSERT/UPDATE operations.

Remove the isUniqueKeyColumn condition so all binary(N) columns
are padded to their declared length.

Fixes a variation of #909 where the affected column is not a primary key.

* Simplify by removing isUniqueKeyColumn now that it's no longer used.

* In convertArg, don't convert binary data to strings.

In this case, the input is binary, and the column type is `binary`. So the output should be binary,
not text.

* fix a lint

* Fix 4 trigger handling bugs (#1626)

* fix: remove double-transformation in trigger length validation

ValidateGhostTriggerLengthBelowMaxLength was calling GetGhostTriggerName
on an already-transformed name, adding the suffix twice. This caused valid
trigger names (ghost name <= 64 chars) to be falsely rejected.

The caller in inspect.go:627 already transforms the name via
GetGhostTriggerName before passing it, so the validation function
should check the length as-is.

Unit tests updated to reflect the correct call pattern: transform first
with GetGhostTriggerName, then validate the result. Added boundary tests
for exactly 64 and 65 char names.

* fix: return error from trigger creation during atomic cut-over

During atomic cut-over, if CreateTriggersOnGhost failed, the error was
logged but not returned. The migration continued and completed without
triggers, silently losing them.

The two-step cut-over (line 793) already correctly returns the error.
This aligns the atomic cut-over to do the same.

* fix: check trigger name uniqueness per schema, not per table

validateGhostTriggersDontExist was filtering by event_object_table,
only checking if the ghost trigger name existed on the original table.
MySQL trigger names are unique per schema, so a trigger with the same
name on any other table would block CREATE TRIGGER but pass validation.

Remove the event_object_table filter to check trigger_name + trigger_schema
only, matching MySQL's uniqueness constraint.

* fix: use parameterized query in GetTriggers to prevent SQL injection

GetTriggers used fmt.Sprintf with string interpolation for database and
table names, causing SQL syntax errors with special characters and
potential SQL injection. Switched to parameterized query with ? placeholders,
matching the safe pattern already used in inspect.go:553-559.

* test: add regression tests for trigger handling bugs

Add two integration tests:
- trigger-long-name-validation: verifies 60-char trigger names
  (64-char ghost name) are not falsely rejected by double-transform
- trigger-ghost-name-conflict: verifies validation detects ghost
  trigger name conflicts on other tables in the same schema

* style: gofmt context_test.go

---------

Co-authored-by: Yakir Gibraltar <yakir.g@taboola.com>
Co-authored-by: meiji163 <meiji163@github.com>

* fix update of LastIterationRange values

---------

Co-authored-by: Jan Grodowski <jan.grodowski@shopify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Errors after altering a column from binary to varbinary

4 participants