Online DDL: ensure message is valid utf8 in updateMigrationMessage()#11914
Conversation
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
If we already know length to be a potential issue, should we already always truncate the string then anyway? And do that change here as well? |
go/vt/vttablet/onlineddl/executor.go
Outdated
| if err != nil { | ||
|
|
||
| update := func(message string) error { | ||
| message = strings.ToValidUTF8(message, "") |
There was a problem hiding this comment.
Do we want to use � as the replacement here since that's the canonical UTF-8 replacement character (and also what a console / log today would already print).
See also https://www.fileformat.info/info/unicode/char/fffd/index.htm
There was a problem hiding this comment.
@dbussink sounds good. I'm down for maintenance, would you like to push that change?
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
|
truncating length to |
|
@systay Looks like your recent fixes for the planner are triggered in the upgrade tests (from https://github.com/vitessio/vitess/actions/runs/3648727440/jobs/6203228592): Not sure if that would be fixed if this PR is updated with latest The issue is not related to the changes here in this PR though. |
| message = strings.ToValidUTF8(message, "�") | ||
| if len(message) > maxlen { | ||
| message = message[0:maxlen] | ||
| } |
There was a problem hiding this comment.
@shlomi-noach Realizing that we should change the order here I think. Since slicing this can then break the last character if the slicing happens on a word boundary and it still creates invalidly encoded data.
There was a problem hiding this comment.
Since message is a TEXT, 16383 would be a safe maxlen. But we still then need to flip the order, so first truncate and then convert to valid UTF-8. It could lead to a � at the end if then we break the last potentially valid unicode char, but I think that's a totally ok trade off if the message is that long already anyway.
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
#11923 should have fixed this |
Description
Fixes #11913
This PR ensures to clean up non-utf8 characters from the error message reported in
_vt.schema_migrations. If writing the message still fails for whatever reason (the text is too long?), we opt for an alternative, generic error message.Related Issue(s)
#11913
#6926
Checklist
Deployment Notes