Overview of the Issue
When a query is executed that can be directly evaluated by the evalengine without having to go to a vttablet instance, we lack encoding information in the result. The result is also not transcoded to the proper connection character set / collation either.
Reproduction Steps
Run the following query against Vitess, see also the separate --column-type-info flag to show the returned type information (including the collation).
Vitess
mysql -u root --socket=/tmp/mysql.sock --column-type-info
mysql> select 'test';
Field 1: `:vtg1 /* VARCHAR */`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: ? (0)
Length: 0
Max_length: 4
Decimals: 0
Flags:
+---------------------+
| :vtg1 /* VARCHAR */ |
+---------------------+
| test |
+---------------------+
1 row in set (0.00 sec)
What can be seen here, is that the Collation information is lacking. When running this against MySQL:
MySQL
mysql> select 'test';
Field 1: `test`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: utf8mb4_0900_ai_ci (255)
Length: 16
Max_length: 4
Decimals: 31
Flags: NOT_NULL
+------+
| test |
+------+
| test |
+------+
1 row in set (0.00 sec)
Here it can be seen that the collation is properly set with utf8mb4_0900_ai_ci. This bug can break clients that depend on the collation information to determine if things are internally a UTF8 string or binary data.
This bug though hides another underlying bug. It's not just the collation that is not set, it's also not transcoding to that client collation. Take a look at the following example in MySQL:
MySQL
mysql> select _latin1 0xff, collation(_latin1 0xff);
Field 1: `_latin1 0xff`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: utf8mb4_0900_ai_ci (255)
Length: 4
Max_length: 2
Decimals: 31
Flags: NOT_NULL
Field 2: `collation(_latin1 0xff)`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: utf8mb4_0900_ai_ci (255)
Length: 256
Max_length: 17
Decimals: 31
Flags:
+--------------+-------------------------+
| _latin1 0xff | collation(_latin1 0xff) |
+--------------+-------------------------+
| ÿ | latin1_swedish_ci |
+--------------+-------------------------+
1 row in set (0.00 sec)
What can be seen here, is that even though the expression _latin1 0xff has a collation of latin1_swedish_ci, it is translated to utf8mb4_0900_ai_ci as that is the connection collation. So the output here is a UTF-8 character of ÿ (which is encoded in latin1 the value 255).
When running this against Vitess:
Vitess
mysql> select _latin1 0xff, collation(_latin1 0xff);
Field 1: `_latin1 :vtg1 /* HEXNUM */`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: ? (0)
Length: 0
Max_length: 4
Decimals: 0
Flags:
Field 2: `collation(_latin1 :vtg1 /* HEXNUM */)`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: ? (0)
Length: 0
Max_length: 17
Decimals: 0
Flags:
+----------------------------+---------------------------------------+
| _latin1 :vtg1 /* HEXNUM */ | collation(_latin1 :vtg1 /* HEXNUM */) |
+----------------------------+---------------------------------------+
| 0xff | latin1_swedish_ci |
+----------------------------+---------------------------------------+
1 row in set (0.00 sec)
Here you can see that while Vitess knows the correct collation of the expression, it doesn't actually transcode the value when returning it. It also doesn't show the collation information for the expression. So there's another deeper bug that we don't do this transcoding for results.
Binary Version
All known Vitess versions
Operating System and Environment details
Log Fragments
No response
Overview of the Issue
When a query is executed that can be directly evaluated by the
evalenginewithout having to go to avttabletinstance, we lack encoding information in the result. The result is also not transcoded to the proper connection character set / collation either.Reproduction Steps
Run the following query against Vitess, see also the separate
--column-type-infoflag to show the returned type information (including the collation).Vitess
What can be seen here, is that the
Collationinformation is lacking. When running this against MySQL:MySQL
Here it can be seen that the collation is properly set with
utf8mb4_0900_ai_ci. This bug can break clients that depend on the collation information to determine if things are internally a UTF8 string or binary data.This bug though hides another underlying bug. It's not just the collation that is not set, it's also not transcoding to that client collation. Take a look at the following example in MySQL:
MySQL
What can be seen here, is that even though the expression
_latin1 0xffhas a collation oflatin1_swedish_ci, it is translated toutf8mb4_0900_ai_cias that is the connection collation. So the output here is a UTF-8 character ofÿ(which is encoded inlatin1the value255).When running this against Vitess:
Vitess
Here you can see that while Vitess knows the correct collation of the expression, it doesn't actually transcode the value when returning it. It also doesn't show the collation information for the expression. So there's another deeper bug that we don't do this transcoding for results.
Binary Version
Operating System and Environment details
Log Fragments
No response