Skip to content

Bug Report: Expression evaluated in EvalEngine lacks encoding information on result #13493

@dbussink

Description

@dbussink

Overview of the Issue

When a query is executed that can be directly evaluated by the evalengine without having to go to a vttablet instance, we lack encoding information in the result. The result is also not transcoded to the proper connection character set / collation either.

Reproduction Steps

Run the following query against Vitess, see also the separate --column-type-info flag to show the returned type information (including the collation).

Vitess

mysql -u root --socket=/tmp/mysql.sock --column-type-info

mysql> select 'test';
Field   1:  `:vtg1 /* VARCHAR */`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  ? (0)
Length:     0
Max_length: 4
Decimals:   0
Flags:      


+---------------------+
| :vtg1 /* VARCHAR */ |
+---------------------+
| test                |
+---------------------+
1 row in set (0.00 sec)

What can be seen here, is that the Collation information is lacking. When running this against MySQL:

MySQL

mysql> select 'test';
Field   1:  `test`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8mb4_0900_ai_ci (255)
Length:     16
Max_length: 4
Decimals:   31
Flags:      NOT_NULL 


+------+
| test |
+------+
| test |
+------+
1 row in set (0.00 sec)

Here it can be seen that the collation is properly set with utf8mb4_0900_ai_ci. This bug can break clients that depend on the collation information to determine if things are internally a UTF8 string or binary data.

This bug though hides another underlying bug. It's not just the collation that is not set, it's also not transcoding to that client collation. Take a look at the following example in MySQL:

MySQL

mysql> select _latin1 0xff, collation(_latin1 0xff);
Field   1:  `_latin1 0xff`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8mb4_0900_ai_ci (255)
Length:     4
Max_length: 2
Decimals:   31
Flags:      NOT_NULL 

Field   2:  `collation(_latin1 0xff)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8mb4_0900_ai_ci (255)
Length:     256
Max_length: 17
Decimals:   31
Flags:      


+--------------+-------------------------+
| _latin1 0xff | collation(_latin1 0xff) |
+--------------+-------------------------+
| ÿ            | latin1_swedish_ci       |
+--------------+-------------------------+
1 row in set (0.00 sec)

What can be seen here, is that even though the expression _latin1 0xff has a collation of latin1_swedish_ci, it is translated to utf8mb4_0900_ai_ci as that is the connection collation. So the output here is a UTF-8 character of ÿ (which is encoded in latin1 the value 255).

When running this against Vitess:

Vitess

mysql> select _latin1 0xff, collation(_latin1 0xff);
Field   1:  `_latin1 :vtg1 /* HEXNUM */`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  ? (0)
Length:     0
Max_length: 4
Decimals:   0
Flags:      

Field   2:  `collation(_latin1 :vtg1 /* HEXNUM */)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  ? (0)
Length:     0
Max_length: 17
Decimals:   0
Flags:      


+----------------------------+---------------------------------------+
| _latin1 :vtg1 /* HEXNUM */ | collation(_latin1 :vtg1 /* HEXNUM */) |
+----------------------------+---------------------------------------+
| 0xff                       | latin1_swedish_ci                     |
+----------------------------+---------------------------------------+
1 row in set (0.00 sec)

Here you can see that while Vitess knows the correct collation of the expression, it doesn't actually transcode the value when returning it. It also doesn't show the collation information for the expression. So there's another deeper bug that we don't do this transcoding for results.

Binary Version

All known Vitess versions

Operating System and Environment details

-

Log Fragments

No response

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions