Wrong behavior when handle strcmp with default collation #5366

@solotzg

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

drop table if exists t;
create table t (a varchar(100)) CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
alter table t set tiflash replica 1;

insert into t values('1   '), ('1\n'), ('1');
select hex(min(a)) from t;

MySQL version

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.29    |
+-----------+
1 row in set (0.00 sec)

2. What did you expect to see? (Required)

MySQL [test]> select hex(min(a)) from t;
+-------------+
| hex(min(a)) |
+-------------+
| 31202020    |
+-------------+

3. What did you see instead (Required)

MySQL [test]> select hex(min(a)) from t;
+-------------+
| hex(min(a)) |
+-------------+
| 31          |
+-------------+

4. What is your TiFlash version? (Required)

master

TiDB behavior

For MySQL

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> select strcmp('1\0', '1');
+--------------------+
| strcmp('1\0', '1') |
+--------------------+
|                 -1 |
+--------------------+
1 row in set (0.00 sec)

For TiDB

MySQL [test]> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select strcmp('1\0', '1');
+--------------------+
| strcmp('1\0', '1') |
+--------------------+
|                  1 |
+--------------------+
1 row in set (0.00 sec)

How MySQL compares '1\0' and '1':

  • take the minimum length of the two strings, which is 1;
  • over that length, '1' and '1' are equal;
  • the remainder of '1\0' is '\0', which is compared with the padding white space ' ';
  • comparing '\0' with ' ' is the same as comparing 0x00 with 0x20, so the result is -1.
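
The steps above can be sketched in C++ roughly as follows. This is only an illustration of the described behavior, not MySQL's actual code; `padSpaceCompare` and `checkPadSpaceExample` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: binary comparison with PAD SPACE semantics,
// following the four steps listed above. Returns -1, 0 or 1.
inline int padSpaceCompare(const std::string & a, const std::string & b)
{
    // Step 1: minimum length of both strings.
    const std::size_t n = std::min(a.size(), b.size());

    // Step 2: compare the common prefix byte by byte.
    const int prefix = a.compare(0, n, b, 0, n);
    if (prefix != 0)
        return prefix < 0 ? -1 : 1;

    // Steps 3-4: compare every remaining byte of the longer string with ' ' (0x20).
    const bool a_is_longer = a.size() > n;
    const std::string & longer = a_is_longer ? a : b;
    for (std::size_t i = n; i < longer.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(longer[i]);
        if (c != ' ')
        {
            const int r = c < ' ' ? -1 : 1;
            return a_is_longer ? r : -r; // flip the sign if the tail belongs to b
        }
    }
    return 0; // the tail was all spaces, so the strings compare equal
}

// Usage check mirroring the strcmp('1\0', '1') example: 0x00 < 0x20, so -1.
inline void checkPadSpaceExample()
{
    assert(padSpaceCompare(std::string("1\0", 2), "1") == -1);
    assert(padSpaceCompare("1   ", "1") == 0);
}
```

Under this model '1   ' and '1' are equal, while '1\0' and '1\n' both sort before '1'.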

According to https://docs.pingcap.com/tidb/dev/character-set-and-collation, TiDB chooses to strip trailing spaces instead of padding with them, so '1\0' is simply longer than '1' and the result is 1.
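
For contrast, a minimal sketch of the "strip trailing spaces, then compare bytes" approach. It assumes only 0x20 bytes are stripped, and it is not taken from TiDB's source; `stripTrailingSpaces`, `trimThenCompare` and `checkTrimExample` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: drop trailing 0x20 bytes only; '\0' and '\n' are kept.
inline std::string stripTrailingSpaces(const std::string & s)
{
    const std::size_t last = s.find_last_not_of(' ');
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Hypothetical helper: strip both sides, then do a plain binary comparison.
// Returns -1, 0 or 1.
inline int trimThenCompare(const std::string & a, const std::string & b)
{
    const int r = stripTrailingSpaces(a).compare(stripTrailingSpaces(b));
    return r < 0 ? -1 : (r > 0 ? 1 : 0);
}

// Usage check mirroring strcmp('1\0', '1'): nothing is stripped from '1\0',
// so it is the longer string with an equal prefix and compares greater.
inline void checkTrimExample()
{
    assert(trimThenCompare(std::string("1\0", 2), "1") == 1);
    assert(trimThenCompare("1   ", "1") == 0);
}
```

The two models agree on '1   ' versus '1' and differ only when the tail contains a byte below 0x20, which is exactly the '1\0' and '1\n' cases in this report.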

Bug in TiFlash

reinterpret_cast<const char *>(&parent.chars[parent.offsetAt(lhs)]), parent.sizeAt(lhs),
reinterpret_cast<const char *>(&parent.chars[parent.offsetAt(rhs)]), parent.sizeAt(rhs));

TiFlash should remove the trailing '\0' before comparing.
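
A rough sketch of what removing the trailing '\0' could look like. It assumes the column stores each value followed by a '\0' terminator and that `sizeAt()` counts that terminator; `MockStringColumn` and `valueAt` are hypothetical stand-ins written for this illustration, not TiFlash code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the column used in the snippet above.
// Assumption: each value is stored followed by '\0', and sizeAt() includes it.
struct MockStringColumn
{
    std::vector<char> chars;
    std::vector<std::size_t> offsets; // end offset (exclusive) of each value

    void insert(const std::string & s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0'); // terminator stored with the value
        offsets.push_back(chars.size());
    }

    std::size_t offsetAt(std::size_t i) const { return i == 0 ? 0 : offsets[i - 1]; }
    std::size_t sizeAt(std::size_t i) const { return offsets[i] - offsetAt(i); }

    // Value as the collator should see it: length is sizeAt(i) - 1,
    // so the terminator never takes part in the comparison.
    std::string valueAt(std::size_t i) const
    {
        assert(sizeAt(i) >= 1);
        return std::string(&chars[offsetAt(i)], sizeAt(i) - 1);
    }
};
```

Under that assumption, passing `parent.sizeAt(lhs) - 1` and `parent.sizeAt(rhs) - 1` as the lengths would keep the terminator out of the collation comparison.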
