
make utf8mb4_unicode_ci default collation for new mysql tables #1875

Merged
dereuromark merged 2 commits into cakephp:0.next from MasterOdin:mysql_default_collation
Sep 30, 2020

@MasterOdin
Member

@MasterOdin MasterOdin commented Sep 1, 2020

closes #1763

This makes utf8mb4_general_ci the default collation for MySQL tables instead of utf8_general_ci. The prior default made sense 7+ years ago, when server computing power was more limited. Nowadays the performance gain is increasingly small compared with the benefit of correctly handling, by default, the capitalization nuances of different languages and their characters, which MySQL's original utf8 charset lacked.

@MasterOdin
Member Author

Something I failed to think about in the original PR: for a primary key in InnoDB, the maximum key length on MySQL 5.7 works out to 255 characters for utf8 (which uses up to 3 bytes per character) but only 191 for utf8mb4 (up to 4 bytes per character). This makes transitioning a bit more difficult for the default case of something like:

        $table = $this->table('table1', ['id' => false, 'primary_key' => ['column1']]);

        $table->addColumn('column1', 'string')
            ->addColumn('column2', 'integer')
            ->create();

Should the default of string be made 191 instead of 255? Only if it's the primary key? Require explicitly setting the length for primary key strings?
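
The 255 and 191 figures above follow from dividing an index byte limit by the maximum number of bytes per character. A minimal sketch of that arithmetic, assuming the 767-byte InnoDB index key limit that applies to the older row formats (the helper name `maxIndexableChars` is hypothetical and not part of Phinx):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Phinx): how many characters fit in an
 * index key when each character may take up to $maxBytesPerChar bytes.
 */
function maxIndexableChars(int $indexByteLimit, int $maxBytesPerChar): int
{
    return intdiv($indexByteLimit, $maxBytesPerChar);
}

// Assumption: 767-byte index key limit (older InnoDB row formats).
$indexByteLimit = 767;

echo maxIndexableChars($indexByteLimit, 3), "\n"; // utf8: up to 3 bytes per character
echo maxIndexableChars($indexByteLimit, 4), "\n"; // utf8mb4: up to 4 bytes per character
```

With a larger index key limit, the same division gives a correspondingly larger character count, which is why the problem depends on the server version and row format.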

Comment thread docs/en/configuration.rst Outdated
@MasterOdin
Member Author

For now I'm leaving this open, but I may not come back to it for a while, as I would like to re-examine how other migration software handles default limits for the string / varchar type, especially as it concerns primary keys.

@dereuromark
Member

Just use

charset: utf8mb4
collation: utf8mb4_unicode_ci

as commented above and we can merge this

@MasterOdin MasterOdin changed the title from "make utf8mb4_general_ci default collation for new mysql tables" to "make utf8mb4_unicode_ci default collation for new mysql tables" Sep 30, 2020
@MasterOdin
Member Author

@dereuromark done. I decided to just insert a note in the documentation about having to explicitly set the primary key length for MySQL 5.7 and below when using the string type with utf8mb4_unicode_ci, instead of adjusting the default limit. I view changing the default as a much larger BC break, since it would probably mean changing the default for all adapters to keep them equal. Hopefully 5.7 is phased out at a decent clip in favor of 8.0+, so that this isn't an issue for too long.

Said note should probably be replicated in the changelog notes on release of 0.13 as well, to further drive awareness of that fact.
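
For reference, explicitly setting the length of a string primary key might look like the sketch below. It reuses the table from the earlier example and assumes Phinx's `limit` column option; the migration class name is hypothetical, and the value 191 assumes a 767-byte index key limit with utf8mb4.

```php
<?php

use Phinx\Migration\AbstractMigration;

// Hypothetical migration class name, for illustration only.
class CreateTable1 extends AbstractMigration
{
    public function change()
    {
        $table = $this->table('table1', ['id' => false, 'primary_key' => ['column1']]);

        // Assumption: 191 characters at up to 4 bytes each (utf8mb4)
        // stays within a 767-byte index key limit.
        $table->addColumn('column1', 'string', ['limit' => 191])
            ->addColumn('column2', 'integer')
            ->create();
    }
}
```

Only the primary key column gets an explicit limit here; other string columns keep the default limit of 255.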

@dereuromark dereuromark merged commit 6ef9016 into cakephp:0.next Sep 30, 2020