Use utf8mb4 character set by default for MySQL database #33608

Merged
jeremy merged 4 commits into rails:master from yahonda:use_utf8mb4
Sep 11, 2018

@yahonda
Member

@yahonda yahonda commented Aug 13, 2018

Summary

This pull request implements #33596. It includes these changes:

  • Replace the utf8 character set with utf8mb4 to support supplementary characters, including emoji
  • Remove the utf8_unicode_ci collation from the Active Record unit test databases so that the MySQL server uses the default collation for the character set
  • Bump the minimum version of MySQL to 5.7.9 and MariaDB to 10.2.2 to support the utf8mb4 character set and a 3072-byte key length with InnoDB
  • Address "Specified key was too long; max key length is 1000 bytes" for a MyISAM table in the tests by using the InnoDB storage engine
  • Run CI against MySQL 5.7

@rails-bot

r? @kamipo

(@rails-bot has picked a reviewer for you, use r? to override)

@yahonda
Member Author

yahonda commented Aug 14, 2018

@jeremy I have opened a work-in-progress PR for #33596. There is one failure with PostgreSQL 9.2 (https://travis-ci.org/rails/rails/jobs/415702539), but I don't think it is related to my pull request.

@yahonda
Member Author

yahonda commented Aug 14, 2018

Restarted CI by changing the last commit hash and found that the failure with PostgreSQL 9.2 (https://travis-ci.org/rails/rails/jobs/415715778) needs to be addressed by changing .travis.yml so that it does not upgrade the MySQL server when PostgreSQL 9.2 is configured, or something like that.

Bottom line: all CI jobs against MySQL 5.7 are green.

@yahonda
Member Author

yahonda commented Aug 14, 2018

Another idea is to drop PostgreSQL 9.2 support for Rails 6, since PostgreSQL 9.2 itself has already reached EOL: https://www.postgresql.org/support/versioning/

@yahonda yahonda force-pushed the use_utf8mb4 branch 24 times, most recently from 3d98df6 to f36428c Compare August 16, 2018 13:42

I noticed that lib/active_record/tasks/mysql_database_tasks.rb is using config[:encoding] as default charset, whereas we fall back to utf8mb4 here regardless of configured encoding.

def creation_options
  Hash.new.tap do |options|
    options[:charset]     = configuration["encoding"]   if configuration.include? "encoding"

It's preexisting behavior, but should we do the same here? e.g. options[:charset] || @config[:encoding] || 'utf8mb4'
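
The fallback order suggested above can be sketched as follows. This is only an illustration, not the adapter's code: `resolve_charset` is a hypothetical helper, and `options` and `config` stand in for the `create_database` options hash and the adapter's `@config`.

```ruby
# Hypothetical helper illustrating the suggested precedence:
# explicit option first, then the configured encoding, then utf8mb4.
DEFAULT_CHARSET = "utf8mb4"

def resolve_charset(options, config)
  options[:charset] || config[:encoding] || DEFAULT_CHARSET
end

puts resolve_charset({ charset: "latin1" }, { encoding: "utf8" })  # explicit option wins
puts resolve_charset({}, { encoding: "utf8" })                     # configured encoding is used
puts resolve_charset({}, {})                                       # falls back to utf8mb4
```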

I wish we had named this :charset instead of :encoding long ago 😅

Followed these instructions and changed the root password to an empty string.
https://docs.travis-ci.com/user/database-setup/#MySQL-57
to support utf8mb4 character set and `innodb_default_row_format`

MySQL 5.7.9 introduces `innodb_default_row_format`, so index key prefixes of up to 3072 bytes are supported by default.
Users do not have to change the MySQL server configuration to support the Rails string type.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_default_row_format

https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default),
> the index key prefix limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format.

* Bump the minimum version of MariaDB to 10.2.2
MariaDB 10.2.2 is the first version of MariaDB to support `innodb_default_row_format`.
MariaDB also states that "MySQL 5.7 is compatible with MariaDB 10.2".

- innodb_default_row_format
https://mariadb.com/kb/en/library/xtradbinnodb-server-system-variables/#innodb_default_row_format

- "MariaDB versus MySQL - Compatibility"
https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/
> MySQL 5.7 is compatible with MariaDB 10.2

- "Supported Character Sets and Collations"
https://mariadb.com/kb/en/library/supported-character-sets-and-collations/
* Use utf8mb4 character set

The `utf8mb4` character set supports supplementary characters, including emoji.
The `utf8` character set, with its 3-byte encoding, cannot store them.
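
A quick way to see the difference, as a sketch in plain Ruby with no database involved: supplementary characters lie outside the Basic Multilingual Plane and take four bytes in UTF-8, one more than MySQL's 3-byte `utf8` can hold. The helper name `fits_in_utf8mb3?` is made up for this illustration.

```ruby
# Hypothetical helper: true when every character of the string encodes to
# at most 3 bytes in UTF-8, i.e. it would fit MySQL's 3-byte utf8 (utf8mb3).
def fits_in_utf8mb3?(string)
  string.encode("UTF-8").each_char.all? { |char| char.bytesize <= 3 }
end

emoji = "\u{1F605}" # a supplementary character (code point above U+FFFF)

puts emoji.bytesize                          # 4
puts fits_in_utf8mb3?("\u65E5\u672C\u8A9E")  # true, each character is 3 bytes
puts fits_in_utf8mb3?(emoji)                 # false
```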

A 4-byte character set had a downside with MySQL 5.5 and 5.6:

"ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes"
for the Rails string data type, which is mapped to varchar(255).

MySQL 5.7 supports a 3072-byte key prefix length by default.
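
The arithmetic behind that error, as a rough sketch: an index on a `varchar` column has to allow for the maximum number of bytes per character. The constant and method names below are illustrative only, and the calculation ignores any length-prefix bytes.

```ruby
# Worst-case bytes per character for each MySQL character set.
BYTES_PER_CHAR = { "utf8" => 3, "utf8mb4" => 4 }.freeze

# Index key prefix limits quoted in this pull request, in bytes.
OLD_INNODB_LIMIT = 767   # from the MySQL 5.5 and 5.6 error message
NEW_INNODB_LIMIT = 3072  # DYNAMIC or COMPRESSED row format

# Hypothetical helper: worst-case index key size of a varchar(length) column.
def index_key_bytes(length, charset)
  length * BYTES_PER_CHAR.fetch(charset)
end

puts index_key_bytes(255, "utf8")     # 765, fits within 767
puts index_key_bytes(255, "utf8mb4")  # 1020, exceeds 767 but fits within 3072
```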

* Remove `DEFAULT COLLATE` from Active Record unit test databases

There is no "one size fits all" collation in MySQL 5.7,
so let the MySQL server choose the default collation for the Active Record
unit test databases.

Users can choose their best collation for their databases
by setting `options[:collation]` based on their requirements.
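
As a sketch of how such an option might translate into SQL: the helper below is hypothetical and is not the adapter's implementation. It only shows that a collation clause is added when `options[:collation]` is given and left to the server otherwise.

```ruby
# Hypothetical helper, not the adapter's implementation: builds a
# CREATE DATABASE statement, adding a collation only when one is given.
def create_database_sql(name, options = {})
  charset = options.fetch(:charset, "utf8mb4")
  sql = "CREATE DATABASE `#{name}` DEFAULT CHARACTER SET #{charset}"
  sql += " COLLATE #{options[:collation]}" if options[:collation]
  sql
end

puts create_database_sql("app_test")
puts create_database_sql("app_test", collation: "utf8mb4_unicode_ci")
```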

* InnoDB has supported FULLTEXT indexes since MySQL 5.6
The test does not have to use the MyISAM storage engine, whose maximum key length is 1000 bytes.
Using the MyISAM storage engine with the utf8mb4 character set would cause
"Specified key was too long; max key length is 1000 bytes".

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html

* References

"10.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html

"10.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html

"14.8.1.7 Limits on InnoDB Tables"
https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default), the index key prefix limit is 3072 bytes
> for InnoDB tables that use DYNAMIC or COMPRESSED row format.
@jeremy jeremy added this to the 6.0.0 milestone Sep 11, 2018
Member

@jeremy jeremy left a comment

Excellent. Thank you for this contribution @yahonda!

@jeremy jeremy merged commit d54d0c9 into rails:master Sep 11, 2018
@yahonda
Member Author

yahonda commented Sep 11, 2018

Thanks for merging.
Give me some time to work on a pull request that falls back to utf8 when utf8mb4 is not available, whether because of the server version or its configuration.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 13, 2018
…pported

Once rails#33608 is merged, creating a new database with MySQL 5.1.x will fail,
since MySQL 5.1 does not know the `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let the create_database method choose the default character set based on the MySQL server version.

The `supports_longer_index_key_prefix?` method will eventually need to check whether a MySQL 5.5 or 5.6 server
is configured correctly to support a longer index key prefix, but it does not do so yet.
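
One way the version-based fallback described here could look, as a sketch only: the helper names and the plain version-string input are assumptions for illustration, and the 5.7.9 and 10.2.2 thresholds are the minimum versions named earlier in this pull request. It does not inspect how a 5.5 or 5.6 server is configured.

```ruby
require "rubygems" # Gem::Version, for comparing dotted version strings

# Hypothetical check: does this server support 3072-byte index key
# prefixes by default? Thresholds are taken from this pull request.
def longer_index_key_prefix?(version, mariadb: false)
  minimum = mariadb ? "10.2.2" : "5.7.9"
  Gem::Version.new(version) >= Gem::Version.new(minimum)
end

# Hypothetical default: utf8mb4 where long keys work, utf8 otherwise.
def default_charset(version, mariadb: false)
  longer_index_key_prefix?(version, mariadb: mariadb) ? "utf8mb4" : "utf8"
end

puts default_charset("5.1.73")                  # utf8
puts default_charset("5.7.23")                  # utf8mb4
puts default_charset("10.1.35", mariadb: true)  # utf8
```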
@yahonda yahonda deleted the use_utf8mb4 branch September 23, 2018 05:06