Use utf8mb4 character set by default for MySQL database #33608
jeremy merged 4 commits into rails:master
r? @kamipo (@rails-bot has picked a reviewer for you, use r? to override)

@jeremy I have opened a work-in-progress PR for #33596. There is one failure with PostgreSQL 9.2 (https://travis-ci.org/rails/rails/jobs/415702539). I don't think it is related to my pull request.

I restarted CI by changing the last commit hash and found that the failure with PostgreSQL 9.2 (https://travis-ci.org/rails/rails/jobs/415715778) needs to be addressed by changing .travis.yml so that it does not upgrade the MySQL server when PostgreSQL 9.2 is configured, or something like that. Bottom line: all CI jobs against MySQL 5.7 are green.

Another idea is to drop PostgreSQL 9.2 support in Rails 6, since PostgreSQL 9.2 itself has already reached EOL: https://www.postgresql.org/support/versioning/
Force-pushed from 3d98df6 to f36428c.
I noticed that lib/active_record/tasks/mysql_database_tasks.rb is using config[:encoding] as default charset, whereas we fall back to utf8mb4 here regardless of configured encoding.
    def creation_options
      Hash.new.tap do |options|
        options[:charset] = configuration["encoding"] if configuration.include? "encoding"

It's preexisting behavior, but should we do the same here? e.g. `options[:charset] || @config[:encoding] || 'utf8mb4'`
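The fallback order suggested in the comment above could look like the following minimal sketch. `resolve_charset` is a hypothetical helper written for illustration, not a method from Active Record; only the `options[:charset] || @config[:encoding] || 'utf8mb4'` chain comes from the review.

```ruby
# Hypothetical helper illustrating the fallback chain from the review comment.
# An explicit :charset option wins, then the configured :encoding,
# and "utf8mb4" is used only when neither is set.
def resolve_charset(options, config)
  options[:charset] || config[:encoding] || "utf8mb4"
end

resolve_charset({ charset: "latin1" }, { encoding: "utf8" })  # => "latin1"
resolve_charset({}, { encoding: "utf8" })                     # => "utf8"
resolve_charset({}, {})                                       # => "utf8mb4"
```

With this order, an existing `encoding:` entry in the database configuration keeps working, and `utf8mb4` only applies to configurations that say nothing about the character set.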
activerecord/test/config.example.yml (outdated)
I wish we had named this :charset instead of :encoding long ago 😅
Followed these instructions and changed the root password to an empty string: https://docs.travis-ci.com/user/database-setup/#MySQL-57
to support utf8mb4 character set and `innodb_default_row_format`

MySQL 5.7.9 introduces `innodb_default_row_format` to support 3072 byte length index by default.
Users do not have to change MySQL database configuration to support Rails string type.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_default_row_format
https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default),
> the index key prefix limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format.

* Bump the minimum version of MariaDB to 10.2.2

MariaDB 10.2.2 is the first version of MariaDB supporting `innodb_default_row_format`.
Also MariaDB says "MySQL 5.7 is compatible with MariaDB 10.2".

- innodb_default_row_format
  https://mariadb.com/kb/en/library/xtradbinnodb-server-system-variables/#innodb_default_row_format
- "MariaDB versus MySQL - Compatibility"
  https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/
  > MySQL 5.7 is compatible with MariaDB 10.2
- "Supported Character Sets and Collations"
  https://mariadb.com/kb/en/library/supported-character-sets-and-collations/
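As a rough illustration of the version thresholds named in this commit message, the check could be sketched as below. The method name `default_row_format_supported?` and its arguments are assumptions made for this example; only the version numbers 5.7.9 (MySQL) and 10.2.2 (MariaDB) come from the text above.

```ruby
# Sketch only: the thresholds are taken from the commit message,
# the method name and signature are hypothetical.
def default_row_format_supported?(version_string, mariadb: false)
  minimum = Gem::Version.new(mariadb ? "10.2.2" : "5.7.9")
  Gem::Version.new(version_string) >= minimum
end

default_row_format_supported?("5.7.9")                   # => true
default_row_format_supported?("5.6.40")                  # => false
default_row_format_supported?("10.2.2", mariadb: true)   # => true
default_row_format_supported?("10.1.30", mariadb: true)  # => false
```

`Gem::Version` is used here because it compares dotted version strings segment by segment, which a plain string comparison would not do correctly.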
* Use utf8mb4 character set

`utf8mb4` character set supports supplementary characters including emoji.
`utf8` character set with 3-Byte encoding is not enough to support them.

There was a downside of 4-Byte length character set with MySQL 5.5 and 5.6:
"ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes"
for Rails string data type which is mapped to varchar(255) type.
MySQL 5.7 supports 3072 byte key prefix length by default.

* Remove `DEFAULT COLLATE` from Active Record unit test databases

There should be no "one size fits all" collation in MySQL 5.7.
Let MySQL server choose the default collation for Active Record unit test databases.
Users can choose their best collation for their databases by setting `options[:collation]` based on their requirements.

* InnoDB FULLTEXT indexes support since MySQL 5.6

Since MySQL 5.6, InnoDB supports FULLTEXT indexes, so it is not necessary to use the MyISAM storage engine, whose maximum key length is 1000 bytes.
Using the MyISAM storage engine with utf8mb4 character set would cause
"Specified key was too long; max key length is 1000 bytes"

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html

* References

"10.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html

"10.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html

"14.8.1.7 Limits on InnoDB Tables"
https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default), the index key prefix limit is 3072 bytes
> for InnoDB tables that use DYNAMIC or COMPRESSED row format.
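To see why a 3-byte encoding is not enough, it helps to look at how many bytes UTF-8 needs per character. The snippet below is plain Ruby and does not touch MySQL; it only shows that a supplementary character such as an emoji occupies 4 bytes, which is the case `utf8mb4` covers and the 3-byte `utf8` character set does not. The sample characters are arbitrary choices for this example.

```ruby
# Byte length of single characters when encoded as UTF-8.
ascii = "a"           # U+0061
kana  = "\u3042"      # U+3042, inside the Basic Multilingual Plane
emoji = "\u{1F605}"   # U+1F605, a supplementary character

ascii.bytesize  # => 1
kana.bytesize   # => 3
emoji.bytesize  # => 4

# Supplementary characters are the code points above U+FFFF.
kana.ord  > 0xFFFF   # => false
emoji.ord > 0xFFFF   # => true
```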
Force-pushed from bf0d709 to 1b047bf.
This reverts commit e2b1ec5.
Thanks for merging.
…pported

Once rails#33608 is merged, creating a new database on MySQL 5.1.x will fail, since MySQL 5.1 does not know the `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt` to let the create_database method handle the default character set according to the MySQL server version.

The `supports_longer_index_key_prefix?` method will need to validate whether MySQL 5.5 and 5.6 servers are configured correctly to support a longer index key prefix, but it does not do so yet.
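The idea of choosing the character set by server version could be sketched roughly as follows. This is not the Active Record implementation: both method names are hypothetical, the 5.7.9 threshold is borrowed from the discussion earlier in this pull request, and MariaDB is ignored for brevity.

```ruby
# Hypothetical sketch, not the Active Record implementation.
# An explicit :charset wins. Otherwise utf8mb4 is chosen only when the server
# is new enough (assumed here: MySQL 5.7.9 or later). For older servers nil is
# returned, so no character set is named and the server default applies.
def charset_for_new_database(options, server_version)
  return options[:charset] if options[:charset]
  return "utf8mb4" if Gem::Version.new(server_version) >= Gem::Version.new("5.7.9")
  nil
end

# Builds the CREATE DATABASE statement as a string; nothing is executed.
def create_database_sql(name, options, server_version)
  charset = charset_for_new_database(options, server_version)
  sql = "CREATE DATABASE `#{name}`"
  sql += " DEFAULT CHARACTER SET `#{charset}`" if charset
  sql
end

create_database_sql("app_dev", {}, "5.7.22")
# => "CREATE DATABASE `app_dev` DEFAULT CHARACTER SET `utf8mb4`"
create_database_sql("app_dev", {}, "5.1.73")
# => "CREATE DATABASE `app_dev`"
```

Because the generated configuration no longer hard-codes `encoding: utf8mb4`, an old server in this sketch never receives a character set name it does not know.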
Summary
This pull request implements #33596. It includes these changes:
* Replace `utf8` character set with `utf8mb4` to support supplementary characters including emoji
* Remove `utf8_unicode_ci` collation from Active Record unit test databases to let MySQL server use the default collation for the character set
* […] `utf8mb4` character set and 3072 bytes key length with InnoDB
* […] `Specified key was too long; max key length is 1000 bytes` for MyISAM table in the test by using InnoDB storage engine
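The key length limits quoted in this pull request follow from simple arithmetic on a varchar(255) column, which is what the Rails string type maps to. A minimal sketch of that calculation, with `index_key_bytes` as a hypothetical helper name:

```ruby
# Worst-case index key size in bytes: characters times bytes per character.
def index_key_bytes(chars, bytes_per_char)
  chars * bytes_per_char
end

utf8_key    = index_key_bytes(255, 3)  # => 765
utf8mb4_key = index_key_bytes(255, 4)  # => 1020

utf8_key    <= 767    # => true,  fits the 767-byte limit
utf8mb4_key <= 767    # => false, "max key length is 767 bytes"
utf8mb4_key <= 1000   # => false, MyISAM: "max key length is 1000 bytes"
utf8mb4_key <= 3072   # => true,  InnoDB with DYNAMIC or COMPRESSED row format
```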