Fix incorrect version, etc in "Collation and Unicode support" #6657

srutzky · 2021-08-02T15:37:56Z

This fixes #6618

The main purpose of this update is to correct the note stating: "Starting with SQL Server 2014 (12.x), all new _140 collations automatically support supplementary characters." I accidentally introduced that error back in a5dd5b5#diff-bd26ae1d32f0764b643c46bbd2af1259a58e375debebe0d71b2cdb1583dfd754R153 on 2017-10-24, due to inconsistent naming conventions (at least at that time) for the include files. The note was supposed to say "SQL Server 2017 (14.x)".

Some of the include files had version numbers in their names referring to the internal version number (e.g. 11 = version 11.x / 110; this is SQL Server 2012), while other include files had version numbers in their names referring to the common version number / product name (e.g. 14 = SQL Server 2014; this is version 12.x / 120).

Fixed the data type used for `CONVERT` in the server collation query. It was `varchar`, which has two problems: 1) the base data type is `nvarchar`, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) it's bad practice not to specify a max size for variable-length types, as the default is situation-dependent. In `CONVERT` the default is 30 (instead of the 1 used in declarations), which is still a problem given that, as of SQL Server 2019, 1476 out of 5508 (27%) collation names are over 30 characters long and are thus silently truncated by using `varchar`.

Please see the "Arguments" list in the documentation for `SERVERPROPERTY`.
```
SELECT [name], LEN([name]) AS [NameLength]
FROM   sys.fn_helpcollations()
WHERE  LEN([name]) > 30
ORDER BY 2, 1;
-- 1476
```
Fixed the data type used for `CONVERT` in the database collation query. It was `varchar(50)`, which, while better than just `varchar`, still has two problems: 1) the base data type is `nvarchar`, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) while 50 is better than the default 30, this is still a problem given that, as of SQL Server 2019, 48 collation names are over 50 characters long and are thus silently truncated by using `varchar(50)`.

Please see the "Arguments" list in the documentation for `DATABASEPROPERTYEX`.
```
SELECT [name], LEN([name]) AS [NameLength]
FROM   sys.fn_helpcollations()
WHERE  LEN([name]) > 50
ORDER BY 2, 1;
-- 48
```
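As a rough sketch of the two `CONVERT` fixes described above, the queries below use `nvarchar(128)` as the target type. That length is an assumption based on the base data type that the `SERVERPROPERTY` and `DATABASEPROPERTYEX` documentation lists for the collation property; it is not necessarily the exact text merged into the article. The last query only illustrates the silent truncation and is not part of the article.

```
-- Server collation: match the documented base type (nvarchar(128) assumed here).
SELECT CONVERT(nvarchar(128), SERVERPROPERTY('Collation')) AS [ServerCollation];

-- Database collation: DB_NAME() is used only so the example runs in any database.
SELECT CONVERT(nvarchar(128), DATABASEPROPERTYEX(DB_NAME(), 'Collation')) AS [DatabaseCollation];

-- Illustration: CONVERT(varchar, ...) without a length keeps only the first 30 characters.
SELECT TOP (5)
       [name],
       CONVERT(varchar, [name]) AS [TruncatedTo30],
       LEN([name]) AS [NameLength]
FROM   sys.fn_helpcollations()
WHERE  LEN([name]) > 30
ORDER BY [NameLength] DESC, [name];
```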
Fixed the max number of code points for the BMP and for all of Unicode to be 65,536 and 1,114,112, respectively. Both were 1 less than that (in a few places), most likely because the highest addressable code point value (U+FFFF and U+10FFFF) was used in each case without accounting for U+0000 (i.e. range vs quantity).

Removed "_140" from the note regarding new collations automatically supporting supplementary characters. The starting SQL Server version is already mentioned, and making this version-specific means one more place to update if/when a new series of collations is introduced (one more place that can be overlooked, leaving misleading documentation).

For consistency, removed the "_" prefix from the two remaining collation version number references that had it. The vast majority of collation version number references do not use that prefix, so now none of them do.

Under "GB18030 support", changed "100 level" to "version 100" for consistency.

Under "Japanese collations ...", added "BIN and BIN2" for accuracy.

Under "Japanese collations ...", updated the query that lists the new collations: 1) the actual column names are not capitalized in the DB, and 2) since the Japanese collations were the only collations added in SQL Server 2017, using `COLLATIONPROPERTY(name, 'Version')` is a more deterministic method of filtering, as it doesn't rely on string parsing yet is logically equivalent.

If not using `COLLATIONPROPERTY`, then the following is preferred (the square brackets make each underscore a literal character, since a bare `_` in `LIKE` is a single-character wildcard):
```
SELECT name, description
FROM   sys.fn_helpcollations()
WHERE  name LIKE N'%[_]140[_]%';
```
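For comparison, a minimal sketch of the `COLLATIONPROPERTY`-based filter mentioned above. It assumes that version 140 collations report a `Version` value of 3, as described in the `COLLATIONPROPERTY` documentation; the exact query merged into the article may differ.

```
-- Filter on the collation's internal version instead of parsing its name.
-- Assumption: 'Version' returns 3 for collations with "140" in the name.
SELECT [name], [description]
FROM   sys.fn_helpcollations()
WHERE  COLLATIONPROPERTY([name], 'Version') = 3;
```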
Added "UCS-2" to the keywords (metadata), as it was the only one missing from those combinations.
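The code point counts fixed above (65,536 for the BMP and 1,114,112 for all of Unicode) can be sanity-checked with simple arithmetic. This is only an illustrative query, not something from the article: the count is the highest code point plus one, because U+0000 is itself a code point.

```
-- Highest code point vs. number of code points (range vs quantity).
SELECT CONVERT(int, 0xFFFF)       AS [MaxBmpCodePoint],        -- 65535
       CONVERT(int, 0xFFFF) + 1   AS [BmpCodePointCount],      -- 65536
       CONVERT(int, 0x10FFFF)     AS [MaxUnicodeCodePoint],    -- 1114111
       CONVERT(int, 0x10FFFF) + 1 AS [UnicodeCodePointCount],  -- 1114112
       17 * 65536                 AS [SeventeenPlanes];        -- 1114112 (17 planes of 65,536 each)
```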

Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/

PRMerger8 · 2021-08-02T15:38:12Z

@srutzky : Thanks for your contribution! The author(s) have been notified to review your proposed change.

Added notes in at least 3 places about built-in supplementary character support in new version 140 collations. Some grammatical improvements. Fixed formatting and rearranged bullet points as several items somehow got grouped into one of them and were no longer separate items.

PRMerger20 added the do-not-merge label Aug 2, 2021

PRMerger8 requested a review from pmasl August 2, 2021 15:38

PRMerger8 assigned pmasl Aug 2, 2021

PRMerger8 added Change sent to author sql/prod labels Aug 2, 2021

ktoliver added the aq-pr-triaged tracking label for the PR review team label Aug 2, 2021

pmasl approved these changes Nov 12, 2021

ktoliver merged commit 863e783 into MicrosoftDocs:live Nov 12, 2021
