Skip to content

Conversation

@srutzky
Copy link
Contributor

@srutzky srutzky commented Aug 2, 2021

This fixes #6618

  1. Main purpose of this update is to correct the note stating: "Starting with SQL Server 2014 (12.x), all new _140 collations automatically support supplementary characters.". I accidentally introduced that error back in a5dd5b5#diff-bd26ae1d32f0764b643c46bbd2af1259a58e375debebe0d71b2cdb1583dfd754R153 on 2017-10-24, due to inconsistent naming conventions (at least at that time) for the include files. That was supposed to have been "SQL Server 2017 (14.x)".

    Some of the include files had version numbers in their names referring to the internal version number (e.g. 11 = version 11.x / 110 — this is SQL Server 2012), while other include files had version numbers in their names referring to the common version number / product name (e.g. 14 = SQL Server 2014; this is version 12.x / 120).

  2. Fixed data type used for CONVERT in server collation query. It was varchar, which has two problems: 1) the base datatype is nvarchar, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) it's a bad practice to not specify a max size for variable length types as the default is situation-dependent. Here the default is 30 (instead of 1) which is still a problem given that as of SQL Server 2019, 1476 out of 5508 (27%) collation names are over 30 characters long and are thus silently truncated by using varchar.

    Please see "Arguments" list in documentation for SERVERPROPERTY.

    SELECT [name], LEN([name]) AS [NameLength]
    FROM   sys.fn_helpcollations()
    WHERE  LEN([name]) > 30
    ORDER BY 2, 1;
    -- 1476
  3. Fixed data type used for CONVERT in database collation query. It was varchar(50), which, while better than just varchar, still has two problems: 1) the base datatype is nvarchar, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) while 50 is better than the default 30, this is still a problem given that as of SQL Server 2019, 48 collation names are over 50 characters long and are thus silently truncated by using varchar(50).

    Please see "Arguments" list in documentation for DATABASEPROPERTYEX.

    SELECT [name], LEN([name]) AS [NameLength]
    FROM   sys.fn_helpcollations()
    WHERE  LEN([name]) > 50
    ORDER BY 2, 1;
    -- 48
  4. Fixed the max number of code points for BMP and all of Unicode to be 65,536 and 1,114,112, respectively. Both were 1 less than that (in a few places), most likely using the max addressable code point value in each case, and not accounting for U+0000 (i.e. range vs quantity).

  5. Removed "_140" from note regarding new collations automatically supporting supplementary characters. The starting SQL Server version is already mentioned, and making this version specific means one more place to update if/when a new series of collations is introduced (one more place that can be overlooked, leaving misleading documentation).

  6. For consistency, removed "_" prefix from two remaining collation version number references that had them. The vast majority of collation version number references do not use that prefix, so now none of them do.

  7. Under "GB18030 support", changed "100 level" to be "version 100" for consistency.

  8. Under "Japanese collations ...", added "BIN and BIN2" for accuracy.

  9. Under "Japanese collations ...", updated query to list new collations: 1) actual column names are not capitalized in the DB, and 2) since the Japanese collations were the only collations added in SQL Server 2017, using COLLATIONPROPERTY(name, 'Version') is a more deterministic method of filtering as it doesn't rely on string parsing yet is logically equivalent.

    If not using COLLATIONPROPERTY, then the following is preferred:

    SELECT name, description
    FROM   sys.fn_helpcollations()
    WHERE  name LIKE N'%[_]140[_]%';
  10. Added "UCS-2" to keywords (meta data) as it was the only one missing from those combinations.

Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/

1. Main purpose of this update is to correct the note stating: "Starting with SQL Server 2014 (12.x), all new _140 collations automatically support supplementary characters.". I accidentally introduced that error back in MicrosoftDocs@a5dd5b5#diff-bd26ae1d32f0764b643c46bbd2af1259a58e375debebe0d71b2cdb1583dfd754R153 on 2017-10-24, due to inconsistent naming conventions (at least at that time) for the include files. That was supposed to have been "SQL Server 2017 (14.x)".

2. Fixed data type used for `CONVERT` in server collation query. It was `varchar`, which has two problems: 1) the base datatype is `nvarchar`, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) it's a bad practice to not specify a max size for variable length types as the default is situation-dependent. Here the default is 30 (instead of 1) which is still a problem given that as of SQL Server 2019, 1476 out of 5508 (27%) collation names are over 30 characters long and are thus silently truncated by using `varchar`.

3. Fixed data type used for `CONVERT` in database collation query. It was `varchar(50)`, which, while better than just `varchar`, still has two problems: 1) the base datatype is `nvarchar`, and even though only standard ASCII characters are used, it's still best to match the base data type, and 2) while 50 is better than the default 30, this is still a problem given that as of SQL Server 2019, 48 collation names are over 50 characters long and are thus silently truncated by using `varchar(50)`.

4. Fixed the max number of code points for BMP and all of Unicode to be 65,536 and 1,114,112, respectively. Both were 1 less than that, most likely using the max addressable code point value in each case, and not accounting for U+0000 (i.e. range vs quantity).

5. Removed "_140" from note regarding new collations automatically supporting supplementary characters. The starting SQL Server version is already mentioned, and making this version specific means one more place to update if/when a new series of collations is introduced (one more place that can be overlooked, leaving misleading documentation).

6. For consistency, removed "_" prefix from two remaining collation version number references that had them. The vast majority of collation version number references do not use that prefix, so now none of them do.

7. Under "GB18030 support", changed "100 level" to be "version 100" for consistency.

8. Under "Japanese collations ...", added "BIN and BIN2" for accuracy.

9. Under "Japanese collations ...", updated query to list new collations: 1) actual column names are not capitalized in the DB, and 2) since the Japanese collations were the only collations added in SQL Server 2017, using `COLLATIONPROPERTY(name, 'Version')` is a more deterministic method of filtering as it doesn't rely on string parsing yet is logically equivalent.

10.  Added "UCS-2" to keywords (meta data) as it was the only one missing from those combinations.
@PRMerger8
Copy link
Contributor

@srutzky : Thanks for your contribution! The author(s) have been notified to review your proposed change.

@PRMerger8 PRMerger8 requested a review from pmasl August 2, 2021 15:38
srutzky referenced this pull request Aug 2, 2021
Added notes in at least 3 places about built-in supplementary character support in new version 140 collations.

Some grammatical improvements.

Fixed formatting and rearranged bullet points as several items somehow got grouped into one of them and were no longer separate items.
@ktoliver ktoliver added the aq-pr-triaged tracking label for the PR review team label Aug 2, 2021
@ktoliver ktoliver merged commit 863e783 into MicrosoftDocs:live Nov 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mistake version number in documentation?

5 participants