SQL Server collations define the rules for how character data gets sorted and compared across the platform. Configuring appropriate collations is vital for enabling globalized applications that handle multiple languages correctly.

This guide covers the key aspects of working with collations in SQL Server.

Collation Overview

Collations in SQL Server provide sorting and comparison rules tailored for character data across different languages/scripts used in applications.

Key characteristics defined by collation include:

  • Case sensitivity for handling upper and lower case letters
  • Accent sensitivity for managing letters with diacritics and accents
  • Width sensitivity for comparing wide & narrow variants of East Asian characters
  • Code page that maps character data to numeric codes
  • Sort order rules like dictionary order or stroke order

SQL Server provides a wide variety of built-in collations for major languages such as French, Greek, Chinese and Arabic. Applications need the appropriate collation configured to sort and compare textual data correctly.

The collation used by a database can be specified at the time of creation. Columns with character data types can override the database defaults and define their own collations as per application semantics.

Collation Naming Conventions

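SQL Server defines collations using structured names that describe the collation's capabilities. Windows collation names follow this pattern:

Latin1_General_[Version]_[CI|CS]_[AI|AS]_[KS]_[WS]_[VSS]_[SC]_[UTF8]

Let's analyze each component of the collation name:

  • Latin1_General/Greek/Cyrillic_General/Chinese_PRC/Japanese/Korean_Wansung etc.: Collation designator, naming the language or alphabet whose sorting rules apply
  • Version: Collation version such as 90, 100 or 140; names without a number belong to the original version
  • CI: Case Insensitive, CS: Case Sensitive
  • AI: Accent Insensitive, AS: Accent Sensitive
  • KS: Kana Sensitive, distinguishing Hiragana from Katakana; when omitted the collation is kana insensitive
  • WS: Width Sensitive, distinguishing full-width from half-width characters; when omitted the collation is width insensitive
  • VSS: Variation Selector Sensitive (version 140 Japanese collations only)
  • SC: Supplementary Character support (version 90 and 100 collations)
  • UTF8: UTF-8 encoding for char and varchar data (SQL Server 2019 and later)

Binary collations replace the sensitivity suffixes with BIN or BIN2 and compare the underlying code values directly.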

Here are some examples that demonstrate the naming patterns:

1. Latin1_General_100_CI_AI_SC
   -> Case insensitive, accent insensitive Latin1_General collation (version 100) with supplementary character support

2. Japanese_Bushu_Kakusu_140_CS_AS_KS_WS
   -> Case, accent, kana and width sensitive collation for Japanese based on radical and stroke sorting

3. Chinese_Simplified_Stroke_Order_100_CS_AS_WS
   -> Case, accent and width sensitive collation for Simplified Chinese with stroke order sorting

The detailed naming conventions allow precise interpretation of collation behaviors.
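
To check how SQL Server itself interprets a collation name, the built-in sys.fn_helpcollations() function and COLLATIONPROPERTY can be queried. The following is a small sketch; the LIKE filter is only an illustrative choice.

```sql
-- List the Japanese collations available on this instance,
-- together with the description SQL Server generates for each name.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Japanese%'
ORDER BY name;

-- Look up individual properties of a single collation.
SELECT
    COLLATIONPROPERTY('Latin1_General_100_CI_AI_SC', 'CodePage') AS code_page,
    COLLATIONPROPERTY('Latin1_General_100_CI_AI_SC', 'LCID')     AS lcid,
    COLLATIONPROPERTY('Latin1_General_100_CI_AI_SC', 'Version')  AS collation_version;
```

The description column spells out the sensitivity flags in words, which is a quick way to confirm that a name means what you expect before using it in a schema.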

Collation Precedence Rules

SQL Server follows a hierarchy when determining which collation applies to character data, from most to least specific:

1. Expression level
Explicit COLLATE clause applied to an expression in a query

2. Db column definition
Explicit collation specified during column creation

3. Db default collation
Default collation of the containing database, inherited by columns created without a COLLATE clause

4. Instance default collation
Default collation of the SQL Server instance, inherited by databases created without a COLLATE clause

An explicit COLLATE clause takes highest precedence. Otherwise the column's own collation is used, and the database and instance defaults only supply the collation for objects created without one.
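
A minimal sketch of explicit precedence in action: an explicit COLLATE clause in a query overrides the collation stored on the column. The temp table #cities and its data are hypothetical and exist only for this demonstration.

```sql
-- Hypothetical temp table: the column carries an explicit case sensitive collation.
CREATE TABLE #cities (city NVARCHAR(50) COLLATE Latin1_General_100_CS_AS);
INSERT INTO #cities (city) VALUES (N'London');

-- The column collation applies: the comparison is case sensitive, so no row is returned.
SELECT city FROM #cities WHERE city = N'LONDON';

-- The explicit COLLATE clause takes precedence: the comparison is case insensitive,
-- so the row is returned.
SELECT city FROM #cities WHERE city = N'LONDON' COLLATE Latin1_General_100_CI_AS;

DROP TABLE #cities;
```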

Viewing Active Collations

Finding the currently active collations configured across databases and columns helps with debugging text processing issues.

List all databases with respective collations:

SELECT name, collation_name  
FROM sys.databases;

Get collation for a specific database:

SELECT DATABASEPROPERTYEX('db_name', 'Collation');

Find columns with non-default collations:

-- Character columns whose collation differs from the current database default
SELECT t.name AS table_name,
   c.name AS column_name,
   c.collation_name
FROM sys.columns AS c
INNER JOIN sys.tables AS t
   ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL  -- non-character columns have no collation
   AND c.collation_name <> CAST(DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS sysname);

These catalog views and metadata functions expose the collation settings needed for troubleshooting.

Case & Accent Sensitivity

Choosing appropriate case/accent sensitivity is crucial for globalized applications handling multiple languages.

Case insensitive collations treat strings that differ only in letter case as equal. This makes searches more forgiving, but values that differ only by case can no longer be told apart:

City = 'LondoN'
Matches 'London' in case insensitive collation

Case sensitive collations differentiate between upper and lower case, so comparisons are exact but the search term must use the precise casing:

City = 'LondoN'
No match for 'London' in case sensitive collation
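
The two comparisons above can be reproduced in one statement. This sketch assumes the Latin1_General_100 collations as a representative case insensitive and case sensitive pair.

```sql
SELECT
    CASE WHEN N'LondoN' = N'London' COLLATE Latin1_General_100_CI_AS
         THEN 'Matches' ELSE 'No Match' END AS case_insensitive_result,
    CASE WHEN N'LondoN' = N'London' COLLATE Latin1_General_100_CS_AS
         THEN 'Matches' ELSE 'No Match' END AS case_sensitive_result;

-- case_insensitive_result = 'Matches', case_sensitive_result = 'No Match'
```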

Here is an example with the Hungarian word felhő showing accent insensitive behavior. It uses Latin1_General_100_CI_AI, because a language-specific collation such as Hungarian_CI_AI may treat ő as a separate letter rather than as an accented o:

DECLARE @text1 NVARCHAR(50) = N'felhő';
DECLARE @text2 NVARCHAR(50) = N'felho';

SELECT CASE
    WHEN @text1 = @text2 COLLATE Latin1_General_100_CI_AI THEN 'Matches'
    ELSE 'No Match'
END AS Result;

--> Returns 'Matches'

The two strings matched because the Latin1_General_100_CI_AI collation ignores accents.

And here is the counterpart showing accent sensitive collation:

DECLARE @text1 NVARCHAR(50) = N'felhő';
DECLARE @text2 NVARCHAR(50) = N'felho';

SELECT CASE
   WHEN @text1 = @text2 COLLATE Latin1_General_100_CI_AS THEN 'Matches'
   ELSE 'No Match'
END AS Result;

--> Returns 'No Match'

So for applications that need to differentiate between such nuances, using accent and case sensitive collations is vital.

Setting Column Collations

When creating a new table with character columns, appropriate collations can be defined explicitly:

CREATE TABLE contacts (
   id INT,
   name NVARCHAR(50) COLLATE Greek_CS_AS,  
   address NVARCHAR(250) COLLATE Latin1_General_100_CI_AI 
);

This results in mixed collations within the table driven by semantic requirements.

Modify column collation in existing table:

ALTER TABLE contacts   
ALTER COLUMN name 
   NVARCHAR(50) COLLATE French_100_CI_AI;

Verify using:

SELECT name, collation_name
FROM sys.columns WHERE object_id = OBJECT_ID('contacts');

So collations can be tuned at column level for specialized behavior.
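
ALTER COLUMN cannot change the collation of a column that is referenced by an index, a constraint or a computed column, so it helps to look for such dependencies first. The sketch below lists indexes that include the name column of the contacts table from the example above; other dependency types would need similar checks.

```sql
-- Indexes that include the column whose collation is about to change.
-- These typically have to be dropped before ALTER COLUMN and re-created afterwards.
SELECT i.name AS index_name,
       c.name AS column_name,
       c.collation_name
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'contacts')
  AND c.name = N'name';
```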

Changing Database Collations

Changing the default collation of an entire database is also supported in SQL Server:

ALTER DATABASE database_name
COLLATE Greek_CS_AS;

This can be useful for standardization or after migrations.

Points to note:

  • New columns and objects created without a COLLATE clause use the new default
  • Existing columns in user tables keep their current collation and must be changed individually with ALTER TABLE ... ALTER COLUMN
  • The statement needs exclusive access to the database and fails if schema-bound objects, computed columns or check constraints depend on the database collation
  • Applications must be tested for impact

So alter database collations judiciously after thorough analysis.
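
Because the change needs exclusive access to the database, one common approach is to switch to single-user mode around it. This is a sketch only: database_name is the placeholder from the example above, and ROLLBACK IMMEDIATE disconnects other sessions, so it is an assumption that this is acceptable in your environment.

```sql
-- Take exclusive access, change the default collation, then reopen the database.
ALTER DATABASE database_name SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE database_name COLLATE Greek_CS_AS;
ALTER DATABASE database_name SET MULTI_USER;

-- Run inside the altered database: summarize which collations
-- the character columns of user tables carry after the change.
SELECT c.collation_name, COUNT(*) AS column_count
FROM sys.columns AS c
INNER JOIN sys.tables AS t
    ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
GROUP BY c.collation_name
ORDER BY column_count DESC;
```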

Resolving Collation Conflicts

Since multiple collations get involved, mismatches can easily happen leading to inconsistent text comparisons:

SELECT * FROM Employees AS e
JOIN ContactForms AS f
ON e.FirstName = f.EmployeeName

Here the JOIN fails with a collation conflict error (Msg 468, "Cannot resolve the collation conflict ...") when the two columns have different collations.

Various techniques can overcome such incompatibilities:

1. Use COLLATE clause

SELECT * FROM Employees AS e  
JOIN ContactForms AS f
ON e.FirstName = f.EmployeeName 
   COLLATE Latin1_General_100_CI_AI

COLLATE here overrides the collation of f.EmployeeName for this comparison only; the stored data and column definitions are unchanged.

2. Explicit CAST with COLLATE

CAST the source string to the target type, then apply the target collation to the result:

SELECT *
FROM Employees AS e
JOIN ContactForms AS f ON
   e.FirstName = CAST(f.EmployeeName AS NVARCHAR(50))
                      COLLATE French_100_CS_AS;

3. Join on Non-Character Keys

Avoid string comparisons altogether by joining on numeric key columns, to which collations do not apply:

SELECT * FROM Employees e
INNER JOIN ContactForms f ON e.col1 = f.col1  -- col1: an integer key column

Test queries with sample data after applying these techniques for eliminating collation errors.
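
A self-contained sketch of the COLLATE technique, using COLLATE DATABASE_DEFAULT so that no collation name has to be hard-coded. The temp tables stand in for the Employees and ContactForms tables above; their collations and data are assumptions chosen to force a conflict.

```sql
-- Two hypothetical temp tables whose columns deliberately use different collations.
CREATE TABLE #employees (FirstName NVARCHAR(50) COLLATE Latin1_General_100_CI_AI);
CREATE TABLE #contact_forms (EmployeeName NVARCHAR(50) COLLATE French_100_CS_AS);

INSERT INTO #employees (FirstName) VALUES (N'Marie');
INSERT INTO #contact_forms (EmployeeName) VALUES (N'Marie');

-- Without the COLLATE clause this join raises a collation conflict error.
-- DATABASE_DEFAULT resolves to the collation of the current database.
SELECT e.FirstName, f.EmployeeName
FROM #employees AS e
INNER JOIN #contact_forms AS f
    ON e.FirstName = f.EmployeeName COLLATE DATABASE_DEFAULT;

DROP TABLE #employees;
DROP TABLE #contact_forms;
```

The same keyword is useful in temp table definitions, since temp tables are created in tempdb and otherwise pick up tempdb's collation rather than that of the user database.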

Collations for Contained Databases

Contained databases have their own collations defined within, separate from the instance defaults.

Specify the collation while creating a contained database. The CONTAINMENT option comes before the COLLATE clause:

CREATE DATABASE contained_db
CONTAINMENT = PARTIAL
COLLATE Chinese_PRC_90_CS_AS_WS;

Then verify the database collation:

SELECT DATABASEPROPERTYEX('contained_db', 'Collation');

Key pointers while working with contained database collations:

  • Partially contained databases can have different collations from the instance
  • The catalog collation of a contained database, used for metadata such as object names, is always Latin1_General_100_CI_AS_WS_KS_SC regardless of the database collation
  • Use COLLATE CATALOG_DEFAULT or COLLATE DATABASE_DEFAULT to make string comparisons explicit where metadata and user data meet
  • Migrate contained databases carefully to servers whose instance collation differs

So give special attention to collations while designing contained databases.
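
Creating a contained database requires the instance-level 'contained database authentication' option to be enabled first. The sketch below shows the full sequence; the database name contained_demo and its collation are hypothetical choices for illustration.

```sql
-- Enable contained databases on the instance (requires appropriate server permissions).
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;

-- Create a partially contained database with its own default collation.
CREATE DATABASE contained_demo
CONTAINMENT = PARTIAL
COLLATE Latin1_General_100_CS_AS;

-- Confirm the containment setting and collation from the catalog.
SELECT name, containment_desc, collation_name
FROM sys.databases
WHERE name = N'contained_demo';
```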

Integrating Collations by Language

Since SQL Server provides extensive language specific collations, applications can define appropriate collations based on textual data language to enable proper globalized behavior.

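Some examples follow. All of them are Windows collations; SQL collations, whose names begin with SQL_ (for example SQL_Latin1_General_CP1_CI_AS), are retained mainly for backward compatibility.

1. English text with dictionary sorting

Use the accent insensitive collation:

Latin1_General_100_CI_AI_SC

OR, with accent, kana and width sensitivity:

Latin1_General_100_CI_AS_KS_WS_SC

2. French text with accent + case sensitivity

Use the collation:

French_100_CS_AS

OR, with added kana and width sensitivity and supplementary character support:

French_100_CS_AS_KS_WS_SC

3. Arabic text

Use the case sensitive collation:

Arabic_CS_AS

OR the case insensitive, kana and width sensitive collation:

Arabic_CI_AS_KS_WS

This way appropriate language collations can be implemented that align with data characteristics.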
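
To see how the choice of language affects ordering, the same data can be sorted under two collations side by side. This is a sketch with hypothetical sample words; the row order of each result depends on the collation's linguistic rules, so compare the two result sets rather than relying on an expected order.

```sql
-- Hypothetical sample data: French words that differ only in their accents.
DECLARE @words TABLE (word NVARCHAR(20));
INSERT INTO @words (word)
VALUES (N'cote'), (N'côte'), (N'coté'), (N'côté');

-- Sort using French linguistic rules.
SELECT word FROM @words ORDER BY word COLLATE French_100_CS_AS;

-- Sort the same rows using general Latin rules.
SELECT word FROM @words ORDER BY word COLLATE Latin1_General_100_CS_AS;
```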

Collation Configuration for Optimal Performance

Collations can affect performance in text heavy applications that do extensive string operations.

Follow these database design practices for collation optimization:

  • Use binary collations (names ending in _BIN2) for columns such as codes and identifiers that only need exact matching. They compare code values directly and are generally the cheapest to compare.

  • Define explicit collations only where required instead of on entire tables or databases, so that fewer comparisons need conversion.

  • Prefer Windows collations for new development. SQL collations apply different comparison rules to Unicode and non-Unicode data, which can produce inconsistent results for the same text.

  • Plan for index rebuilds when changing the collation of an indexed column, since the index has to be dropped and re-created, which takes time on large tables.

  • Avoid applying COLLATE to an indexed column inside a filter or join predicate, since this generally prevents an index seek. Align the column collations instead.

  • Ensure identical collations among columns commonly used together in joins and grouping conditions.

Proper testing coupled with these collation configuration best practices can improve text processing throughput.
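
When a column must keep one collation but is frequently searched under another, one possible workaround is an indexed computed column that carries the second collation, so queries do not need a COLLATE clause on the base column. This is a sketch under assumed names: dbo.contacts_demo, name_ci and the index name are all hypothetical.

```sql
-- Hypothetical table: name is stored case sensitive, name_ci exposes the
-- same value under a case and accent insensitive collation.
CREATE TABLE dbo.contacts_demo (
    id      INT NOT NULL PRIMARY KEY,
    name    NVARCHAR(50) COLLATE Latin1_General_100_CS_AS NOT NULL,
    name_ci AS (name COLLATE Latin1_General_100_CI_AI)
);

-- Index the computed column so insensitive searches can use it.
CREATE INDEX IX_contacts_demo_name_ci ON dbo.contacts_demo (name_ci);

-- Search without applying COLLATE to the base column.
SELECT id, name
FROM dbo.contacts_demo
WHERE name_ci = N'zoe';
```

The trade-off is extra storage and write cost for the additional index, so this fits best for a small number of heavily searched columns.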

Upgrading SQL Server Collations

New SQL Server versions add new collation versions (for example the _100 and _140 series), while existing collations keep their names and remain available. Collation problems during upgrades and migrations therefore usually come from moving to a different instance or adopting the newer collations:

  • Queries assuming older collation rules may return different results under a newer collation
  • Temporary tables can hit collation conflicts when the new instance default differs from the database collation
  • Restored databases keep their original collation, which may not match the new instance default

Ensure following checks are done during SQL Server upgrades:

  1. Back up existing databases before initiating upgrade

  2. Test-restore databases on the new version and compare their collations with the new instance default

  3. Analyze queries dependent on collation behavior and rewrite if necessary

  4. Compare old vs new collations and port databases using closest mappings

  5. Standardize databases on a single collation where possible, so that cross-database queries and tempdb usage do not hit conflicts

  6. Conduct sufficient tests with databases moved to new version

  7. Rewrite any code still affected after all above steps

With careful analysis and testing, potential application issues due to SQL Server collation changes can be avoided during upgrades.
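
As a starting point for these checks, the following sketch lists databases whose default collation differs from the instance collation on the server where it is run. It uses only the sys.databases catalog view and SERVERPROPERTY.

```sql
-- Databases whose default collation differs from the instance default.
-- Offline databases report a NULL collation and are not listed.
SELECT d.name AS database_name,
       d.collation_name AS database_collation,
       CAST(SERVERPROPERTY('Collation') AS sysname) AS instance_collation
FROM sys.databases AS d
WHERE d.collation_name <> CAST(SERVERPROPERTY('Collation') AS sysname)
ORDER BY d.name;
```

Running it on both the old and the new instance gives a quick inventory of where collation conflicts are most likely to appear after the move.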

Conclusion

Collations play a vital role in enabling SQL Server based applications to handle multiple languages correctly. With thousands of available collation options, configuring the relevant language and sensitivity rules systematically is important. Use this guide to understand collation fundamentals, special cases like contained databases, and troubleshooting techniques to avoid conflicts when implementing globalized collations in SQL Server databases.
