SQL Server collations define the rules for how character data gets sorted and compared across the platform. Configuring appropriate collations is vital for enabling globalized applications that handle multiple languages correctly.
In this guide, we will cover the key aspects of working with collations in SQL Server.
Contents
- Collation Overview
- Collation Naming Conventions
- Collation Precedence Rules
- Viewing Active Collations
- Case & Accent Sensitivity
- Setting Column Collations
- Changing Database Collations
- Resolving Collation Conflicts
- Collations for Contained Databases
- Integrating Collations by Language
- Collation Configuration for Optimal Performance
- Upgrading SQL Server Collations
Collation Overview
Collations in SQL Server provide sorting and comparison rules tailored for character data across different languages/scripts used in applications.
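Key characteristics defined by a collation include:
- Case sensitivity: whether upper and lower case letters compare as equal
- Accent sensitivity: whether letters with and without diacritics (for example é and e) compare as equal
- Kana sensitivity: whether Japanese Hiragana and Katakana characters compare as equal
- Width sensitivity: whether full-width and half-width forms of the same character compare as equal
- Code page: the character set used to store non-Unicode (char, varchar) data
- Sort order: the language-specific ordering rules, such as dictionary order or stroke order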
SQL Server provides a wide variety of built-in collations supporting major languages such as French, Greek, Chinese and Arabic. Applications typically need an appropriate collation configuration to sort and compare their text data correctly.

The collation used by a database can be specified at the time of creation. Columns with character data types can override the database defaults and define their own collations as per application semantics.
Collation Naming Conventions
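SQL Server names Windows collations with a structured pattern that describes the collation's behavior:
CollationDesignator[_Version]_[CI|CS]_[AI|AS][_KS][_WS][_VSS][_SC][_UTF8]
Binary collations replace the sensitivity options with _BIN or _BIN2, as in Latin1_General_100_BIN2.
Let's analyze each component of the collation name:
- Latin1_General/Greek/Cyrillic_General/Chinese_PRC/Japanese/Korean_Wansung etc.: the collation designator, which specifies the language or script whose linguistic rules apply
- Version (90, 100, 140): newer versions carry updated sorting weights and cover more characters; names without a number are the oldest versions
- CI: Case Insensitive, CS: Case Sensitive
- AI: Accent Insensitive, AS: Accent Sensitive
- KS: Kana Sensitive (distinguishes Japanese Hiragana and Katakana); omitting it means kana insensitive
- WS: Width Sensitive (distinguishes full-width and half-width characters); omitting it means width insensitive
- VSS: Variation Selector Sensitive (version 140 Japanese collations only)
- SC: Supplementary Characters, i.e. characters outside the Basic Multilingual Plane are handled as single characters (version 140 collations include this support without the suffix)
- UTF8: stores char and varchar data as UTF-8 (SQL Server 2019 and later)
Legacy SQL collations follow a different pattern, for example SQL_Latin1_General_CP1_CI_AS, where CP1 refers to code page 1252.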
Here are some examples that demonstrate the naming patterns:
1. Latin1_General_100_CI_AI_SC
-> Version 100 Latin1_General collation that is case insensitive and accent insensitive, with supplementary character support
2. Japanese_Bushu_Kakusu_140_CS_AS_KS_WS
-> Version 140 Japanese collation that sorts by radical and stroke count (Bushu Kakusu) and is case, accent, kana and width sensitive
3. Chinese_PRC_Stroke_CS_AS_WS
-> Chinese (PRC) collation with stroke order sorting that is case, accent and width sensitive
The detailed naming conventions allow precise interpretation of collation behaviors.
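To make the naming pattern concrete, here is a minimal Python sketch that decodes a collation name into its options. It is a hypothetical helper (`parse_collation_name` and `CollationInfo` are illustrative names, not part of any SQL Server API). It assumes the suffix list CI/CS, AI/AS, KS, WS, VSS, SC, UTF8, BIN and BIN2, and it does not check that the name exists on an instance; `sys.fn_helpcollations()` remains the authoritative list.

```python
from dataclasses import dataclass
from typing import Optional

# Option suffixes assumed from the documented Windows collation naming pattern.
_SUFFIXES = {"CI", "CS", "AI", "AS", "KS", "WS", "VSS", "SC", "UTF8", "BIN", "BIN2"}


@dataclass(frozen=True)
class CollationInfo:
    designator: str
    version: Optional[int]
    case_sensitive: Optional[bool]    # None for binary collations
    accent_sensitive: Optional[bool]  # None for binary collations
    kana_sensitive: bool
    width_sensitive: bool
    variation_selector_sensitive: bool
    supplementary_characters: bool
    utf8: bool
    binary: Optional[str]             # "BIN", "BIN2" or None


def parse_collation_name(name: str) -> CollationInfo:
    """Decode a collation name such as 'Latin1_General_100_CI_AI_SC'."""
    tokens = name.split("_")
    flags = set()
    # Peel option suffixes off the end until a designator token is reached.
    while len(tokens) > 1 and tokens[-1].upper() in _SUFFIXES:
        flags.add(tokens.pop().upper())
    # An all-digit token just before the options is the version (90, 100, 140).
    version = None
    if len(tokens) > 1 and tokens[-1].isdigit():
        version = int(tokens.pop())
    binary = "BIN2" if "BIN2" in flags else ("BIN" if "BIN" in flags else None)
    return CollationInfo(
        designator="_".join(tokens),
        version=version,
        case_sensitive=None if binary else "CS" in flags,
        accent_sensitive=None if binary else "AS" in flags,
        kana_sensitive="KS" in flags,
        width_sensitive="WS" in flags,
        variation_selector_sensitive="VSS" in flags,
        # Version 140 collations support supplementary characters without _SC.
        supplementary_characters="SC" in flags or (version or 0) >= 140,
        utf8="UTF8" in flags,
        binary=binary,
    )


info = parse_collation_name("Japanese_Bushu_Kakusu_140_CS_AS_KS_WS")
print(info.designator, info.version, info.case_sensitive, info.kana_sensitive)
# prints: Japanese_Bushu_Kakusu 140 True True
```

Working from the end of the name keeps the parser independent of how many underscores the designator itself contains.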
Collation Precedence Rules
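When an expression compares or combines strings, SQL Server resolves the collation from a label attached to each operand, in this order of precedence:
1. Explicit
A COLLATE clause applied to the expression
2. Implicit
A column reference, which carries the column's collation (set explicitly at column creation or inherited from the database default)
3. Coercible-default
Literals, variables and parameters, which take the default collation of the current database
4. No-collation
The result of combining two columns with different implicit collations; comparing such a value raises a collation conflict error
An explicit collation takes highest precedence. Two different explicit collations in one comparison raise an error, as do two columns with different implicit collations unless one side is converted with COLLATE. Separately, defaults cascade downward: the instance collation is the default for new databases (and for tempdb), and the database collation is the default for new columns.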
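The precedence labels can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model (`Operand`, `resolve` and `comparison_collation` are illustrative names): explicit beats implicit, implicit beats coercible-default, two different explicit collations conflict, and two different implicit collations yield no-collation, which a comparison cannot use. Edge cases of the real engine are not covered.

```python
from typing import NamedTuple, Optional


class CollationConflict(Exception):
    """Stands in for SQL Server's 'cannot resolve the collation conflict' error."""


class Operand(NamedTuple):
    label: str                # "explicit", "implicit", "coercible-default" or "no-collation"
    collation: Optional[str]  # None only for "no-collation"


def resolve(a: Operand, b: Operand) -> Operand:
    """Simplified model of the label that results from combining a and b."""
    labels = {a.label, b.label}
    if labels == {"explicit"}:
        # Two explicit COLLATE clauses must agree.
        if a.collation != b.collation:
            raise CollationConflict(f"{a.collation} vs {b.collation}")
        return a
    if "explicit" in labels:
        return a if a.label == "explicit" else b
    if "no-collation" in labels:
        return Operand("no-collation", None)
    if labels == {"implicit"}:
        # Two columns with different collations yield no-collation.
        return a if a.collation == b.collation else Operand("no-collation", None)
    if "implicit" in labels:
        return a if a.label == "implicit" else b
    return a  # both coercible-default: the current database default applies


def comparison_collation(a: Operand, b: Operand) -> str:
    """Comparisons (=, <, JOIN ... ON) need a concrete collation."""
    result = resolve(a, b)
    if result.collation is None:
        raise CollationConflict("cannot resolve the collation conflict")
    return result.collation


column = Operand("implicit", "Latin1_General_100_CI_AS")
literal = Operand("coercible-default", "SQL_Latin1_General_CP1_CI_AS")
print(comparison_collation(column, literal))  # Latin1_General_100_CI_AS
```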
Viewing Active Collations
Finding the currently active collations configured across databases and columns helps with debugging text processing issues.
List all databases with respective collations:
SELECT name, collation_name
FROM sys.databases;
Get collation for a specific database:
SELECT DATABASEPROPERTYEX('db_name', 'Collation');
Get the instance (server) collation:
SELECT SERVERPROPERTY('Collation');
Find columns whose collation differs from the current database default:
SELECT t.name AS table_name,
c.name AS column_name,
c.collation_name
FROM sys.columns AS c
INNER JOIN sys.tables AS t
ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
AND c.collation_name <> CAST(DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS sysname);
These catalog views and metadata functions expose the collation metadata needed for troubleshooting and tuning.
Case & Accent Sensitivity
Choosing appropriate case/accent sensitivity is crucial for globalized applications handling multiple languages.
Case insensitive collations treat upper and lower case forms of a letter as equal:
City = 'LondoN'
Matches 'London' in a case insensitive collation
Case sensitive collations differentiate between upper and lower case:
City = 'LondoN'
No match for 'London' in a case sensitive collation
The choice is about semantics rather than speed or accuracy: pick the behavior that matches how the application expects users' text to compare.
Here is an example with Hungarian text showing accent insensitive behavior:
DECLARE @text1 NVARCHAR(50) = N'felhő';
DECLARE @text2 NVARCHAR(50) = N'felho';
SELECT CASE
WHEN @text1 = @text2 COLLATE Latin1_General_100_CI_AI THEN 'Matches'
ELSE 'No Match'
END AS Result;
--> Returns 'Matches'
The strings match because the accent insensitive collation ignores the double acute accent on ő. Language-specific collations such as Hungarian_CI_AI apply Hungarian alphabet rules and may treat some accented letters as separate letters, so test with the collation you actually deploy.
And here is the counterpart with an accent sensitive collation (run it as a separate batch, since it redeclares the variables):
DECLARE @text1 NVARCHAR(50) = N'felhő';
DECLARE @text2 NVARCHAR(50) = N'felho';
SELECT CASE
WHEN @text1 = @text2 COLLATE Latin1_General_100_CI_AS THEN 'Matches'
ELSE 'No Match'
END AS Result;
--> Returns 'No Match'
So for applications that must distinguish such nuances, use an accent sensitive collation, and add case sensitivity where casing matters too.
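For reasoning about such results outside the database, a rough Python approximation of case and accent insensitive equality can help. It is only an approximation built on Unicode normalization (`unicodedata.normalize` to NFD, then dropping combining marks) and `str.casefold`; it is not SQL Server's comparison algorithm and ignores language-specific rules, so the `fold` and `equal` helpers are illustrative only.

```python
import unicodedata


def fold(text: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    """Rough stand-in for CI/AI comparison rules; NOT SQL Server's algorithm."""
    if not accent_sensitive:
        # NFD splits 'ő' into 'o' plus a combining double acute; drop the marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not case_sensitive:
        text = text.casefold()
    # Compare composed forms so precomposed and decomposed input match.
    return unicodedata.normalize("NFC", text)


def equal(a: str, b: str, case_sensitive: bool = False,
          accent_sensitive: bool = True) -> bool:
    return fold(a, case_sensitive, accent_sensitive) == fold(b, case_sensitive, accent_sensitive)


print(equal("felhő", "felho", accent_sensitive=False))  # True  (like *_CI_AI)
print(equal("felhő", "felho"))                          # False (like *_CI_AS)
print(equal("LondoN", "London"))                        # True  (case insensitive)
print(equal("LondoN", "London", case_sensitive=True))   # False (case sensitive)
```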
Setting Column Collations
When creating a new table with character columns, appropriate collations can be defined explicitly:
CREATE TABLE contacts (
id INT,
name NVARCHAR(50) COLLATE Greek_CS_AS,
address NVARCHAR(250) COLLATE Latin1_General_100_CI_AI
);
This results in mixed collations within the table driven by semantic requirements.
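To modify the collation of a column in an existing table, restate its data type and nullability:
ALTER TABLE contacts
ALTER COLUMN name
NVARCHAR(50) COLLATE French_100_CI_AI NULL;
The statement fails if the column is referenced by an index, statistics, a computed column, or a CHECK or FOREIGN KEY constraint; drop those dependencies first and re-create them afterwards.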
Verify using:
SELECT name, collation_name
FROM sys.columns WHERE object_id = OBJECT_ID('contacts');
So collations can be tuned at column level for specialized behavior.
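Column collation changes are often scripted for many columns at once. The following hypothetical Python helper (`alter_column_collation` and `quote_name` are illustrative names) builds the ALTER TABLE statement. It restates nullability on purpose, because ALTER COLUMN without NULL or NOT NULL leaves the column nullable. It assumes the table is a single-part name and the data type string comes from trusted metadata, and it only checks that the collation name consists of letters, digits and underscores.

```python
import re

_COLLATION_NAME = re.compile(r"[A-Za-z0-9_]+")


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier, doubling any ']' as QUOTENAME does."""
    return "[" + identifier.replace("]", "]]") + "]"


def alter_column_collation(table: str, column: str, data_type: str,
                           collation: str, nullable: bool) -> str:
    """Build an ALTER COLUMN statement (hypothetical helper; data_type is trusted)."""
    if not _COLLATION_NAME.fullmatch(collation):
        raise ValueError(f"unexpected collation name: {collation!r}")
    # Restate nullability: ALTER COLUMN without NOT NULL makes the column nullable.
    null_clause = "NULL" if nullable else "NOT NULL"
    return (f"ALTER TABLE {quote_name(table)} ALTER COLUMN {quote_name(column)} "
            f"{data_type} COLLATE {collation} {null_clause};")


print(alter_column_collation("contacts", "name", "NVARCHAR(50)",
                             "French_100_CI_AI", nullable=True))
# ALTER TABLE [contacts] ALTER COLUMN [name] NVARCHAR(50) COLLATE French_100_CI_AI NULL;
```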
Changing Database Collations
Updating collations for entire databases is also supported in SQL Server:
ALTER DATABASE database_name
COLLATE Greek_CS_AS;
This can be useful for standardization or after migrations.
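Points to note:
- The new collation becomes the default for new columns, variables, parameters and string literals, and it is applied to system metadata such as object names
- Existing columns in user tables keep their current collation, whether it was set explicitly or inherited; each one must be converted with ALTER TABLE ... ALTER COLUMN
- The statement needs exclusive access to the database and can fail if schema-bound objects, computed columns or CHECK constraints depend on the current collation
- Converting existing columns, including dropping and re-creating dependent indexes, can take a long time on big databases
- Applications must be tested for impact
So alter database collations judiciously, after thorough analysis.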
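A database collation change is usually wrapped in a short script, because ALTER DATABASE ... COLLATE needs exclusive access to the database. The hypothetical Python helper below (`database_collation_script` is an illustrative name) emits one common sequence: switch to single-user mode with ROLLBACK IMMEDIATE, which rolls back other sessions' open transactions, change the collation, and return to multi-user mode. Existing columns keep their collation, so this script alone does not convert them.

```python
import re
from typing import List


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier, doubling any ']' as QUOTENAME does."""
    return "[" + identifier.replace("]", "]]") + "]"


def database_collation_script(database: str, collation: str) -> List[str]:
    """Statements for changing a database's default collation (hypothetical helper)."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", collation):
        raise ValueError(f"unexpected collation name: {collation!r}")
    db = quote_name(database)
    return [
        # Exclusive access: disconnect other sessions, rolling back their work.
        f"ALTER DATABASE {db} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
        # New default for new columns, variables and metadata; existing columns keep theirs.
        f"ALTER DATABASE {db} COLLATE {collation};",
        f"ALTER DATABASE {db} SET MULTI_USER;",
    ]


for statement in database_collation_script("database_name", "Greek_CS_AS"):
    print(statement)
```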
Resolving Collation Conflicts
When columns with different collations meet in one comparison, SQL Server cannot pick a collation for it:
SELECT * FROM Employees AS e
JOIN ContactForms AS f
ON e.FirstName = f.EmployeeName;
If e.FirstName and f.EmployeeName have different collations, this query fails with error 468 ("Cannot resolve the collation conflict ... in the equal to operation").
Various techniques can overcome such incompatibilities:
1. Use COLLATE clause
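SELECT * FROM Employees AS e
JOIN ContactForms AS f
ON e.FirstName = f.EmployeeName
COLLATE Latin1_General_100_CI_AI;
The COLLATE clause gives f.EmployeeName an explicit collation, which overrides both columns' implicit collations for this comparison only. COLLATE DATABASE_DEFAULT is a convenient target when one side comes from a temporary table.
2. Explicit CAST with COLLATE
CAST the source string and apply COLLATE to the result to match the target collation:
SELECT *
FROM Employees AS e
JOIN ContactForms AS f ON
e.FirstName = CAST(f.EmployeeName AS NVARCHAR(50))
COLLATE French_100_CS_AS;
3. Join on non-character keys
Where the tables share a numeric key, join on it and avoid string comparison altogether (EmployeeID is a hypothetical column here):
SELECT * FROM Employees AS e
INNER JOIN ContactForms AS f ON e.EmployeeID = f.EmployeeID;
Test queries with sample data after applying these techniques to confirm that the collation errors are gone and the results are as expected.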
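When many joins need the COLLATE fix, a small generator keeps the choice consistent. This hypothetical Python sketch (`join_predicate` is an illustrative name, and the column-to-collation mapping is assumed to have been read from sys.columns) leaves the predicate untouched when the collations already match, and otherwise converts only the right-hand operand to the left column's collation. The design choice rests on the fact that a COLLATE clause on a column generally prevents index seeks on that column, so the side whose index you want to use should keep its own collation.

```python
import re
from typing import Dict

_COLLATION_NAME = re.compile(r"[A-Za-z0-9_]+")


def join_predicate(left: str, right: str, collations: Dict[str, str]) -> str:
    """Build '<left> = <right>', converting the right operand only if needed.

    `collations` maps column expressions to collation names (assumed to be
    read from sys.columns beforehand).
    """
    left_coll, right_coll = collations[left], collations[right]
    if left_coll == right_coll:
        return f"{left} = {right}"
    if not _COLLATION_NAME.fullmatch(left_coll):
        raise ValueError(f"unexpected collation name: {left_coll!r}")
    # Compare in the left column's collation so an index on it stays usable.
    return f"{left} = {right} COLLATE {left_coll}"


columns = {
    "e.FirstName": "Latin1_General_100_CI_AI",   # hypothetical collations
    "f.EmployeeName": "French_100_CS_AS",
}
print(join_predicate("e.FirstName", "f.EmployeeName", columns))
# e.FirstName = f.EmployeeName COLLATE Latin1_General_100_CI_AI
```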
Collations for Contained Databases
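Contained databases keep their collation behavior within the database, separate from the instance defaults.
Specify the collation while creating a contained database (the contained database authentication server option must be enabled first):
CREATE DATABASE contained_db
CONTAINMENT = PARTIAL
COLLATE Chinese_PRC_CS_AS_WS;
Later validate it without relying on instance/server context:
SELECT DATABASEPROPERTYEX('contained_db', 'Collation');
Key pointers while working with contained database collations:
- The collation of a contained database can differ from the instance collation, and it applies to user data and to temporary tables created from within the database
- Metadata (object and variable names) uses a fixed catalog collation, Latin1_General_100_CI_AS_WS_KS_SC, regardless of the database collation
- SQL Server currently supports only partial containment (CONTAINMENT = PARTIAL); there is no fully contained mode
- Migrate contained databases carefully to servers where other databases rely on a different collation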
So give special attention to collations while designing contained databases.
Integrating Collations by Language
Since SQL Server provides many language-specific collations, applications can pick the collation that matches the language of their data. Windows collations (no prefix) follow the sorting rules of an associated Windows locale and are recommended for new work; SQL collations (prefixed SQL_) exist for compatibility with older SQL Server versions.
Some examples:
1. English text with dictionary sorting
Use Windows collation:
Latin1_General_100_CI_AS_SC
OR legacy SQL collation:
SQL_Latin1_General_CP1_CI_AS
2. French text with accent + case sensitivity
Use Windows collation:
French_100_CS_AS
OR, with supplementary character support:
French_100_CS_AS_SC
3. Arabic text with Arabic linguistic sorting
Use Windows collation:
Arabic_100_CS_AS
OR the older version for compatibility with existing systems:
Arabic_CS_AS
This way the collation can be aligned with the characteristics of the data.
Collation Configuration for Optimal Performance
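Collations can have a measurable performance impact in text-heavy applications that perform extensive string operations.
Follow these database design practices for collation optimization:
- Consider binary collations (_BIN2) for code-like columns, such as product codes or hashes, that are only compared for exact matches. Binary comparisons skip linguistic rules and are generally the fastest option; case insensitive collations do not speed up comparisons.
- Rely on the database default collation and override it at column level only where the data requires it. Collations cannot be set per table, and every extra column-level collation is a potential conflict that forces COLLATE conversions.
- Prefer Windows collations for new development. SQL collations (SQL_ prefix) are kept for backward compatibility, and with them comparing a varchar column to an nvarchar value can prevent an index seek, whereas Windows collations still allow one.
- When a collation change requires rebuilding indexes on large tables, use parallel index operations (for example the MAXDOP option of ALTER INDEX ... REBUILD, where your edition supports parallel index operations) to shorten rebuild times.
- When a view must combine columns with different collations, apply the COLLATE conversion once in the view so that queries do not repeat casts.
- Keep collations identical on columns commonly used together in joins and GROUP BY conditions.
Proper testing combined with these practices helps avoid conversion overhead and collation errors in text processing.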
Upgrading SQL Server Collations
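New SQL Server releases add new collation versions, for example the _100 collations in SQL Server 2008, the _140 Japanese collations in SQL Server 2017 and the _UTF8 collations in SQL Server 2019. Existing collations are kept, but moving data to a newer collation version, or to an instance with a different server collation, can impact applications:
- Queries that depend on the sorting or comparison rules of the old collation may return different results
- Comparisons with legacy systems that still use the old collation may raise collation conflicts
- After restoring a database on an instance with a different server collation, queries that join user tables to temporary tables (which use the tempdb collation) can fail with collation conflicts
Ensure the following checks are done during SQL Server upgrades:
- Back up existing databases before initiating the upgrade
- Test-restore databases on the new version and compare database and column collations with the new server collation
- Analyze queries dependent on collation behavior and rewrite them if necessary
- Compare old and new collations and, when moving to a newer collation version, map each database to the closest equivalent
- For contained databases, remember that metadata always uses the fixed catalog collation Latin1_General_100_CI_AS_WS_KS_SC, independent of the database collation
- Conduct sufficient tests with the databases moved to the new version
- Rewrite any code still affected after all the above steps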
With careful analysis and testing, potential application issues due to SQL Server collation changes can be avoided during upgrades.
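To compare old and new collations, one option is to export the table, column and collation_name values from sys.columns on both instances and diff them. The Python sketch below assumes those exports are already loaded into dictionaries keyed by (table, column); `diff_inventories` is an illustrative name and the sample data is invented.

```python
from typing import Dict, List, Tuple

# (table, column) -> collation_name, e.g. exported from sys.columns
Inventory = Dict[Tuple[str, str], str]


def diff_inventories(before: Inventory, after: Inventory) -> List[str]:
    """List columns whose collation differs, or that exist on one side only."""
    report = []
    for table, column in sorted(before.keys() | after.keys()):
        old = before.get((table, column))
        new = after.get((table, column))
        if old != new:
            report.append(f"{table}.{column}: {old} -> {new}")
    return report


old_instance = {("contacts", "name"): "French_100_CI_AI",
                ("contacts", "address"): "Latin1_General_100_CI_AI"}
new_instance = {("contacts", "name"): "French_100_CI_AI",
                ("contacts", "address"): "Latin1_General_100_CI_AS"}
print(diff_inventories(old_instance, new_instance))
# ['contacts.address: Latin1_General_100_CI_AI -> Latin1_General_100_CI_AS']
```

Columns present on only one side show up with None, which also catches objects that were dropped or added during the migration.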
Conclusion
Collations play a vital role in enabling SQL Server-based applications to handle multiple languages correctly. With thousands of available collations, configuring the relevant language and sensitivity rules systematically is important. Use this guide to understand collation fundamentals, special cases like contained databases, and troubleshooting techniques for avoiding conflicts. Reach out if you need any clarification while implementing globalized collations in your SQL Server databases.


