What's the difference and when to use what? What's the risk if I always use ToLower() and what's the risk if I always use ToLowerInvariant()?
- Maybe you should instead normalize strings to uppercase. See msdn.microsoft.com/en-us/library/bb386042.aspx – habakuk, Mar 20, 2017 at 14:58
- See also stackoverflow.com/questions/3550213/… – riQQ, Jul 7, 2022 at 11:23
4 Answers
Depending on the current culture, ToLower might produce a culture-specific lowercase letter that you aren't expecting, such as ınfo (without the dot on the i) instead of info, which mucks up string comparisons. For that reason, ToLowerInvariant should be used on any non-language-specific data. Generally, the only time you would use ToLower is on user input that might be in the user's native language or character set.
See this question for an example of this issue: C#- ToLower() is sometimes removing dot from the letter "I"
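A minimal sketch of the problem, not taken from the linked question: it assumes Turkish culture data is available to the runtime (i.e. the process is not running in invariant globalization mode), and the class and helper names are made up for illustration. The culture is passed explicitly so the result does not depend on the machine it runs on.

```csharp
using System;
using System.Globalization;

public static class DotlessIDemo
{
    // Hypothetical helper: lowercases text under an explicitly named culture,
    // so the result does not depend on the machine's current culture.
    public static string LowerIn(string cultureName, string text) =>
        text.ToLower(CultureInfo.GetCultureInfo(cultureName));

    public static void Main()
    {
        string turkish = LowerIn("tr-TR", "INFO");     // Turkish rules: I becomes dotless ı
        string invariant = "INFO".ToLowerInvariant();  // same result on every machine

        Console.WriteLine(turkish == "info");    // False under Turkish casing rules
        Console.WriteLine(invariant == "info");  // True
    }
}
```

Calling ToLower() with no argument behaves like the first call whenever the current thread's culture happens to be Turkish, which is why the bug tends to show up only on some users' machines.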
Comments
tl;dr: For lowercased content that a human will read (e.g. articles, posts, comments, names, places): use ToLower().
For normalised versions of identifiers, keywords, or machine-consumed strings that have a fixed meaning that should never change based on who is running the code or where it is being run: use ToLowerInvariant().
For just comparing two strings without caring about case and/or not storing the resulting string: use StringComparison.OrdinalIgnoreCase.
ToLower() uses the current thread's culture (CultureInfo.CurrentCulture) to determine casing rules. This respects language-specific conventions - different languages have different rules for what the lowercase form of a character is.
ToLowerInvariant() uses the invariant culture, which is a fixed, culture-independent set of casing rules associated with the English language but not with any country or region. It always produces the same result regardless of the user's locale or operating system language settings.
An example using the Turkish language can help highlight the difference.
| Uppercase | Lowercase | Name |
|---|---|---|
| İ | i | Dotted I |
| I | ı | Dotless I |
In Turkish, DIŞ means outside and its correct lowercase is dış (with a dotless i). If a dotted i is instead used, you get diş which means tooth - a completely different word.
Using ToLowerInvariant here would lose this distinction and ultimately result in typos in Turkish.
// Thread culture is Turkish (tr-TR)
"DIŞ".ToLower() // "dış" (correct - means 'outside')
"DIŞ".ToLowerInvariant() // "diş" (wrong - means 'tooth')
Turkish is not the only language affected by culture-sensitive casing. Unicode's SpecialCasing.txt file details a few more cases, covering Greek, Lithuanian, German and other Turkic languages such as Azerbaijani.
e.g.
- Greek: Σ has two lowercase forms - σ (mid-word) and ς (end of word)
- Lithuanian: lowercasing Ì keeps the dot on the i beneath the accent (soft-dot rules)
- German: not a ToLower issue per se, but under Unicode's full case mapping ß uppercases to SS ("STRAßE" → "STRASSE"), and round-tripping back produces a different string. Note that .NET's ToUpper() applies simple case mappings and leaves ß unchanged
- Azerbaijani: same four-i system as Turkish (I→ı, İ→i)
Now imagine you are writing an SQL parser. Somewhere you have code like:
if (op.ToLower() == "like") // op holds the operator token; "operator" itself is a reserved word in C#
{
    // handle SQL LIKE operator - does NOT work correctly in every culture
}
SQL grammar does not change when you change cultures. A French user does not write SÉLECTIONNEZ x DE books instead of SELECT x FROM books.
However, since ToLower() respects the current culture, this code will break on a machine whose current culture has different casing rules, e.g. Turkish.
// Thread culture is Turkish (tr-TR)
"LIKE".ToLower() // → "lıke" (wrong - dotless i != "like")
"LIKE".ToLowerInvariant() // → "like" (correct - dotted i == "like")
For the ToLower() version to work, a Turkish user would need to type LİKE (with a dotted capital I) - which is unreasonable.
The fix would be to use ToLowerInvariant() for protocol/grammar comparisons, or better yet, use StringComparison.OrdinalIgnoreCase:
if (op.Equals("like", StringComparison.OrdinalIgnoreCase)) // op holds the operator token; "operator" itself is a reserved word in C#
{
    // handle SQL LIKE operator - works correctly in every culture
}
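If the parser has to recognise a whole set of keywords rather than one, the same ordinal, case-insensitive comparison can be handed to the collection itself. A minimal sketch, assuming a hypothetical SqlKeywords class and an illustrative keyword list (neither is part of the answer above):

```csharp
using System;
using System.Collections.Generic;

public static class SqlKeywords
{
    // Hypothetical keyword list, for illustration only.
    // The comparer makes every lookup case-insensitive and culture-independent.
    private static readonly HashSet<string> Known =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "select", "from", "where", "like"
        };

    public static bool IsKeyword(string token) => Known.Contains(token);

    public static void Main()
    {
        Console.WriteLine(IsKeyword("LIKE"));   // True
        Console.WriteLine(IsKeyword("Like"));   // True
        Console.WriteLine(IsKeyword("LİKE"));   // False: dotted capital İ is a different character
        Console.WriteLine(IsKeyword("books"));  // False
    }
}
```

No lowercased copy of the token is allocated, and the result is the same whatever the current thread's culture is.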
2 Comments
I think this can be useful:
http://msdn.microsoft.com/en-us/library/system.string.tolowerinvariant.aspx
update
If your application depends on the case of a string changing in a predictable way that is unaffected by the current culture, use the ToLowerInvariant method. The ToLowerInvariant method is equivalent to ToLower(CultureInfo.InvariantCulture). The method is recommended when a collection of strings must appear in a predictable order in a user interface control.
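The equivalence stated in that quote can be spot-checked with a small sketch; the class name and sample strings are arbitrary choices for illustration, not from the documentation:

```csharp
using System;
using System.Globalization;

public static class InvariantEquivalence
{
    // Compares the two spellings of "lowercase with the invariant culture".
    // The == operator on strings is an ordinal comparison.
    public static bool SameResult(string text) =>
        text.ToLowerInvariant() == text.ToLower(CultureInfo.InvariantCulture);

    public static void Main()
    {
        Console.WriteLine(SameResult("LIKE"));  // True
        Console.WriteLine(SameResult("DIŞ"));   // True
    }
}
```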
also
...ToLower is very similar in most places to ToLowerInvariant. The documents indicate that these methods will only change behavior with Turkish cultures. Also, on Windows systems, the file system is case-insensitive, which further limits its use...
http://www.dotnetperls.com/tolowerinvariant-toupperinvariant
hth
2 Comments
String.ToLower() uses the current (default) culture while String.ToLowerInvariant() uses the invariant culture. So you are essentially asking about the difference between invariant culture and ordinal string comparison.
1 Comment
ToLower variants; Ordinal vs. invariant just changes the "sort order" of two strings, doesn't change equality comparison.