251

What's the difference and when to use what? What's the risk if I always use ToLower() and what's the risk if I always use ToLowerInvariant()?

4 Answers 4

229

Depending on the current culture, ToLower might produce a culture-specific lowercase letter that you aren't expecting, such as ınfo (without the dot on the i) instead of info, which breaks string comparisons. For that reason, ToLowerInvariant should be used on any non-language-specific data. ToLower is generally only appropriate for user input that may be in the user's native language or character set.

See this question for an example of this issue: C#- ToLower() is sometimes removing dot from the letter "I"
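
A minimal sketch of the ınfo problem described above. It passes a Turkish CultureInfo explicitly instead of changing the thread culture, which applies the same rules ToLower() would use on a machine whose current culture is tr-TR. The variable names are only illustrative, and the sketch assumes full culture data is available (that is, the app is not running in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

// Assumption: tr-TR culture data (ICU or NLS) is available on this machine.
var turkish = new CultureInfo("tr-TR");

// Same casing rules that "INFO".ToLower() applies when CurrentCulture is tr-TR.
string viaTurkish = "INFO".ToLower(turkish);

// Culture-independent casing rules.
string viaInvariant = "INFO".ToLowerInvariant();

Console.WriteLine(viaTurkish == "info");   // False: the result starts with a dotless ı
Console.WriteLine(viaInvariant == "info"); // True
```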


101

tl;dr: For lowercased content that a human will read (e.g. articles, posts, comments, names, places): use ToLower().

For normalised versions of identifiers, keywords, or machine-consumed strings that have a fixed meaning that should never change based on who is running the code or where it is run: use ToLowerInvariant().

For just comparing two strings without caring about case and/or not storing the resulting string: use StringComparison.OrdinalIgnoreCase.
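
The three cases above could look roughly like this. The strings and variable names are made up for illustration; only the framework calls themselves are real.

```csharp
using System;
using System.Globalization;

// 1. Text a person will read: casing follows the reader's culture.
//    Equivalent to "ISTANBUL".ToLower(); the result depends on the machine.
string displayText = "ISTANBUL".ToLower(CultureInfo.CurrentCulture);

// 2. Machine-consumed identifier: normalised the same way everywhere.
string headerKey = "Content-Type".ToLowerInvariant(); // "content-type"

// 3. Comparison only: no new string is allocated and no culture is involved.
bool sameTag = string.Equals("TITLE", "title", StringComparison.OrdinalIgnoreCase); // true

Console.WriteLine(displayText);
Console.WriteLine(headerKey);
Console.WriteLine(sameTag);
```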


ToLower() uses the current thread's culture (CultureInfo.CurrentCulture) to determine casing rules. This respects language-specific conventions - different languages have different rules for what the lowercase form of a character is.

ToLowerInvariant() uses the invariant culture (CultureInfo.InvariantCulture), a fixed, culture-independent set of casing rules that is associated with the English language but not with any country or region. It always produces the same result regardless of the user's locale or operating system language settings.
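
To see the dependence on the current culture directly, here is a small sketch that temporarily switches CultureInfo.CurrentCulture. LowerUnder is a hypothetical helper written for this example, not a framework method, and the sketch assumes en-US and tr-TR culture data are available.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: lowercases text while the given culture is current,
// then restores whatever culture was active before.
static string LowerUnder(string cultureName, string text)
{
    CultureInfo previous = CultureInfo.CurrentCulture;
    try
    {
        CultureInfo.CurrentCulture = new CultureInfo(cultureName);
        return text.ToLower(); // picks up the culture set just above
    }
    finally
    {
        CultureInfo.CurrentCulture = previous;
    }
}

string english = LowerUnder("en-US", "TITLE"); // "title"
string turkish = LowerUnder("tr-TR", "TITLE"); // "tıtle" (dotless ı)
string invariant = "TITLE".ToLowerInvariant(); // "title", whatever the current culture is

Console.WriteLine(english == invariant); // True
Console.WriteLine(turkish == invariant); // False
```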


An example using the Turkish language can help highlight the difference.

Uppercase  Lowercase  Name
İ          i          Dotted I
I          ı          Dotless I

In Turkish, DIŞ means outside and its correct lowercase is dış (with a dotless i). If a dotted i is instead used, you get diş which means tooth - a completely different word.

Using ToLowerInvariant here would lose this distinction and produce misspelled Turkish.

// Thread culture is Turkish (tr-TR)
"DIŞ".ToLower()          // "dış"  (correct - means 'outside')
"DIŞ".ToLowerInvariant() // "diş"  (wrong - means 'tooth')

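Turkish is not the only language with special casing rules. Unicode's SpecialCasing.txt file lists language-specific mappings for Lithuanian, Turkish and Azerbaijani, along with context-dependent and one-to-many mappings that affect languages such as Greek and German.

e.g.

  • Greek: Σ has two lowercase forms - σ (mid-word) and ς (end of word)
  • Lithuanian: lowercasing Ì keeps an explicit dot above the i underneath the accent (i followed by a combining dot above and a combining grave), to preserve the soft dot
  • German: not a ToLower issue per se, but full Unicode uppercasing turns straße into STRASSE, and round-tripping back produces strasse, a different string
  • Azerbaijani: same four-i system as Turkish (Iı, İi)

These are Unicode rules rather than guaranteed .NET behaviour. As far as I know, ToLower() and ToUpper() in .NET apply simple one-to-one case mappings, so the final-sigma and ß → SS rules are not applied there, and the dotted/dotless i is the difference you are most likely to hit in practice.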

Now imagine you are writing an SQL parser. Somewhere you have code like:

// op holds the operator token ('operator' itself is a reserved word in C#)
if (op.ToLower() == "like")
{
    // handle SQL LIKE operator - does NOT work correctly in every culture
}

SQL grammar does not change when you change cultures. A French user does not write SÉLECTIONNEZ x DE books instead of SELECT x FROM books.

However, since ToLower() respects the current culture, this code breaks on a machine whose culture has different casing rules for these letters, e.g. Turkish.

// Thread culture is Turkish (tr-TR)
"LIKE".ToLower()          // → "lıke"  (wrong - dotless i != "like")
"LIKE".ToLowerInvariant() // → "like"  (correct - dotted i == "like")

For the ToLower() version to work, a Turkish user would need to type LİKE (with a dotted capital I) - which is unreasonable.

The fix would be to use ToLowerInvariant() for protocol/grammar comparisons, or better yet, use StringComparison.OrdinalIgnoreCase:

// op holds the operator token ('operator' itself is a reserved word in C#)
if (op.Equals("like", StringComparison.OrdinalIgnoreCase))
{
    // handle SQL LIKE operator - works correctly in every culture
}
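
If the parser has many keywords, the same idea extends to lookups: a set built with StringComparer.OrdinalIgnoreCase avoids lowercasing tokens at all. This is only a sketch; the keyword list and the IsKeyword helper are hypothetical and not part of any real parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical keyword table; membership checks ignore case by ordinal rules,
// so the result does not depend on the current culture.
var keywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "select", "from", "where", "like"
};

// Hypothetical helper for the sketch.
bool IsKeyword(string token) => keywords.Contains(token);

Console.WriteLine(IsKeyword("LIKE"));  // True
Console.WriteLine(IsKeyword("Like"));  // True
Console.WriteLine(IsKeyword("books")); // False
```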

2 Comments

I have been a professional developer for many years and already knew this as "The Turkish 'I' Problem". However, this is by far the best and shortest explanation I have read so far. Thank you!
Very good explanation even for a Turk like me :) More information for the curious can be found at Does Your Code Pass The Turkey Test?.
49

I think this can be useful:

http://msdn.microsoft.com/en-us/library/system.string.tolowerinvariant.aspx

update

If your application depends on the case of a string changing in a predictable way that is unaffected by the current culture, use the ToLowerInvariant method. The ToLowerInvariant method is equivalent to ToLower(CultureInfo.InvariantCulture). The method is recommended when a collection of strings must appear in a predictable order in a user interface control.
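
The equivalence stated in the quote can be checked with a few sample strings. This is a quick sanity check rather than a proof, and the sample values are arbitrary.

```csharp
using System;
using System.Globalization;

string[] samples = { "LIKE", "Content-Type", "DIŞ" };

// Compare the two spellings of "lowercase with the invariant culture".
bool allEqual = true;
foreach (string s in samples)
{
    if (s.ToLowerInvariant() != s.ToLower(CultureInfo.InvariantCulture))
    {
        allEqual = false;
    }
}

Console.WriteLine(allEqual); // True
```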

also

...ToLower is very similar in most places to ToLowerInvariant. The documents indicate that these methods will only change behavior with Turkish cultures. Also, on Windows systems, the file system is case-insensitive, which further limits its use...

http://www.dotnetperls.com/tolowerinvariant-toupperinvariant

hth

2 Comments

@danyolgiax Could you please elaborate further? I can't infer its usability from the MSDN link. Thanks
yes ToLowerInvariant is not working in Turkish as expected. İ becomes ı
30

String.ToLower() uses the current culture while String.ToLowerInvariant() uses the invariant culture. So you are essentially asking about the differences between invariant culture and ordinal string comparison.

1 Comment

No he isn't. "Ordinal" is a third option - a slightly different way to "ignore" current culture. The distinction isn't relevant in discussing ToLower variants; Ordinal vs. invariant just changes the "sort order" of two strings, doesn't change equality comparison.
