Character comparison is a fundamental concept in C programming. At its core, it involves analyzing two character values to determine if they are equal or to establish some ordering between them. While seemingly simple on the surface, character comparisons in C have nuances that every developer should understand.

In this comprehensive guide, we'll cover the ins and outs of character comparison in C, including:

  • How characters are represented in C
  • Comparison operators for checking equality
  • Functions for lexicographic ordering
  • ASCII ordering and locale issues
  • Best practices for robust comparisons

Character Representation

To compare characters meaningfully, we must first understand how they are stored in C.

The char datatype represents a single character in the execution character set. This is typically ASCII on modern systems, which assigns integer values from 0 to 127 to represent letters, digits, punctuation, and control codes.

For instance, the letter 'A' is stored as the integer 65, 'B' as 66, and so on. By comparing the integer values, C is able to determine ordering and equality for characters.

Plain char may be signed or unsigned; the choice is left to the implementation, while signed char and unsigned char make it explicit. In every case sizeof(char) is 1 byte, which on typical systems with 8-bit bytes means values from -128 to 127 for signed or 0 to 255 for unsigned.

Here are some examples of char declaration and initialization:

char letter1 = 'A'; // Stores 65
char letter2 = 66;  // Stores 'B'
unsigned char number = 151;
signed char negative = -34;

The key takeaway is that characters in C are really just integer codes used to represent symbols. Comparing characters is equivalent to comparing their underlying integer values.
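
As a minimal sketch of this idea, the helpers below expose the integer code behind a character. The names char_code and print_char_info are made up for illustration, and the value 65 for 'A' assumes an ASCII execution character set.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns the integer code stored in a char. */
int char_code(char ch) {
    /* Converting through unsigned char keeps the result non-negative
       even on platforms where plain char is signed. */
    return (unsigned char)ch;
}

/* Hypothetical helper: prints a character, its code, and the range
   that plain char can hold on this platform. */
void print_char_info(char ch) {
    printf("'%c' is stored as %d\n", ch, char_code(ch));
    printf("plain char range here: %d to %d\n", CHAR_MIN, CHAR_MAX);
}
```

Calling print_char_info('A') on an ASCII system reports that 'A' is stored as 65, followed by the plain char range for that platform.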

Equality Comparison

The most basic form of character comparison is checking for equality. This determines whether two char values represent the same character.

C provides standard comparison operators that can evaluate character equality:

if (ch1 == ch2) {
  // ch1 and ch2 are equal 
}

if (ch1 != ch2) {
  // ch1 and ch2 are NOT equal
}

The == operator checks if the integer values stored in ch1 and ch2 are the same. The != operator conversely checks for inequality.

For example:

char a = 'X';
char b = 'Y';

if (a == b) {
   printf("Equal"); // This will NOT print
}

if (a != b) {
   printf("Not equal"); // This will print
}

a and b contain different integer codes (88 vs 89), so == evaluates to false and != evaluates to true.

One common pitfall is incorrectly using the assignment = operator rather than the equality == operator in a comparison:

if (ch1 = ch2) { // Wrong: assigns instead of comparing
  ...
}

This copies ch2 into ch1 and then tests the assigned value, so the branch runs whenever ch2 is nonzero, which can lead to subtle bugs.
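
To see the difference in behavior, here is a small sketch that puts the mistaken and the intended condition side by side. The function names are hypothetical. Compilers such as GCC and Clang can warn about an assignment used as a condition (for example with -Wall); the extra parentheses below only silence that warning so the demo compiles cleanly.

```c
/* Hypothetical demo of the bug: the condition assigns ch2 to ch1,
   then tests the assigned value instead of comparing. */
int buggy_equal(char ch1, char ch2) {
    /* The doubled parentheses silence the compiler warning; the bug remains. */
    if ((ch1 = ch2)) {
        return 1; /* taken whenever ch2 is nonzero */
    }
    return 0;
}

/* The intended check: a real equality comparison. */
int correct_equal(char ch1, char ch2) {
    return ch1 == ch2;
}
```

With 'X' and 'Y' as arguments, buggy_equal reports a match while correct_equal does not.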

Ordering Comparisons

Another common need is checking if one character comes before or after another alphabetically or numerically.

C provides four relational operators and two equality operators. Each yields an int result: 1 if the relation holds and 0 otherwise.

Operator   Meaning
<          Less than
<=         Less than or equal to
>          Greater than
>=         Greater than or equal to
==         Equal to (equality)
!=         Not equal to (inequality)

These leverage the integer codes underlying chars to determine ordering.

For instance:

char a = 'A'; // Code 65
char z = 'Z'; // Code 90

if (a < z) {
  printf("a is before z"); // Prints
}

if (z > a) {
   printf("z is after a"); // Also prints   
}

The comparison a < z succeeds because the integer code for 'A' (65) is less than the code for 'Z' (90). The opposite z > a also holds true for the same reason.

We can use this approach to build functions that determine ordering between characters:

int charOrder(char a, char b) {

  if (a < b) {
    return -1; // a comes before b
  } else if (a > b) {
    return 1; // a comes after b
  } else {
    return 0; // a and b are equal
  }

}

This charOrder function encapsulates the comparison logic, returning -1, 0, or 1 to indicate the ordering of a and b.
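
One place where the -1, 0, 1 convention pays off is sorting. The sketch below wraps the same three-way logic in the comparator shape that the standard qsort function expects. The names compare_chars and sort_chars are illustrative, and the "alphabetical" result assumes ASCII-style codes.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: receives pointers to two array elements. */
static int compare_chars(const void *lhs, const void *rhs) {
    unsigned char a = *(const unsigned char *)lhs;
    unsigned char b = *(const unsigned char *)rhs;
    /* Evaluates to -1, 0, or 1 without risking overflow. */
    return (a > b) - (a < b);
}

/* Hypothetical helper: sorts the characters of a writable string in place. */
void sort_chars(char *text) {
    qsort(text, strlen(text), sizeof text[0], compare_chars);
}
```

For example, after char word[] = "hello"; sort_chars(word); the array holds "ehllo". The argument must be a writable array, not a string literal.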

Comparing Strings

The above techniques work when comparing single char values. But in many cases, we need to compare entire strings.

To compare strings alphabetically, C provides the strcmp function. It takes two strings as arguments and returns an integer indicating their ordering:

int result = strcmp(str1, str2);

The return value works like this:

  • 0 means both strings are identical
  • <0 means str1 comes before str2 alphabetically
  • >0 means str1 comes after str2 alphabetically

For example:

const char *name1 = "John";
const char *name2 = "Sam";

int order = strcmp(name1, name2); // order < 0; the exact negative value is not specified

This determines that "John" should come before "Sam" alphabetically because J is before S.

The comparison logic:

  1. Compare first chars
  2. If different, that determines order
  3. If same, compare next chars
  4. Repeat until strings differ or end
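
The steps above can be sketched as a short loop. This is a hypothetical re-implementation for illustration only (the name compare_strings is made up); real code should simply call strcmp.

```c
/* Illustrative re-implementation of the comparison steps. */
int compare_strings(const char *s1, const char *s2) {
    /* Steps 1, 3, 4: advance while the characters match and
       the first string has not ended. */
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    /* Step 2: the first differing pair (or a terminator) decides the order.
       Characters are compared as unsigned char, as strcmp does. */
    unsigned char c1 = (unsigned char)*s1;
    unsigned char c2 = (unsigned char)*s2;
    return (c1 > c2) - (c1 < c2);
}
```

Unlike strcmp, which only promises a negative, zero, or positive value, this sketch always returns exactly -1, 0, or 1. A string that is a prefix of another, such as "Jo" and "John", orders first because its terminator has code 0.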

There is also strncmp to compare only the first n characters.
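
A common use of strncmp is a prefix test. The sketch below is one way to write it, with hypothetical helper names; sign_of is included because portable code should only rely on the sign of the value these functions return.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if text starts with prefix, else 0. */
int has_prefix(const char *text, const char *prefix) {
    /* Compare only as many characters as the prefix contains. */
    return strncmp(text, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: reduces any comparison result to -1, 0, or 1. */
int sign_of(int value) {
    return (value > 0) - (value < 0);
}
```

Here has_prefix("Johnson", "John") is true, while has_prefix("Jo", "John") is false because the shorter string ends before the prefix does.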

Leveraging ASCII Ordering

On most systems, character ordering comparisons follow ASCII, which assigns codes such that:

  1. Digits come before upper case letters
  2. Upper case letters come before lower case letters
  3. Letters within each case are ordered alphabetically

This means 'A' < 'a', since 'A' is 65 and 'a' is 97. Likewise '9' < 'A', since '9' is 57 while 'A' is 65.

Programmers leverage these rules to simplify coding logic in comparisons:

#include <stdbool.h>

bool isLowerCase(char ch) {
  return ch >= 'a' && ch <= 'z'; // True only for 'a'..'z' (assumes ASCII)
}

bool isAlpha(char ch) {
   return (ch >= 'A' && ch <= 'Z') ||
          (ch >= 'a' && ch <= 'z'); // Check both alphabetic ranges
}

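Standard library functions such as strcmp compare raw character codes, so on ASCII systems their results follow this ordering as well.

However, the C standard does not mandate ASCII, so portable code should not assume particular integer values. The one guarantee is that the digits '0' through '9' have consecutive, increasing codes.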
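
For code that should not depend on ASCII ranges, the classification functions in <ctype.h> are the usual alternative. The wrappers below are a sketch with made-up names. The cast to unsigned char matters because passing a negative char value to these functions is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical wrappers around the standard <ctype.h> checks. */
int is_lower_portable(char ch) {
    return islower((unsigned char)ch) != 0;
}

int is_alpha_portable(char ch) {
    return isalpha((unsigned char)ch) != 0;
}

/* Digits are the one range the C standard guarantees to be contiguous,
   so subtracting '0' is portable. Returns -1 for non-digits. */
int digit_value(char ch) {
    return isdigit((unsigned char)ch) ? ch - '0' : -1;
}
```

In the default "C" locale, is_lower_portable('a') is true, while 'Z' and '{' are both rejected, and digit_value('7') yields 7.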

Locale Issues

One complication comes from locales, which define character sets and ordering rules suited to particular languages and regions.

The default "C" locale uses regular ASCII. But alternate locales may use extended character sets such as Unicode, where the integer values differ.

This means ordering could vary across locales. For instance, a non-ASCII letter could compare less than Z on a Scandinavian system.

To handle issues with localized character sets, utilize:

  • setlocale to configure locale
  • islower, isupper, etc for portable checks
  • Wide characters and wctype.h

Thankfully, most modern environments use UTF-8, a Unicode encoding whose first 128 codes are identical to ASCII. Ordering for the basic English alphabet is therefore unchanged, and issues are rare with typical system configurations.
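
To make the locale tools listed above a little more concrete, here is a minimal sketch built on setlocale and the standard strcoll function, which compares strings according to the current locale's collation rules. The helper names are hypothetical. Which locales are installed varies by system, so results for non-ASCII text are not shown here.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switches collation to the user's environment locale.
   Returns 1 on success, 0 if the locale could not be set. */
int use_environment_locale(void) {
    return setlocale(LC_COLLATE, "") != NULL;
}

/* Hypothetical helper: locale-aware ordering reduced to -1, 0, or 1.
   In the default "C" locale, strcoll orders strings the same way strcmp does. */
int compare_for_display(const char *a, const char *b) {
    int result = strcoll(a, b);
    return (result > 0) - (result < 0);
}
```

A program would typically call use_environment_locale once at startup and then use compare_for_display wherever text is sorted for people to read, keeping plain strcmp for exact byte-wise matching.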

Best Practices

Here are some key best practices when comparing C characters and strings:

  • Use the built-in operators (==, <, >) for single characters, for readability
  • Leverage strcmp and related string functions over manual traversal
  • Understand the implications of signed versus unsigned char
  • Use helper functions like isalpha and isdigit over magic numbers
  • Specify UTF-8 encoding or use wide chars for Unicode support
  • Always check that string pointers are not NULL before comparing
  • Make case-insensitive comparisons explicit with tolower/toupper

Following these guidelines will help avoid common pitfalls when comparing chars in C programs.
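
Several of these points can be combined in one routine. The sketch below is a hypothetical case-insensitive comparison that checks for NULL and uses tolower explicitly. Treating NULL as ordering before every string is an assumption made for this example, not a standard convention.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive comparison returning -1, 0, or 1. */
int compare_ignore_case(const char *a, const char *b) {
    if (a == NULL || b == NULL) {
        /* Assumption for this sketch: NULL sorts first, two NULLs are equal. */
        return (a != NULL) - (b != NULL);
    }
    for (;; a++, b++) {
        /* Cast to unsigned char before calling tolower to avoid
           undefined behavior on negative char values. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb) {
            return (ca > cb) - (ca < cb);
        }
        if (ca == '\0') {
            return 0; /* both strings ended together */
        }
    }
}
```

With this helper, "Hello" and "hELLO" compare as equal, and "apple" orders before "Banana" even though 'B' has a smaller ASCII code than 'a'.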

The key is understanding exactly what lies beneath simple character comparisons and using that knowledge to write robust code. Mastering the nuances covered here will serve any C developer well.
