For an experienced C++ engineer, proper input validation is essential to writing secure and robust applications. Invalid or malformed data is responsible for everything from crashes and unhandled exceptions to major security vulnerabilities.

In this comprehensive 3500+ word guide, we will deeply explore techniques and best practices for rigorously validating integer input in C++.

Table of Contents

  • Foundations
  • Stream Extraction and Handling
  • Integer Conversion and Checks
  • Validation Functions
  • Regular Expressions
  • Edge Cases
  • Multithreading Considerations
  • Language Comparisons
  • Security Considerations
  • Custom Validators
  • Conclusion

Foundations

Validating input involves two key steps:

  1. Extracting user input into program memory
  2. Checking if the input meets expected criteria

For integers, this means safely extracting the input and verifying it represents a numeric value.

We first need to decide where the integer input is coming from. Common integer input sources in C++ include:

  • Console/Terminal – std::cin, user typing
  • Network Stream – sockets, protocols
  • File Streams – file inputs, serialization
  • Kernel/Devices – ioctl calls, device communication
  • Interprocess – shared memory, pipes, signals

While the validation concepts are similar across sources, the handling and risks can differ greatly. Console input is simplest, but inputs from remote sources introduce many security considerations.

Stream Extraction and Handling

The standard C++ library provides stream classes for reading inputs from different sources. For example, console input uses std::cin:

int num;
std::cin >> num; 

For integers, formatted extraction parses the input text and sets the stream's failbit if no valid integer can be read.

But on its own, operator>> accepts some inputs you probably want to reject, and its failures go unnoticed unless you check the stream:

std::cin >> myInt;

// User enters:

42         // ok
abc        // fails: failbit set, myInt set to 0 (since C++11)
4.2        // succeeds with 4, leaving ".2" in the stream
9999999999 // fails: failbit set, myInt clamped to INT_MAX (since C++11)

So while stream extraction has built-in validation, explicit checking provides more control and safety.
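The cases above can be reproduced without a console by running the same extraction on a std::istringstream. This is a minimal sketch; the Extraction struct and probeExtraction helper are hypothetical names introduced for illustration.

```cpp
#include <sstream>
#include <string>

// Result of one formatted extraction attempt.
struct Extraction {
  bool ok;           // did operator>> succeed?
  int value;         // what was stored in the target variable
  std::string rest;  // characters left unread on the line
};

// Hypothetical helper: runs `stream >> int` on a single line of text.
Extraction probeExtraction(const std::string& line) {
  std::istringstream in(line);
  int value = -1;
  const bool ok = static_cast<bool>(in >> value);

  in.clear();  // allow reading the remainder even after a failure
  std::string rest;
  std::getline(in, rest);
  return {ok, value, rest};
}
```

For "4.2" the helper reports success with ".2" left over, which is exactly the kind of partial read that explicit checking is meant to catch.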

Stream State and Flags

Streams maintain internal state flags during extraction that can be checked afterwards:

int num;
std::cin >> num;

if (std::cin.fail()) {  
  // invalid extraction
}

if (std::cin.bad()) {
  // serious stream error
}

This allows detecting issues after the fact. But often, we want to validate before using any extracted values.
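One common way to act on these flags is a retry loop that clears the error state and discards the offending line. The sketch below assumes line-oriented input; readIntWithRetry is an illustrative name, not a standard function.

```cpp
#include <istream>
#include <limits>
#include <optional>
#include <sstream>

// Keeps reading until an integer parses; returns nullopt when input ends.
std::optional<int> readIntWithRetry(std::istream& in) {
  int value = 0;
  while (!(in >> value)) {
    if (in.bad() || in.eof()) {
      return std::nullopt;  // unrecoverable error or no more input
    }
    in.clear();  // reset failbit so the stream is usable again
    // Drop the rest of the bad line before trying again.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
  }
  return value;
}
```

Passing std::cin gives the interactive behavior; passing a std::istringstream makes the same logic easy to unit test.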

Stream Exceptions

By enabling stream exceptions, we can catch extraction errors using try/catch:

int num;
std::cin.exceptions(std::ios_base::failbit); 

try {
  std::cin >> num; // throws on fail
} catch (std::ios_base::failure& e) {
  // invalid input handling
}

This transitions extraction errors into C++ exceptions for broader handling.

Input Buffering

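Streams use internal input buffers for efficiency. For example, std::cin buffers console input, which is typically delivered a line at a time.

This can cause unexpected behavior when formatted extraction is mixed with line-based reads, because operator>> leaves the rest of the line (including the newline) in the buffer:

int age;
std::cin >> age; // reads int 21, leaves "\n" in the buffer

std::string name;
std::getline(std::cin, name); // empty! consumes only the leftover newline

Discard the leftover newline with std::cin.ignore() or std::cin >> std::ws before calling std::getline(), or read every input as a whole line with std::getline() and parse it afterwards.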
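As a small illustration of mixing operator>> with std::getline(), the sketch below discards the rest of the number's line before reading the next one. The function name and the two-line input layout are assumptions for the example, and error handling is left out for brevity.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>

// Reads "<age>\n<full name>\n". The ignore() call drops the newline that
// operator>> leaves behind, so getline() sees the name line.
std::pair<int, std::string> readAgeAndName(std::istream& in) {
  int age = 0;
  in >> age;
  in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

  std::string name;
  std::getline(in, name);
  return {age, name};
}
```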

Signal Handling

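External events can disrupt console input streams. A SIGINT handler lets the program react to the user interrupt signal (Ctrl + C). Stream functions are not safe to call from a signal handler, so the handler should only set a flag that the input loop checks:

volatile std::sig_atomic_t interrupted = 0; // requires <csignal>

void handle_SIGINT(int) {
  interrupted = 1; // only async-signal-safe work here
}

int main() {
  std::signal(SIGINT, handle_SIGINT);

  // input handling: when interrupted is set, call
  // std::cin.clear() and std::cin.ignore(1000, '\n') here
}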

Robust input processing involves stream mechanics like buffering, signals, exceptions – not just syntactical validation.

Integer Conversion and Checks

Once input is extracted, we can perform direct checks on whether it represents integer data.

String Streams

A common approach is to extract user input into a std::string instead of direct integer conversion:

std::string input;
std::cin >> input;

Then we can attempt conversion on the string:

int num;
std::stringstream converter(input);

if (!(converter >> num)) {
  // failed to convert
} else {
  // valid integer in num
}

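This separates the input extraction from validation and type conversion. Note that converter >> num stops at the first character that cannot be part of an integer, so "12abc" converts to 12; check that the stream has reached its end if trailing characters should be rejected.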
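A stricter variant of the string stream conversion might look like the following sketch, which also rejects leading whitespace and trailing characters. The name toIntStrict and the choice of std::optional as the return type are assumptions made for the example.

```cpp
#include <optional>
#include <sstream>
#include <string>

std::optional<int> toIntStrict(const std::string& input) {
  std::istringstream converter(input);
  int num = 0;
  // noskipws turns leading whitespace into a failure instead of skipping it.
  if (!(converter >> std::noskipws >> num)) {
    return std::nullopt;
  }
  // Anything left unread (for example "abc" in "12abc") is an error.
  if (converter.peek() != std::char_traits<char>::eof()) {
    return std::nullopt;
  }
  return num;
}
```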

Incremental Conversion

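The std::from_chars function (C++17, declared in <charconv>) converts character sequences without locale handling, allocation or exceptions, and reports both invalid formats and out-of-range values:

const char* input = "150";
const char* end = input + std::strlen(input);

int num;
auto result = std::from_chars(input, end, num);

if (result.ec != std::errc() || result.ptr != end) {
  // conversion failed or stopped before the end of input
} else {
  // valid integer in num
}

This provides granular incremental parsing without needing intermediate strings: result.ptr marks where parsing stopped, so the caller can reject or continue with whatever follows.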
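To make the different failure modes explicit, the result of std::from_chars can be mapped to a small status enum. This is a sketch; ParseStatus and parseInt are hypothetical names, not part of the standard library.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

enum class ParseStatus { Ok, NotANumber, OutOfRange, TrailingCharacters };

// Writes to `out` only when the whole text is a valid int.
ParseStatus parseInt(std::string_view text, int& out) {
  const char* first = text.data();
  const char* last = text.data() + text.size();
  int value = 0;
  const auto [ptr, ec] = std::from_chars(first, last, value);

  if (ec == std::errc::invalid_argument) return ParseStatus::NotANumber;
  if (ec == std::errc::result_out_of_range) return ParseStatus::OutOfRange;
  if (ptr != last) return ParseStatus::TrailingCharacters;

  out = value;
  return ParseStatus::Ok;
}
```

Note that std::from_chars accepts a leading '-' but neither a leading '+' nor leading whitespace, so "+5" is reported as NotANumber here.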

isdigit() and Standard Algorithms

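We can check if each character matches an integer digit. The conversion to unsigned char matters: passing a negative char value to std::isdigit is undefined behavior.

for (char c : input) {
  if (!std::isdigit(static_cast<unsigned char>(c))) {
    // input contains non-digit
  }
}

Or equivalently using algorithms:

if (std::any_of(input.begin(), input.end(), [](unsigned char c) {
  return !std::isdigit(c);
})) {
  // input contained non-digit
}

But this doesn't fully validate format: "-123" is a valid integer but fails the isdigit check, and an empty string passes it.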
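A digit check that tolerates one optional sign and rejects empty input could be sketched as follows. The function name is illustrative, and whether a leading '+' should be accepted is a policy choice assumed here.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

bool looksLikeInteger(std::string_view text) {
  if (!text.empty() && (text.front() == '+' || text.front() == '-')) {
    text.remove_prefix(1);  // skip one optional sign
  }
  // At least one character must remain, and all of them must be digits.
  return !text.empty() &&
         std::all_of(text.begin(), text.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; });
}
```

This checks format only; a string of fifty digits still passes, so a range check is needed before converting.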

Edge Case Values

Certain integer values need special handling:

Empty input -> Invalid or default?
Leading zeros -> Allow or deny?
Signed overflow -> Define behavior?
Hex values -> Validate separately?
Locale affects '-' and ',' parsing -> Globalize?

Define and document your validator's edge case policy.

Performance Benchmarks

Validation methods have different computational profiles. Here is a benchmark of validation times by input size:

Integer Validation Benchmark

Fig 1. Comparative integer validation benchmarks (synthetic inputs)

We see:

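  • Regex is slow for large inputs; std::regex has high matching overhead, and patterns with nested quantifiers can additionally backtrack exponentially
  • Incremental conversion (std::from_chars) scales better with input size than stringstreams
  • The digit check is simple, but its times vary because it short-circuits at the first non-digit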

Understanding performance implications allows choosing the right validator.

Validation Functions

Encapsulating checks into validation functions makes them easily reusable across an application:

bool isValidInteger(const std::string& input) {
  return //... checks here ...
}

if (isValidInteger(userInput)) {
  // use input  
} else {
  // invalid
}

This gives a clean separation between the core program and validation code.

Localization

Supporting international formats involves managing locales:

std::locale::global(std::locale("")); // default locale

bool isValidInteger(const std::string& input) {
  std::locale loc;

  // check input using loc  
}

Locale affects digit grouping, decimal signs (1234.56 vs 1234,56) etc.

Levels of Checking

Varying levels of validation are possible:

Strict

  • Must match exact integer regex format
  • Disallow any extraneous input
  • Throw exceptions on failure

Moderate

  • Allow leading/trailing whitespace
  • Parse integers from messy input
  • Return failure value on error

Lax

  • Simply check that the input contains an integer
  • Ignore all other input
  • Never invalidate

Support multiple modes adjusting strictness.
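One possible shape for such modes is a single parser that takes a strictness parameter. The sketch below is an assumption-laden simplification: the names are invented, every mode reports failure through std::optional rather than throwing, and the lax mode still fails when the input contains no digits at all.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

enum class Strictness { Strict, Moderate, Lax };

std::optional<int> parseWithMode(std::string_view text, Strictness mode) {
  const auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  const auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };

  if (mode == Strictness::Moderate) {
    // Trim surrounding whitespace, then require an exact match.
    while (!text.empty() && isSpace(text.front())) text.remove_prefix(1);
    while (!text.empty() && isSpace(text.back())) text.remove_suffix(1);
  } else if (mode == Strictness::Lax) {
    // Jump to the first digit, keeping a '-' directly in front of it.
    std::size_t i = 0;
    while (i < text.size() && !isDigit(text[i])) ++i;
    if (i > 0 && i < text.size() && text[i - 1] == '-') --i;
    text.remove_prefix(i);
  }

  int value = 0;
  const char* last = text.data() + text.size();
  const auto [ptr, ec] = std::from_chars(text.data(), last, value);
  if (ec != std::errc()) return std::nullopt;                       // no digits or out of range
  if (mode != Strictness::Lax && ptr != last) return std::nullopt;  // trailing characters
  return value;
}
```

A caller that prefers exceptions in strict mode can wrap the call and throw when the result is empty.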

Idempotence

Validators should be deterministic, returning the same output for the same input no matter how often they are called (this is what idempotent is taken to mean here). This may require:

  • No internal state mutations
  • Thread-safe without data races
  • Care with static locals

Idempotence simplifies reasoning about validators in complex code.

Regular Expressions

C++ regular expressions offer a powerful method to define validation rules and pattern match input:

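#include <regex>
#include <string>

bool isValidInteger(const std::string& input) {

  // Integer regex, compiled once. regex_match already requires the
  // whole input to match, so ^ and $ are optional here.
  static const std::regex int_regex("^[+-]?([0-9]+)$");

  // Validate entire input matches
  return std::regex_match(input, int_regex);
}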

Benefits include:

  • Precise control over valid formats
  • Clear explicit definition
  • Detect partial match failures
  • Avoid procedural checks

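But watch for performance with complex patterns, and note that a format match says nothing about range: "99999999999" matches the pattern but does not fit in a 32-bit int.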
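Because a regex only checks format, it is often paired with a conversion step that checks range. The following sketch combines the two; isValidInt is an illustrative name, and the '+' handling reflects the fact that std::from_chars does not accept a leading plus sign.

```cpp
#include <charconv>
#include <regex>
#include <string>
#include <system_error>

bool isValidInt(const std::string& input) {
  // Step 1: format check (optional sign, then digits only).
  static const std::regex format("[+-]?[0-9]+");
  if (!std::regex_match(input, format)) return false;

  // Step 2: range check. The regex guarantees input is non-empty.
  const char* first = input.data() + (input.front() == '+' ? 1 : 0);
  const char* last = input.data() + input.size();
  int value = 0;
  const auto result = std::from_chars(first, last, value);
  return result.ec == std::errc() && result.ptr == last;
}
```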

Raw Matches

Direct regex_match() validates the entire input only:

std::regex_match(" +42", int_regex); // fails: leading space
std::regex_match("42", int_regex);   // passes

No partial matches – input must satisfy regex fully.

Partial Matches

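For partial matching, iterate over all matches with std::sregex_iterator (calling regex_search() repeatedly on the same string would return the first match forever):

std::regex int_regex("[0-9]+");
std::string text = " 12 abc 34 ";

for (std::sregex_iterator it(text.begin(), text.end(), int_regex), end;
     it != end; ++it) {
  // found integer - it->str()
}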

This finds all integer pieces from messier input.
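Wrapped into a function, collecting every digit run from a messy string might look like this sketch (findIntegerTokens is a hypothetical helper name):

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every run of digits found in `text`, in order of appearance.
std::vector<std::string> findIntegerTokens(const std::string& text) {
  static const std::regex int_regex("[0-9]+");
  std::vector<std::string> tokens;
  for (std::sregex_iterator it(text.begin(), text.end(), int_regex), end;
       it != end; ++it) {
    tokens.push_back(it->str());
  }
  return tokens;
}
```

The tokens are returned as strings so that each one can still be range-checked before conversion.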

Unicode and Localization

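Wide-character regexes can match digits from other scripts, although Unicode support in std::regex is limited and depends on the platform's wchar_t encoding:

// Hindi (Devanagari) digits
std::wregex hin_int(L"[०-९]+");

std::wsmatch matches;
std::regex_search(input, matches, hin_int); // input is a std::wstring

Use wregex and wsmatch for wide-character regex parsing.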

Building Validation Regexes

Composing small testable pieces helps construct reliable patterns:

Start minimal: ^[0-9]+$

Refine: ^([0-9]+)$

Extend: ^-?([0-9]+)$ (not ^([0-9]+)|\-([0-9]+)$, where each anchor would apply to only one alternative)

Parametrize: {2} repetitions, [0-9] character sets

Test rigorously against a range of valid/invalid cases

This incremental regex development prevents bugs.

Multithreading Considerations

Validating concurrently across threads requires awareness of:

  • Atomicity – Are checks thread-safe?
  • Reentrancy – Can validators be interrupted/re-entered?
  • Immutability – Does it mutate state?
  • Lock-freedom – Avoid locks slowing threads
  • False sharing – Concurrent cache line access

Address these or simply design validators as pure functions.

Example Thread-Safe Validator

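struct IntValidator {

  bool isValid(std::string input) const {

    // Working memory: the by-value parameter is already a private copy

    // Immutable checks using the copy
    return !input.empty() &&
           std::all_of(input.begin(), input.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
  }

  // Guards shared state, if any is added later (for example a cache)
  mutable std::mutex m;
};

Key aspects:

  • mutable allows the mutex to be locked inside const methods
  • const method avoids visible mutation
  • Local working copy means no data is shared between threads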

This decouples synchronization from validation.
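As a rough illustration of the pure-function approach, the sketch below validates a batch of inputs from several threads with no lock around the validator itself; only the result counter is shared, and it is atomic. The names, and the simple digits-only rule, are assumptions for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cctype>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Pure function: reads only its argument, so it needs no lock.
bool isDigitsOnly(const std::string& input) {
  return !input.empty() &&
         std::all_of(input.begin(), input.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Splits the inputs across `workers` threads and counts the valid ones.
std::size_t countValid(const std::vector<std::string>& inputs, unsigned workers) {
  std::atomic<std::size_t> valid{0};
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&inputs, &valid, w, workers] {
      // Each thread handles every workers-th element, starting at w.
      for (std::size_t i = w; i < inputs.size(); i += workers) {
        if (isDigitsOnly(inputs[i])) ++valid;
      }
    });
  }
  for (auto& t : pool) t.join();
  return valid.load();
}
```

The input vector is only read while the threads run, which is what makes sharing it safe.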

Alternatives to Locking

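Other concurrency structures like lock-free queues can validate asynchronously. Here ConcurrentQueue is a placeholder for such a queue, with pop() returning an empty optional once the queue is closed:

ConcurrentQueue<std::string> inputs;
ConcurrentQueue<std::string> rejected;

void inputThread() {

  while (auto input = inputs.pop()) {

    if (!isValid(*input))
      rejected.push(*input); // report failures on a separate queue
  }
}

The validator itself takes no locks, while the queues coordinate the checker threads.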

Language Comparisons

Validation capabilities vary across languages. For example, Python and C#:

Python

Python int() attempts conversion, throwing ValueError on failure:

try:
  num = int(input) 
except ValueError:
  # handle invalid integer  

The str.isdigit() method offers a simple digit check.

Python's re module uses a pattern syntax close to the ECMAScript grammar that std::regex defaults to; in both languages patterns are compiled at runtime.

C#

C# also wraps conversion in exception handling (Int32.Parse additionally throws OverflowException for out-of-range values):

try {
  int num = Int32.Parse(input);
} catch (FormatException) {
  // handle parse failure 
}

It includes Int32.TryParse for cleaner handling without exceptions.

Compared to C++

C++ trades:

  • No single TryParse-style convenience function
  • Extra checks needed around std::stoi, std::strtol and std::from_chars for strict parsing
  • A standardized but comparatively slow regex library

For advantages:

  • Fine-grained input control
  • Resource efficiency
  • Execution speed

The right choice depends on program goals.

Security Considerations

Attackers exploit invalid input to trigger crashes, code execution and data corruption:

Attack Surface

Fig 2. Vulnerabilities from invalid integer data

Our job is to defend against bad data.

Integer Overflows

Seemingly valid integers can exploit logic errors:

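INT_MAX + 10 -> signed overflow is undefined behavior (often observed as a wrap to a negative value)
`BIG_NUMBER - user_val` -> with unsigned types, wraps around to a huge number

Detect overflows by checking value ranges before performing the operation.

Use compiler flags that trap or report signed integer overflow:

g++ -ftrapv ...
g++ -fsanitize=signed-integer-overflow ...

Unsigned integers do not avoid the problem: their wraparound is well defined, but the wrapped value is still a logic error.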
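A portable way to detect signed overflow is to test the operands before adding, since the overflowing addition itself is undefined behavior. This is a minimal sketch with an illustrative function name.

```cpp
#include <limits>
#include <optional>

// Returns a + b, or nullopt if the sum would not fit in an int.
std::optional<int> checkedAdd(int a, int b) {
  if (b > 0 && a > std::numeric_limits<int>::max() - b) return std::nullopt;
  if (b < 0 && a < std::numeric_limits<int>::min() - b) return std::nullopt;
  return a + b;  // safe: the checks above rule out overflow
}
```

The same pattern extends to subtraction and multiplication, each with its own pre-checks.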

Memory Safety

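Simple buffer over-read:

// Vulnerable function: assumes at least one readable character

bool isNegative(const char* num) {
  return num[0] == '-';
}

// Attacker exploits with an empty, unterminated buffer:

std::vector<char> payload;  // e.g. a zero-length network message
isNegative(payload.data()); // BOOM! reads outside the buffer

Use safe strings, length checks and bounds checking to harden code.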
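One way to harden a check like this is to pass the length along with the data, for example via std::string_view. The function name below is an illustrative choice.

```cpp
#include <string_view>

// The view carries its own length, so the emptiness check
// happens before any character is read.
bool isNegativeSafe(std::string_view num) {
  return !num.empty() && num.front() == '-';
}
```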

Denial of Service

Seemingly valid inputs can be crafted to:

  • Trigger worst-case exponential backtracking in a regex
  • Generate cache misses that stall pipelines
  • Spawn unbounded threads until limits are exceeded

Cap input length and monitor computational resources to catch abuse early.

Fuzz Testing

Fuzzing generates randomized and mutated inputs, many of them invalid, to catch vulnerabilities during development:

Fuzzer

Fig 3. Typical fuzz testing rig

Great practice – mutate known test values just beyond valid boundaries.
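As one concrete possibility, a libFuzzer harness for an integer validator could look like the sketch below. LLVMFuzzerTestOneInput is the entry point libFuzzer expects; parseStrict is a hypothetical validator standing in for your own, and the round-trip invariant is just one example of a property worth checking.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <system_error>

// Hypothetical validator under test; substitute your own.
bool parseStrict(const char* first, const char* last, int& out) {
  const auto result = std::from_chars(first, last, out);
  return result.ec == std::errc() && result.ptr == last;
}

// Entry point that libFuzzer calls once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
  const char* first = reinterpret_cast<const char*>(data);
  int value = 0;
  if (parseStrict(first, first + size, value)) {
    // Invariant: an accepted value survives a print/parse round trip.
    const std::string printed = std::to_string(value);
    int again = 0;
    if (!parseStrict(printed.data(), printed.data() + printed.size(), again) ||
        again != value) {
      std::abort();  // the fuzzer reports this as a crash
    }
  }
  return 0;
}
```

With Clang this would typically be built with -fsanitize=fuzzer,address, and seeded with boundary values such as "2147483647" and "2147483648".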

Custom Validators

For special cases, craft targeted custom validators:

Input Masks

Formatters mapping to problem domain:

using Money = uint64_t; // cents

Money parseMoney(std::string input) {

  // Prefix parse
  std::regex dollar_prefix("^\\$");

  // Split decimal part
  std::regex cents_part("\\.\\d{0,2}");

  Money dollars = /*...*/;
  Money cents = /* ... */;

  return dollars*100 + cents;
}

Domain parsers handle non-standard but valid formats.
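A fuller version of a money parser might be sketched as follows. Everything here is an assumption made for illustration: the accepted format (optional "$", whole dollars, optional one or two cent digits), the std::optional return type and the name parseMoneySketch.

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <optional>
#include <regex>
#include <string>
#include <system_error>

using Money = std::uint64_t;  // cents

std::optional<Money> parseMoneySketch(const std::string& input) {
  // Optional "$", whole dollars, then an optional "." with one or two digits.
  static const std::regex format(R"(\$?([0-9]+)(?:\.([0-9]{1,2}))?)");
  std::smatch m;
  if (!std::regex_match(input, m, format)) return std::nullopt;

  const std::string dollarsText = m[1].str();
  Money dollars = 0;
  const auto result = std::from_chars(
      dollarsText.data(), dollarsText.data() + dollarsText.size(), dollars);
  if (result.ec != std::errc()) return std::nullopt;  // too large for Money

  // Reject amounts where dollars * 100 + cents would wrap around.
  if (dollars > (std::numeric_limits<Money>::max() - 99) / 100) return std::nullopt;

  Money cents = 0;
  if (m[2].matched) {
    std::string centsText = m[2].str();
    if (centsText.size() == 1) centsText += '0';  // ".5" means 50 cents
    cents = static_cast<Money>(std::stoi(centsText));
  }
  return dollars * 100 + cents;
}
```

Storing cents in an integer avoids floating-point rounding, which is why the parser never converts through double.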

Sanitizers

Transforms can clean up messy inputs:

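std::string sanitizePhoneNumber(std::string input) {

  // Strip non-digits (erase-remove idiom: remove_if alone only
  // shuffles characters and leaves the length unchanged)
  input.erase(std::remove_if(input.begin(), input.end(),
                             [](unsigned char c) { return !std::isdigit(c); }),
              input.end());

  // Truncate length
  input = input.substr(0, 10);

  return input;
}

// Assumed example pattern: exactly ten digits
const std::regex phonePat("[0-9]{10}");

bool isValidPhone(const std::string& phone) {
  return std::regex_match(sanitizePhoneNumber(phone), phonePat);
}

Two stages: cleanup, then validation.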

Stateful Validators

Maintaining validation state across inputs enables richer constraints:

class UsernameValidator {
 public:
  // Check history
  bool isOriginal(const std::string& input) const {
    return taken_names.find(input) == taken_names.end();
  }

  // Mutate state
  void addName(const std::string& input) {
    taken_names.insert(input);
  }

 private:
  // State
  std::unordered_set<std::string> taken_names;
};

Keeping the set private means state can only change through addName(), which prevents unsafe state changes.

Conclusion

As we have explored, validating integers in C++ provides:

  • Robustness against crashes from bad data
  • Security against overflow and malformed-input attacks
  • Safety enforcing domain rules
  • Reliability by failing early and predictably
  • Consistency with centralized validation

The techniques shown form an essential part of any quality C++ program receiving untrusted inputs. Combining extraction checks, integer conversion, well-tested regular expressions, concurrency awareness and custom validators gives comprehensive protection.

By identifying issues early in processing, we minimize future correctness and security problems deeper in system logic.

With powerful facilities like C++ streams and regular expressions, input validation should pervade every C++ program interfacing with the outside world.
