tr Command in Unix/Linux with Examples: A Deep, Practical Guide

The last time I had to sanitize a multi‑gigabyte log stream before feeding it into a parser, the “simple” part wasn’t the parsing at all—it was the character cleanup. Tabs were inconsistent, zero‑width junk had slipped in, and a few vendor tools were still shouting in uppercase. I reached for a tiny utility I’ve relied on for years: tr. It sits in the pipeline like a scalpel, taking raw text and reshaping it character by character. If you’ve ever needed to normalize case, remove unwanted characters, or compress noisy spacing without spinning up heavier tools, tr is exactly the kind of sharp, focused tool that pays off fast.

I’ll walk you through how tr actually operates, how its sets work, and where it shines in real workflows. I’ll also cover the options you’ll see most, show runnable examples, highlight mistakes I still see in production scripts, and explain when you should use something else. You’ll leave with a practical playbook for text cleanup in modern Unix pipelines.

What tr really does (and what it refuses to do)

tr reads from standard input and writes to standard output. It doesn’t open files by name, it doesn’t understand fields or words, and it doesn’t process lines in a structured way. It works strictly at the character level. That narrow focus is why it’s fast and predictable, but it’s also why you should not expect it to handle substring replacement or regular expressions. If you need to replace “error” with “warning,” tr is the wrong tool.

Think of tr as a translation table operating on bytes. You supply a source set (SET1) and a target set (SET2). Every character in the input that matches a character from SET1 is replaced by the corresponding character in SET2. If you use options like -d or -s, it can delete or squeeze characters instead of translating them.
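A minimal sketch of that positional mapping (the sets here are just illustrative):

```shell
# Positional mapping: a->x, b->y, c->z.
# Characters outside SET1 pass through unchanged.
echo "cabbage" | tr 'abc' 'xyz'
# -> zxyyxge
```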

When you need to normalize input before parsing or storing, this is exactly the right level of control. In my experience, it’s the quickest way to enforce consistent casing, strip stray control characters, or make sure columns are clean before handing a stream to tools like awk, cut, or JSON parsers.

Basic syntax and mental model

The core syntax is small and stable:

tr [OPTION] SET1 [SET2]

That’s it. It always reads from stdin, so the most common patterns are:

  • Pipe into it: cat file.txt | tr 'a-z' 'A-Z'
  • Redirect into it: tr 'a-z' 'A-Z' < file.txt
  • Use a here‑string: tr 'a-z' 'A-Z' <<< "hello"

I usually avoid the cat form unless I’m already in a pipeline. Input redirection is clearer and avoids unnecessary processes. When you work with large data or in containers, shaving off extra processes can matter.

Simple case conversion

Let’s start with the most common use: changing letter case.

# Convert lowercase to uppercase

tr 'a-z' 'A-Z' < message.txt

If the file contains:

Welcome to

Tech Ops

The output will be:

WELCOME TO

TECH OPS

A more portable version uses character classes rather than literal ranges:

# Portable case conversion

tr '[:lower:]' '[:upper:]' < message.txt

This is safer across locales and non‑ASCII ranges. If you’re in a mixed environment, I strongly recommend the character classes.

Character sets, ranges, and classes you’ll actually use

The power of tr is in its set notation. You can describe characters in three main ways:

  • Literal lists: abc123 matches any of those characters.
  • Ranges: a-z matches all lowercase letters in ASCII.
  • Character classes: [:digit:], [:lower:], [:upper:], [:space:], [:alpha:], and so on.

I treat ranges as “quick local scripts” and classes as “production‑safe scripts.” Locale settings can change how ranges behave, especially with multibyte encodings. Character classes are more consistent across systems.

Common classes worth memorizing

  • [:lower:] lowercase letters
  • [:upper:] uppercase letters
  • [:digit:] digits 0–9
  • [:space:] whitespace (space, tab, newline, etc.)
  • [:blank:] space and tab only
  • [:alpha:] letters
  • [:alnum:] letters and digits
  • [:cntrl:] control characters
  • [:punct:] punctuation

That last distinction—[:space:] vs [:blank:]—is critical when you’re trying to preserve line breaks. If you squeeze or translate [:space:], you will affect newlines too.
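The difference is easy to see on a two‑line input:

```shell
# [:blank:] squeezes runs of spaces and tabs but keeps line breaks:
printf 'a  b\nc   d\n' | tr -s '[:blank:]' ' '
# -> two lines: "a b" and "c d"

# [:space:] also translates the newlines, flattening the input:
printf 'a  b\nc   d\n' | tr -s '[:space:]' ' '
# -> one line: "a b c d "
```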

Escapes and portability notes

tr recognizes some common escape sequences like \t for tab and \n for newline, but shell quoting can change how they’re interpreted. I usually prefer single quotes for literal sets and use $'...' when I need C‑style escapes in shells that support it.

# Safer tab usage in bash/zsh

tr $'\t' ' ' < data.tsv

If you need maximum portability, use printf to generate the character and then feed it to tr, or just use character classes like [:blank:].
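A sketch of the printf pattern (the input is inline here; in practice you’d redirect a file in):

```shell
# Generate a literal tab with printf so the script doesn't depend
# on $'...' support in whatever shell runs it.
TAB=$(printf '\t')
printf 'id\tname\n' | tr "$TAB" ' '
# -> "id name"
```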

Translation in practice: real‑world scenarios

1) Normalize a configuration file’s casing

If you’re dealing with config values that should be case‑insensitive, normalize them early.

tr '[:upper:]' '[:lower:]' < env.conf > env.normalized

This keeps downstream parsing simple. I often combine this with a grep for specific keys afterward.

2) Replace whitespace with tabs for alignment

If you need to convert a simple space‑delimited text file into a tab‑delimited one:

echo "alpha beta gamma" | tr '[:space:]' '\t'

Output:

alpha	beta	gamma

Be careful: [:space:] includes newlines. If you want only spaces and tabs, use [:blank:] instead. This is a mistake I’ve seen cause flattened files when developers “fix spacing” and accidentally convert every newline to a tab.

3) Swap braces for parentheses in a code export

If you’re normalizing a simple format for ingestion or display:

tr '{}' '()' < input.txt > normalized.txt

That simple translation works great for single‑character swaps. If you need to replace { with [ and } with ], just adjust the sets.

4) One‑time cleanup before feeding a parser

Imagine you have a stream that mixes tabs and multiple spaces, and your parser expects single spaces. You can compress the noise with -s:

tr -s ' ' < raw.txt > clean.txt

This “squeeze” operation turns repeated spaces into a single space. For tab plus space cleanup, use tr -s '[:blank:]' ' ' and preserve newlines carefully.

5) Remove non‑ASCII characters for legacy systems

Sometimes you have to feed data into old systems that choke on non‑ASCII bytes. You can strip anything that is not in the ASCII printable range by deleting everything except the space‑tilde range and newlines.

# Keep printable ASCII and newlines

tr -cd '\11\12\15\40-\176' < input.txt > ascii.txt

This uses octal ranges to preserve tabs (\11), newlines (\12), carriage returns (\15), and printable ASCII (\40–\176). It’s a useful trick when Unicode breaks downstream consumers.

Deleting characters cleanly with -d

Deletion is where tr often saves me from writing a quick script in Python or JavaScript.

Remove a specific character

echo "Welcome To Labs" | tr -d 'W'

Output:

elcome To Labs

If that looks too blunt, that’s because it is. tr -d removes all matches globally. If you need context‑aware deletion, this isn’t the right tool.

Remove all digits from a line

echo "Order 73920" | tr -d '[:digit:]'

Output:

Order

If you need to remove digits but keep punctuation or symbols, tr makes that easy with classes.

Strip control characters before parsing

tr -d '[:cntrl:]' < raw.json > clean.json

This can save parsers that choke on unexpected control bytes. Just be mindful that you might be hiding upstream data corruption if those bytes shouldn’t be there in the first place.

Complementing sets with -c: select what you want to keep

I use -c to invert the set, effectively saying “match everything except these characters.” It’s powerful and slightly dangerous if you forget that -c applies to SET1 only.

Keep only digits

echo "ID=73535" | tr -cd '[:digit:]'

Output:

73535

This is a clean way to extract numeric IDs from noisy strings. It’s not a regex, but it’s faster and simpler in a shell pipeline.

Keep only letters and spaces

echo "alpha-123" | tr -cd '[:alpha:] '

Output:

alpha

Note the space inside the quotes—this allows spaces to remain. Without it, you’d strip all whitespace too.

Keep only safe filename characters

If you’re normalizing user input into safe filenames, you can keep alphanumerics, dash, underscore, and dot:

echo "Report 2026/01 (final).pdf" | tr -cd '[:alnum:]_.-'

Output:

Report202601final.pdf

This creates a conservative filename without spaces or slashes. It’s not perfect for every environment, but it’s a good baseline.

Squeezing repeats with -s: make text readable

The -s option collapses repeated characters into a single instance. This is great for normalizing whitespace or removing noisy separators.

Normalize spacing

echo "Welcome    To    Labs" | tr -s ' '

Output:

Welcome To Labs

Compress repeated punctuation

If you have repeated commas or dots from messy data dumps:

echo "a,,,b....c" | tr -s ',.'

Output:

a,b.c

This works because -s squeezes repeats of any character in the specified set. It’s not limited to whitespace.

Trim repeated blank lines

Sometimes log exporters insert multiple blank lines. You can squeeze newlines by translating any blank line run into a single newline. The trick is to squeeze newline characters directly:

tr -s '\n' < log.txt > log.compact.txt

This is useful for post‑processing, but keep in mind it removes intentional blank spacing too.

Truncating sets with -t: the subtle option

-t truncates SET1 to match the length of SET2 during translation. It’s useful when you’re pairing sets of different lengths and you want precise control over the translation. Most of the time, you won’t need it, but it’s handy in data cleanup or quick encodings.

Example: translate only the characters in SET1 that actually pair with SET2.

echo "education" | tr -t 'aeiou' 'X'

Here -t truncates SET1 to 'a' (the length of SET2), so only the letter a becomes X and the output is educXtion. Without -t, behavior with a shorter SET2 can vary by implementation (GNU tr repeats the last character of SET2, so every vowel would become X). I use -t when I want explicit behavior rather than relying on defaults.
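Both behaviors side by side on GNU tr (the padding in the no‑-t case is implementation‑dependent, which is exactly the point):

```shell
# With -t, SET1 'aeiou' is truncated to 'a', so only a is translated:
echo "education" | tr -t 'aeiou' 'X'
# -> educXtion

# Without -t, GNU tr repeats the last character of SET2,
# so every vowel becomes X:
echo "education" | tr 'aeiou' 'X'
# -> XdXcXtXXn
```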

Complete, runnable examples you can copy

Example: sanitize a user list for CSV import

Goal: uppercase names, remove digits, compress whitespace.

cat users.txt | tr '[:lower:]' '[:upper:]' | tr -d '[:digit:]' | tr -s ' '

If users.txt contains:

Ada 1 Lovelace

grace 42 Hopper

Output:

ADA LOVELACE

GRACE HOPPER

This is a great example of tr acting as a low‑overhead data cleanup step before you import into a database or CSV.

Example: normalize log labels to lowercase

tr '[:upper:]' '[:lower:]' < app.log > app.normalized.log

Log parsers often treat labels as case‑sensitive. Normalizing early prevents noisy duplicates.

Example: strip control characters before parsing JSON

tr -d '[:cntrl:]' < raw.json > clean.json

I’ve used this before feeding data into strict parsers. Be careful: removing control characters can hide encoding errors if you don’t know why they’re there.

Example: convert Windows line endings to Unix

tr -d '\r' < windows.txt > unix.txt

This removes carriage returns that often appear in Windows text files.

Example: normalize delimiters to a single pipe

If you have a mix of commas and semicolons as separators:

echo "a,b;c,d" | tr ',;' '||'

Output:

a|b|c|d

This is a simple but effective cleanup step before parsing with cut -d'|' or similar tools.

Edge cases and safe handling

Newlines and carriage returns

If you ingest files from Windows systems, you may see \r carriage returns. tr can remove them:

tr -d '\r' < windows.txt > unix.txt

That’s the standard fix, but it’s also worth checking whether the file has mixed line endings. If it does, you may want to normalize with a more robust tool or re‑export at the source.

UTF‑8 and multibyte characters

tr operates on bytes by default. For UTF‑8 text, you can still use character classes, but range matching can be tricky. If your data includes non‑ASCII characters, I recommend testing on representative input or using higher‑level tools that are Unicode‑aware.

A practical compromise: use tr for ASCII‑level cleanup (like whitespace and control characters), then pass the stream to a Unicode‑aware tool for any language‑specific transformations.

Null bytes

tr is not designed to handle binary data with null bytes reliably in all environments. If you need to sanitize binary blobs, use tools like perl -pe or a dedicated binary‑safe parser.

Invisible characters that are not whitespace

You may encounter zero‑width spaces or non‑breaking spaces that aren’t captured by [:space:] on some systems. If your cleanup isn’t working as expected, inspect the bytes with tools like od -An -t x1 and then delete the exact byte ranges with tr -d using octal or hex escapes.
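For example, a UTF‑8 non‑breaking space is the byte pair c2 a0. Once od confirms it, you can delete those exact bytes, with the caveat that tr deletes each byte wherever it appears, so on mixed multibyte text this can mangle other characters:

```shell
# The NBSP between a and b shows up as the bytes "c2 a0":
printf 'a\302\240b\n' | od -An -t x1

# Delete those bytes. Safe here because the rest of the input is
# plain ASCII; on mixed UTF-8 text this could corrupt other chars.
printf 'a\302\240b\n' | tr -d '\302\240'
# -> ab
```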

Common mistakes and how to avoid them

1) Expecting tr to do substring replacements

If you try this:

echo "error" | tr 'error' 'warn'

You might expect “warn,” but you will not get that. tr builds a character‑to‑character mapping from the two sets, so the output is a scrambled string, not a word substitution. Use sed or perl for substring replacements.
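The substring version belongs to sed:

```shell
# sed matches patterns, not character sets, so this does what the
# tr attempt above cannot:
echo "error: disk full" | sed 's/error/warning/'
# -> warning: disk full
```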

2) Using ranges under the wrong locale

In some locales, a-z may include more characters than you think. If you need predictable ASCII behavior, set the locale explicitly:

LC_ALL=C tr 'a-z' 'A-Z' < file.txt

For portable scripts, I prefer character classes to avoid surprises.

3) Destroying newlines with [:space:]

[:space:] includes newlines. If you use:

tr -s '[:space:]' ' '

You will collapse the entire file into a single line. If you want to compress multiple spaces but preserve line breaks, use:

tr -s '[:blank:]' ' '

This is one of the most common production bugs I see in quick shell scripts.

4) Forgetting that tr doesn’t read files by name

You’ll sometimes see:

tr 'a-z' 'A-Z' file.txt

That won’t work as intended; it treats file.txt as a set. Use input redirection or pipe instead.

5) Assuming order doesn’t matter in sets

When translating, the order of characters in SET1 and SET2 matters. It’s a positional mapping. If you reorder either set, you change the mapping. Keep this in mind when you use ranges or custom lists.
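Positional mapping is also what makes classic tricks like ROT13 a one‑liner: each letter in SET1 pairs with the letter 13 places along in SET2, and reordering either set would break the cipher.

```shell
# ROT13: a->n, b->o, ..., m->z, n->a, ...
echo "hello" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
# -> uryyb

# The mapping is its own inverse, so applying it twice round-trips:
echo "uryyb" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
# -> hello
```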

When to use tr vs other tools

I’ll make this simple: use tr when you need character‑level operations and the transformation can be described as “replace characters, delete characters, or compress repeated characters.”

Use tr when:

  • You need to normalize case fast.
  • You want to delete a set of characters.
  • You want to compress repeated characters.
  • You’re working in a tight pipeline and want minimal overhead.

Use something else when:

  • You need substring or regex replacements (use sed, perl, or python).
  • You need field‑aware transformations (use awk).
  • You’re working with structured data like JSON (use jq).

The modern pattern I recommend is: tr for cleanup, awk for field logic, and higher‑level languages for structure. I still use this even in 2026 because the tools are reliable and the pipelines are transparent.

Performance and scalability notes

tr is extremely fast because it operates as a simple character translation table and streams input without storing it. On most modern systems, you’ll see operations in the single‑digit to low‑double‑digit millisecond range for modest files, and it scales almost linearly with input size. I’ve used it on gigabyte‑scale logs where it adds only a small percentage of overhead.

If performance matters, avoid unnecessary processes. Instead of cat file | tr, use tr < file. When piping through multiple tools, keep the pipeline lean and prefer built‑ins where possible.

A practical micro‑optimization I use: combine operations when it helps readability, but don’t sacrifice clarity for tiny gains. tr is so fast that the real cost often sits in disk I/O or other tools in the pipeline.

Real‑world pipeline patterns

Normalize whitespace, then parse columns

tr -s '[:blank:]' ' ' < data.tsv | cut -d' ' -f1,3

This normalizes inconsistent spacing before extracting columns. I often do this when merging exports from older tools that don’t respect tabs consistently.

Clean then count unique words

tr '[:upper:]' '[:lower:]' < article.txt | tr -cd '[:alpha:]\n ' | tr -s ' ' | tr ' ' '\n' | sort | uniq -c

This chain lowercases, removes non‑letters, collapses spaces, and then counts word frequency. It’s not the shortest, but it’s clear and fast. I still use this pattern for quick exploratory analysis before jumping into a notebook.

Extract numeric IDs from mixed text

tr -cd '[:digit:]\n' < mixed.txt > ids.txt

Each numeric sequence becomes contiguous, and line breaks are preserved. This is a quick way to get candidate IDs for downstream validation.

Build a safe slug from a title

If you need a URL‑friendly slug, you can use tr in a small pipeline. It’s not perfect for every language, but it’s practical:

echo "  My New Post: Tips & Tricks  " \
| tr '[:upper:]' '[:lower:]' \
| tr -cd '[:alnum:] \n-' \
| tr -s ' ' \
| tr ' ' '-'

This lowercases, removes punctuation, compresses spaces, and converts spaces to hyphens. Note that leading and trailing spaces become stray hyphens, so trim padded input first (or clean up afterward with sed 's/^-//; s/-$//').

Normalize mixed delimiters then split

echo "a,b;c|d" | tr ',;|' ':::' | tr ':' '\n'

This turns commas, semicolons, and pipes into colons, then splits into one item per line. It’s a quick way to normalize inconsistent input before further processing.

Practical scenarios: when tr shines and when it doesn’t

Great fits

  • Pre‑processing for parsers: clean whitespace and control characters before JSON, CSV, or custom parsing.
  • Config normalization: standardize case and remove stray characters.
  • Log cleanup: strip problematic bytes from vendor tools.
  • Quick data prep: lightweight transformations without leaving the shell.

Not a good fit

  • Context‑aware replacements: anything that depends on surrounding characters.
  • Complex transforms: multiple rules with branching logic.
  • Unicode‑heavy transformations: scripts with non‑ASCII rules that need awareness of code points.

If you’re on the fence, ask yourself: “Can I express the change as a character‑level rule?” If yes, tr is probably the right tool.

Alternative approaches (and why you might prefer them)

Sometimes tr is not the best tool. It’s helpful to know what to reach for instead.

  • sed: Best for substring replacements or regex operations. If you need to replace full words, sed is clearer.
  • awk: Best when you need field logic, numeric calculations, or conditional transformations.
  • perl/python: Best for Unicode‑aware transforms or complex processing that goes beyond simple character rules.
  • jq: Best for JSON—don’t try to “fix” JSON with tr alone if the structure matters.

tr is the fast, simple baseline. Use it when it’s enough, and don’t be shy about switching tools when it isn’t.

Debugging and verification tactics

When I’m not sure whether a tr pipeline is doing the right thing, I do two quick checks:

1) Show the raw bytes: Use od -An -t x1 or xxd before and after to see if unwanted bytes are still there.

2) Sample small input: Run the pipeline on a tiny, representative string so you can reason about each step.

printf 'a\tb\r\n' | od -An -t x1

If you see 0d in the output, that’s a carriage return. Strip it with tr -d '\r' and verify again. This kind of quick feedback loop saves a lot of time in production cleanup scripts.

A compact reference for options you’ll actually use

  • -d: Delete characters in SET1
  • -s: Squeeze repeated characters in SET1 into single occurrences
  • -c: Complement SET1 (match everything except those characters)
  • -t: Truncate SET1 to the length of SET2 during translation

You can combine these, but be careful about how they interact. The most common combo is -cd to keep only certain characters.

Modern workflow integration (2026 context)

Even with AI‑assisted workflows and newer shell environments, tr is still a useful primitive. I often generate a quick pipeline using AI assistance, then refine it to use tr for the simplest cleanup steps. It’s also a perfect fit inside container images where you want minimal dependencies.

If you’re using task runners or build scripts, consider placing tr transformations in dedicated shell scripts and unit‑testing them with bats or a small test harness. It keeps your pipeline deterministic and easy to reason about. In a CI context, tr is lightweight enough to run on every build without meaningful cost.

I also like tr in preprocessing steps for data pipelines. A tiny, explicit normalization step can eliminate entire classes of parsing bugs downstream.

Practical guidance: use, don’t overuse

I recommend using tr as the first cleanup tool in your pipeline. It’s predictable and easy to audit. But don’t twist it into something it isn’t. When you need structured data manipulation, use tools made for that. When you need complex replacements, reach for sed or a language like Python.

Here’s a personal rule I follow: if I can describe the change as a set of “character rules,” I use tr. If I’m thinking in words or patterns, I stop and switch tools.

A quick checklist before you ship a tr‑based script

  • Did you avoid [:space:] when you meant [:blank:]?
  • Are you sure your locale won’t expand the range unexpectedly?
  • Did you test the pipeline on representative data, not just a toy string?
  • Are you sure tr is the right tool and not sed or awk?
  • Have you preserved line breaks if downstream tools rely on them?

Running through this checklist takes a minute and saves a lot of cleanup later.

Key takeaways and next steps

If you take only a few things from this, I’d keep them simple. First, tr is a character translator, not a substring replacer. It reads from stdin, writes to stdout, and shines when you need fast, deterministic cleanup. Second, sets and classes are the heart of its power—use them carefully, especially when it comes to whitespace. Third, if your transformation depends on context or structure, use a different tool.

If you want to go further, I suggest building a small library of reusable tr snippets for your team: case normalization, whitespace cleanup, control‑character stripping, and delimiter normalization. These small utilities pay off repeatedly and keep pipelines readable. And if you’re teaching or onboarding, tr is a perfect introduction to the philosophy of Unix text tools: small, sharp, composable, and reliable.
