I still remember reviewing a search feature that flagged duplicate entries by checking if two names were anagrams. The engineer had written a sort-and-compare solution, which worked, but it was harder to read than it needed to be and slower than expected for large inputs. When I swapped it for a one-liner with collections.Counter, the code became clearer, the intent popped, and the edge cases were easier to reason about. That kind of small refactor is exactly why I like Counter for anagram checks: it tells the story of the problem directly—count the characters, compare the counts.
You should care because anagrams show up in real systems more often than you’d think: word games, search normalization, duplicate detection, cipher puzzles, and content moderation. If you do this often, you want a solution that is readable, correct for messy inputs, and fast enough for production. I’ll show you how I implement it today in Python, how I handle tricky edge cases like Unicode and whitespace, and where Counter shines versus alternatives. I’ll also call out common mistakes I see in code reviews and give practical guidance for when Counter is the right tool and when it isn’t.
The core idea: compare character counts
Think of each string as a bag of tiles from a word game. If two bags contain the same tiles in the same quantities, the strings are anagrams. Counter is a dictionary subclass that turns this idea into code: it maps characters to their counts. Two Counters are equal if their keys and values match, regardless of order.
Here’s the simplest form, and yes, I use this exact pattern in real code:
from collections import Counter
def is_anagram(a: str, b: str) -> bool:
    return Counter(a) == Counter(b)
If you’re mentoring someone new, this version is perfect. It shows intent, it’s easy to test, and it’s only two lines. But production inputs are rarely this clean, so let’s make it more realistic.
A practical, production-friendly version
In my experience, anagram checks usually need normalization: case insensitivity, whitespace handling, and sometimes punctuation removal. You should choose your normalization rules based on your domain. For example, in a word game, spaces and punctuation might be ignored. In a code or ID comparison, you might preserve them.
Here’s a version I’d ship for human-language inputs:
from collections import Counter
import unicodedata
import re
NONLETTER = re.compile(r"[^\w]", re.UNICODE)
def normalize_text(text: str) -> str:
    # Normalize Unicode to reduce surprises (e.g., accented forms)
    text = unicodedata.normalize("NFKD", text)
    # Lowercase for case-insensitive comparison
    text = text.lower()
    # Remove non-word characters (spaces, punctuation)
    text = NONLETTER.sub("", text)
    return text

def is_anagram(a: str, b: str) -> bool:
    na = normalize_text(a)
    nb = normalize_text(b)
    return Counter(na) == Counter(nb)
This version does three things that keep you out of trouble:
- It normalizes Unicode, so "résumé" and "resume" are treated consistently if that’s your desired behavior.
- It lowercases both strings, so "Abcd" and "dCba" are treated as anagrams.
- It strips punctuation and whitespace, so "dormitory" and "dirty room" can match.
If those rules are too aggressive for your use case, remove the steps you don’t want. I always prefer to make the rules explicit rather than letting implied assumptions drive correctness.
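To see those rules in action, here is a quick sanity check. It reproduces the helper from above so the snippet runs on its own:

```python
from collections import Counter
import re
import unicodedata

NONLETTER = re.compile(r"[^\w]", re.UNICODE)

def normalize_text(text: str) -> str:
    # Decompose accented forms, lowercase, and strip non-word characters
    text = unicodedata.normalize("NFKD", text)
    text = text.lower()
    return NONLETTER.sub("", text)

def is_anagram(a: str, b: str) -> bool:
    return Counter(normalize_text(a)) == Counter(normalize_text(b))

print(is_anagram("Dormitory", "dirty room!"))  # True: case, space, punctuation ignored
print(is_anagram("abcd", "abce"))              # False: different letters
```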
How Counter comparison actually works
A Counter is a dictionary that maps items to counts. Dictionary equality in Python checks two things:
1) both dicts have the same set of keys,
2) the value for each key is equal.
Order does not matter. That’s why Counter("abcd") == Counter("dabc") evaluates to True. That’s also why Counter("abcf") == Counter("kabc") is False: the key sets differ. I mention this because some developers mistakenly think dict order affects equality, but it doesn’t. The entire anagram check is just an equality comparison between two count maps.
If you want to peek at what Counter produces, try this:
from collections import Counter
print(Counter("dabc"))
You’ll see something like:
Counter({'d': 1, 'a': 1, 'b': 1, 'c': 1})
The order might vary, but the mapping is the same. That’s exactly what you want for an anagram check.
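You can verify the order-independence claim directly at the REPL:

```python
from collections import Counter

# Same multiset of characters in a different order: the Counters are equal
print(Counter("abcd") == Counter("dabc"))  # True

# Different key sets ('f' vs 'k'): not equal
print(Counter("abcf") == Counter("kabc"))  # False
```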
Single-line solution, and why I still prefer it
If you already have clean inputs, I’m perfectly happy with a single line in a utility module:
from collections import Counter
def is_anagram(a: str, b: str) -> bool:
    return Counter(a) == Counter(b)
This is a rare case where a one-liner is both terse and readable. You can drop it into a codebase without a comment and it still explains itself. If a reviewer asks what it’s doing, the answer is obvious. That’s the sweet spot.
Traditional vs modern approaches
There are a few common ways to solve anagrams in Python. Here’s how I compare them in 2026, with a short recommendation after the table.
| Approach | When I pick it |
| --- | --- |
| Sort both strings, then compare | Good for teaching, not my default in production |
| Use a dict or array, increment/decrement | Useful in tight loops, embedded code, or non-Python languages |
| Build counts with Counter and compare | My default for Python text inputs |
| Compare lengths first, then counts | Use when inputs are large and you want quick rejects |

My recommendation: use Counter by default for Python. It's clean, reliable, and very hard to misread. Reach for manual counting only when you've measured performance issues or you need complete control over normalization and memory use.
Performance and complexity you can trust
For most text sizes you’ll encounter in web apps, Counter is fast. Complexity is O(n) time, where n is the length of the string, because it counts each character once. Space is O(k), where k is the number of distinct characters. In practice, k is bounded by the character set you allow. With lowercase English letters, k is at most 26. With Unicode, k can be larger, but still far smaller than n for typical inputs.
What about real-world timing? On modern laptops and servers, I typically see anagram checks on short words complete in under a millisecond, and even long strings (tens of thousands of characters) often finish in the 10–20ms range. That’s usually good enough for search pipelines and validation APIs. If you’re doing it at massive scale, you’ll likely batch or precompute counts anyway.
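If you want numbers for your own machine rather than my anecdotes, a small timeit harness does the job. The string sizes here are arbitrary choices for illustration:

```python
import timeit
from collections import Counter

def is_anagram(a: str, b: str) -> bool:
    return Counter(a) == Counter(b)

short = ("listen", "silent")
long_pair = ("ab" * 10_000, "ba" * 10_000)  # 20,000 characters each

for name, (a, b) in [("short", short), ("long", long_pair)]:
    # Average over 100 runs to smooth out noise
    seconds = timeit.timeit(lambda: is_anagram(a, b), number=100) / 100
    print(f"{name}: {seconds * 1000:.3f} ms per check")
```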
A practical micro-optimization that actually helps: check lengths first. If lengths differ, they can’t be anagrams, so you can skip building Counters.
from collections import Counter
def is_anagram(a: str, b: str) -> bool:
    if len(a) != len(b):
        return False
    return Counter(a) == Counter(b)
In my benchmarks, this helps most when the majority of comparisons are negative. It’s not a must, but it’s harmless and sometimes useful.
Common mistakes I see in code reviews
I’ve reviewed dozens of anagram implementations, and the same mistakes show up again and again. Here are the ones you should actively avoid:
1) Forgetting normalization rules
If your input includes uppercase letters or spaces, a raw Counter comparison will say two obvious anagrams are not equal. Decide the rules and apply them consistently.
2) Removing punctuation but not whitespace
This often happens when someone uses str.isalpha() and forgets that spaces are neither letters nor punctuation. The result is that spaces are kept or removed inconsistently.
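A quick illustration of the pitfall, using a filter that strips only punctuation:

```python
import string
from collections import Counter

raw_a = "dirty room"
raw_b = "dormitory"

# Stripping only punctuation leaves the space behind, so it gets counted
stripped_a = "".join(c for c in raw_a if c not in string.punctuation)
print(Counter(stripped_a) == Counter(raw_b))  # False: the space is still a key
```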
3) Assuming ASCII only
If you’re working with names or multilingual content, you’ll get Unicode input. Without normalization, "é" and "e" won’t compare the way you expect. Sometimes that’s correct; sometimes it’s not. Make a deliberate choice.
4) Misusing Counter subtraction
I sometimes see code like Counter(a) - Counter(b) == Counter(). Counter subtraction drops zero and negative counts, so this check can return True even when b has extra characters that a lacks. Equality comparison is simpler and correct.
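Here is the subtraction trap made concrete. Counter subtraction clamps results at zero, so the surplus character in the second string simply vanishes:

```python
from collections import Counter

a, b = "ab", "abb"

# Subtraction keeps only positive counts; the extra 'b' in b disappears
print(Counter(a) - Counter(b))                # Counter()
print(Counter(a) - Counter(b) == Counter())   # True, but "ab"/"abb" are NOT anagrams
print(Counter(a) == Counter(b))               # False: equality gets it right
```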
5) Over-optimizing too early
If you’re processing thousands of strings per second, sure, measure. But I’ve seen developers replace a one-line Counter comparison with 20 lines of manual counting in the name of performance—without any benchmark data. The extra code creates more bugs than it saves CPU.
Real-world scenarios and edge cases
Let me make this concrete with a few scenarios I’ve handled.
1) Search deduplication
Imagine you’re deduplicating a list of product names where order of letters doesn’t matter for a specific feature. You should normalize case and spacing, but keep numbers:
from collections import Counter
import re
NONALNUM = re.compile(r"[^a-z0-9]", re.IGNORECASE)
def normalize(text: str) -> str:
    text = text.lower()
    return NONALNUM.sub("", text)

def is_anagram(a: str, b: str) -> bool:
    return Counter(normalize(a)) == Counter(normalize(b))
This preserves digits, which matters for things like model numbers.
2) Word games with strict rules
In a word game, you might allow only letters and ignore accents. That’s a different choice:
from collections import Counter
import unicodedata
def normalize_letters(text: str) -> str:
    text = unicodedata.normalize("NFKD", text)
    # Keep only ASCII letters
    return "".join(c for c in text.lower() if "a" <= c <= "z")

def is_anagram(a: str, b: str) -> bool:
    return Counter(normalize_letters(a)) == Counter(normalize_letters(b))
3) Anagram checking in APIs
If you’re building an API, you should return clear errors for invalid input, rather than quietly normalizing everything. For example, reject empty strings if they aren’t meaningful in your domain, or reject inputs longer than a threshold to prevent abuse.
from collections import Counter
def is_anagram(a: str, b: str) -> bool:
    if not a or not b:
        raise ValueError("Inputs must be non-empty strings")
    if len(a) > 10000 or len(b) > 10000:
        raise ValueError("Inputs too long")
    return Counter(a) == Counter(b)
This is less about anagrams and more about operational safety. I recommend this kind of guard when you’re exposing the function through public endpoints.
When to use Counter—and when not to
I use Counter for most Python anagram checks, but it isn’t always the best choice. Here’s my practical guidance:
Use Counter when:
- You want readability and correctness with minimal code
- Inputs are strings or small lists of symbols
- You need simple, reliable behavior without complicated control flow
Avoid Counter when:
- You need to compare streams of data without loading everything in memory
- You need extreme micro-optimizations and have benchmark evidence
- You’re working in a language without a Counter equivalent and want a portable algorithm
If you do need streaming, you can adapt the counting idea to a single pass with a dict and update counts as data arrives. But for typical in-memory strings, Counter is the right tool.
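A sketch of that streaming adaptation, assuming two character iterables (the function name here is illustrative). Only the count dictionary lives in memory, never the full inputs:

```python
from typing import Iterable

def is_anagram_streaming(stream_a: Iterable[str], stream_b: Iterable[str]) -> bool:
    # One dict of running count differences: +1 for stream_a, -1 for stream_b
    counts: dict[str, int] = {}
    for ch in stream_a:
        counts[ch] = counts.get(ch, 0) + 1
    for ch in stream_b:
        counts[ch] = counts.get(ch, 0) - 1
    # Anagrams if every difference cancels out to zero
    return all(v == 0 for v in counts.values())

print(is_anagram_streaming(iter("dabc"), iter("abcd")))  # True
```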
Handling Unicode and normalization safely
This is where many anagram implementations break in real systems. Unicode can represent the same visual character in multiple ways. For example, "é" can be a single code point or the combination of "e" and an accent mark. If you compare raw strings, those may not match, even if they look identical to users.
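You can see the two representations diverge with a couple of code points:

```python
import unicodedata

composed = "\u00e9"     # 'é' as a single code point
decomposed = "e\u0301"  # 'e' followed by a combining acute accent

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 1 2, despite looking identical
print(unicodedata.normalize("NFC", decomposed) == composed)  # True after normalization
```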
Normalization solves this. I recommend unicodedata.normalize("NFKD", text) for most text processing tasks. It splits combined characters into base + modifiers, which makes it easier to strip accents if needed. If you want to preserve accents, use NFC instead, which composes characters into a canonical form.
Here’s a practical switch that lets you choose the behavior:
from collections import Counter
import unicodedata
def normalize(text: str, strip_accents: bool = True) -> str:
    form = "NFKD" if strip_accents else "NFC"
    text = unicodedata.normalize(form, text)
    if strip_accents:
        text = "".join(c for c in text if not unicodedata.combining(c))
    return text.lower()

def is_anagram(a: str, b: str, strip_accents: bool = True) -> bool:
    return Counter(normalize(a, strip_accents)) == Counter(normalize(b, strip_accents))
If you don’t need Unicode normalization, skip it. But if you’re dealing with names or global input, it’s worth being explicit.
Practical testing patterns
I’m a big fan of small, explicit test cases that capture the rules you care about. Here’s a minimal set you can drop into a test file:
from collections import Counter
def is_anagram(a: str, b: str) -> bool:
    return Counter(a) == Counter(b)

def test_anagram_basic():
    assert is_anagram("abcd", "dabc") is True
    assert is_anagram("abcf", "kabc") is False

def test_anagram_case():
    assert is_anagram("Abcd", "dabc") is False
If you add normalization, expand your tests accordingly. I usually include at least one test for whitespace, punctuation, and accents.
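If your is_anagram applies the normalization rules from earlier (case folding, whitespace and punctuation stripping, accent removal), the expanded suite might look like this sketch:

```python
from collections import Counter
import re
import unicodedata

NONWORD = re.compile(r"[^\w]", re.UNICODE)

def normalize(text: str) -> str:
    # NFKD splits accents into combining marks, which [^\w] then strips
    text = unicodedata.normalize("NFKD", text)
    return NONWORD.sub("", text.lower())

def is_anagram(a: str, b: str) -> bool:
    return Counter(normalize(a)) == Counter(normalize(b))

def test_anagram_whitespace():
    assert is_anagram("dormitory", "dirty room")

def test_anagram_punctuation():
    assert is_anagram("a gentleman", "elegant man!")

def test_anagram_accents():
    assert is_anagram("résumé", "resume")
```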
The “single line” in context
It’s tempting to drop the one-line solution into a blog and call it a day. But I want you to see it as a tool, not a magic trick. The reason it’s so effective is that it aligns the code with the concept. In other words, the algorithm matches the mental model. That’s a great sign in any system: when the code reads like the spec, bugs tend to fall away.
When I’m reviewing a pull request, I always ask: does the code show the idea? Counter does. That’s why I still reach for it even in 2026, even with the rise of AI-assisted tools. I can ask an AI to generate a solution, but I still want to read it quickly, verify it, and trust it in production. Counter helps with that.
A quick look at memory behavior
Counter is a dictionary, so each unique character becomes a key. The memory footprint is proportional to the number of distinct characters, not the length of the string. That’s a big deal for long inputs made of a small alphabet. If you know your inputs are lowercase English letters, you can even replace Counter with a fixed-size list of 26 integers and squeeze out a bit more speed and memory efficiency. But again, only do that if you need it. Most applications don’t.
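You can verify that footprint claim directly: the Counter's size tracks distinct characters, not input length.

```python
from collections import Counter

long_text = "abab" * 50_000   # 200,000 characters, only two distinct letters
counts = Counter(long_text)

print(len(long_text))  # 200000
print(len(counts))     # 2 keys, regardless of how long the input grows
```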
Here’s the fixed-size version for completeness:
def is_anagram_ascii(a: str, b: str) -> bool:
    if len(a) != len(b):
        return False
    counts = [0] * 26
    for ca, cb in zip(a, b):
        counts[ord(ca) - 97] += 1
        counts[ord(cb) - 97] -= 1
    return all(c == 0 for c in counts)
This is fast, but it’s also more restrictive and easier to misuse. I reserve it for very tight loops where I’ve already measured that Counter is a bottleneck.
Guardrails for production systems
If you’re putting an anagram checker into a production service, here are the guardrails I recommend:
- Validate input type and length; reject overly large strings early.
- Decide normalization rules and document them clearly.
- Add tests for your most common inputs and your edge cases.
- Avoid logging raw input if it might be sensitive.
- If you expect untrusted input, consider time and memory constraints.
These aren’t specific to anagrams, but they matter when you turn a simple algorithm into a real API endpoint.
Putting it all together with a clean, readable function
Here’s a version that balances readability, performance, and real-world robustness. I use this pattern in production when I want case-insensitive anagrams and I want to ignore whitespace and punctuation:
from collections import Counter
import re
import unicodedata
NONWORD = re.compile(r"[^\w]", re.UNICODE)
def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKD", text)
    text = text.lower()
    text = NONWORD.sub("", text)
    return text

def is_anagram(a: str, b: str) -> bool:
    na = normalize(a)
    nb = normalize(b)
    if len(na) != len(nb):
        return False
    return Counter(na) == Counter(nb)

if __name__ == "__main__":
    print(is_anagram("abcd", "dabc"))
    print(is_anagram("abcf", "kabc"))
    print(is_anagram("Dormitory", "Dirty room"))
Notice the length check after normalization. I do it after normalization because removal of punctuation can change lengths. That’s a subtle bug I’ve seen many times.
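Here is that subtle bug in miniature, using a small normalizer for illustration:

```python
import re

def norm(s: str) -> str:
    # Lowercase and strip non-word characters (spaces, punctuation)
    return re.sub(r"[^\w]", "", s.lower())

a, b = "Dormitory", "Dirty room"

# A length check on the RAW strings would wrongly reject this valid pair
print(len(a), len(b))              # 9 10
# After normalization the lengths agree, so the quick reject is safe
print(len(norm(a)), len(norm(b)))  # 9 9
```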
What I expect you to take away
If you remember only one thing, remember this: for an anagram check, count characters and compare counts. That’s what Counter does, and it makes the code almost self-documenting.
You should also remember that “simple” inputs are a myth in most production systems. Real data has whitespace, case differences, punctuation, and Unicode. Decide your rules explicitly, encode them in normalization logic, and write tests that show those rules. That’s how you avoid slow, painful bugs later.
Finally, don’t overthink performance. Counter is fast enough for nearly every common case, and it keeps your code clean. Only switch to manual counting after you’ve measured real slowdowns.
If you want to extend this further, here are two practical next steps: add language-specific normalization for your user base, and integrate the function into a test suite with both positive and negative cases. Those are small investments that pay off quickly, especially as your codebase grows.
That’s the approach I use in 2026: simple, explicit, measurable, and easy to trust.