CSES Counting Bits: Patterns, Proofs, and Practical Implementations

I still remember the first time I wrote a counting-bits solution that timed out. I had a clean loop, a neat popcount, and a false sense of confidence. Then I pushed N to 10^15 and watched every millisecond vanish. That moment forced me to look past the surface and recognize the pattern hidden in binary ranges. If you’re working through CSES, the Counting Bits problem is where you learn to stop thinking per-number and start thinking per-bit.

Here’s what you’ll get from me: a clear mental model for counting ones from 1 to N, a derivation you can reuse, and two complete implementations that handle 10^15 without breaking a sweat. I’ll also show you the common errors I see in interviews and code reviews, and how I now test this kind of solution using 2026-era tooling without turning the post into a tool ad. My goal is for you to finish this and feel like you could explain it on a whiteboard or drop it into a production analytics job that needs to count bits fast.

Problem Restatement in My Words

You’re given a single integer N where 1 ≤ N ≤ 10^15. You must count the total number of set bits (ones) in the binary representations of every integer from 1 to N, inclusive.

Example: for N = 7, you count ones in 1,2,3,4,5,6,7 and the total is 12. For N = 5, the total is 7.

A brute-force approach that sums popcount(i) for each i works for small N, but 10^15 numbers is far beyond any reasonable loop. You need a logarithmic solution that treats bit positions as patterns rather than individual numbers.

The Pattern I Always Look For

The key pattern is this: in the sequence of binary numbers from 0 to 2^b – 1, each bit position is on exactly half the time. That gives a simple count.

If N = 2^b – 1, then total ones from 0 to N is:

b * 2^(b-1)

Why? Each of the b bit positions is on for half of the 2^b numbers. Whether we include 0 or not doesn’t matter, because it contributes zero ones.
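As a quick sanity check, the closed form is easy to verify against brute force for small widths (this helper name is mine, just for illustration):

```python
def ones_in_full_block(b: int) -> int:
    # Total set bits across 0 .. 2^b - 1: each of the b positions
    # is on for exactly half of the 2^b numbers.
    return b * (1 << (b - 1)) if b > 0 else 0

for b in range(1, 11):
    brute = sum(bin(i).count("1") for i in range(1 << b))
    assert ones_in_full_block(b) == brute
print("closed form matches brute force")
```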

Once you see this, the non-perfect boundary case becomes a structured split:

  • Let m be the position of the highest set bit in N (0-based).
  • The numbers from 0 to 2^m – 1 form a complete block, so their total ones are m * 2^(m-1).
  • The numbers from 2^m to N have the highest bit set. That contributes (N – 2^m + 1) ones right away.
  • The remainder of each number in that upper block is just the low bits from 0 to N – 2^m, which is the same problem again.

That gives a recurrence:

count(N) = m * 2^(m-1) + (N – 2^m + 1) + count(N – 2^m)

This recurrence is only about log2(N) steps deep. That’s why it works for 10^15.
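The recurrence translates to code almost line for line. Here is a minimal recursive sketch, just to make the correspondence visible:

```python
def count_recursive(n: int) -> int:
    # count(N) = m * 2^(m-1) + (N - 2^m + 1) + count(N - 2^m)
    if n <= 0:
        return 0
    m = n.bit_length() - 1   # highest set bit position
    p = 1 << m               # p = 2^m
    return m * (p >> 1) + (n - p + 1) + count_recursive(n - p)

print(count_recursive(7))   # 12
print(count_recursive(5))   # 7
```

The recursion depth is bounded by the bit length of N, so it is safe even for 10^15, but I still prefer the iterative form for production code.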

Walking Through the Example Like I Do When Debugging

Take N = 6 (binary 110).

Highest set bit position m = 2 (since 2^2 = 4).

  • Complete block 0..3 has total ones = 2 * 2^(2-1) = 4.
  • Highest-bit contribution for 4..6 is (6 – 4 + 1) = 3 ones.
  • Remaining low bits correspond to 0..2, so add count(2).

Now count(2): m = 1.

  • Complete block 0..1 has ones = 1 * 2^(1-1) = 1.
  • Highest-bit contribution for 2..2 is 1.
  • Remaining low bits correspond to 0..0, so count(0) = 0.

Total: 4 + 3 + (1 + 1 + 0) = 9.

Check: numbers 1..6 are 1(1),2(1),3(2),4(1),5(2),6(2) → 9. Good.

That’s the core idea I want you to internalize: chunk the range by the highest power of two, then recurse on the remainder.

A Clean Iterative Form I Use in Practice

I rarely keep the recursive form in production because it’s easy to write iteratively with the same logic and no stack depth concerns. I’ll show you a Python version first because it makes the math visible, then C++ for CSES.

Python (iterative, 64-bit safe)

```python
def countsetbits_upto(n: int) -> int:
    # Counts set bits from 1 to n inclusive.
    if n <= 0:
        return 0
    total = 0
    while n > 0:
        # Highest power of two <= n
        m = n.bit_length() - 1
        p = 1 << m  # p = 2^m
        # Complete block 0..p-1
        if m > 0:
            total += m * (p >> 1)
        # Highest-bit contribution for p..n
        total += n - p + 1
        # Remainder
        n = n - p
    return total

if __name__ == "__main__":
    print(countsetbits_upto(7))  # 12
    print(countsetbits_upto(5))  # 7
```

Notes:

  • bit_length() is a clean way to find the highest set bit in Python.
  • I keep it iterative for clarity and to avoid recursion depth issues.
  • All arithmetic fits in Python’s big ints, but the same formula works with 64-bit integers in C++.

C++ (CSES-ready)

```cpp
#include <bits/stdc++.h>
using namespace std;

using int64 = long long;

int64 countsetbits_upto(int64 n) {
    if (n <= 0) return 0;
    int64 total = 0;
    while (n > 0) {
        // Highest set bit position
        int64 m = 63 - __builtin_clzll(n);
        int64 p = 1LL << m;  // p = 2^m
        // Complete block 0..p-1
        if (m > 0) {
            total += m * (p >> 1);
        }
        // Highest-bit contribution for p..n
        total += n - p + 1;
        // Remainder
        n = n - p;
    }
    return total;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    long long n;
    if (!(cin >> n)) return 0;
    cout << countsetbits_upto(n) << "\n";
    return 0;
}
```

I prefer __builtin_clzll for speed and clarity in C++17/20. It’s well supported in GCC and Clang. If you want portability across compilers without that builtin, you can write a loop or use std::bit_width from <bit> in C++20.

Why This Works: A Short Proof I Use in Interviews

When someone asks me to justify the recurrence, I keep it short:

  • Let p = 2^m be the highest power of two ≤ N. Then N can be written as p + r where 0 ≤ r < p.
  • Numbers from 0 to p – 1 form a complete binary cycle across m bits. Each bit is on exactly half the time. So total ones there is m * 2^(m-1).
  • Numbers from p to N have the highest bit on, contributing r + 1 ones in that bit.
  • The remaining lower bits in the range p..N run from 0 to r. That’s exactly the same problem of counting ones from 0 to r.

That gives the recurrence. The algorithm reduces N to r each step, so the number of iterations is at most the number of bits in N, which is about 50 for 10^15.

Common Mistakes I See (and How I Avoid Them)

I review a lot of competitive solutions and I keep seeing the same traps. Here’s what I watch for.

  • Off-by-one in the highest bit contribution: The count is n – p + 1, not n – p. The range includes both endpoints.
  • Forgetting the base case: If N becomes 0, stop. This prevents extra terms and avoids a negative shift.
  • Using 32-bit shifts: In C++, the literal 1 is a 32-bit int, so 1 << m is undefined behavior for m ≥ 31. Always use 1LL << m.
  • Using recursion without tail control: Recursion is fine here, but I’ve seen stack depth mistakes when people mix in unrelated recursion. Iterative is safer.
  • Mixing 0..N with 1..N: The derivation uses 0..N, but the task is 1..N. Since 0 has zero ones, it doesn’t change the result, but you must stay consistent or you’ll add an extra term by accident.

When I test, I always validate both tiny values (1..16) and random values against a brute-force checker. That catches almost every bug.

Performance Characteristics You Should Expect

This approach is O(log N) time and O(1) space. For N up to 10^15, that’s around 50 iterations. In practice:

  • C++ runs in far less than 1 ms on typical contest hardware.
  • Python is also fast, typically a few microseconds to tens of microseconds.

I don’t quote exact numbers because your environment matters, but it’s effectively instantaneous for this input range.

When to Use This vs Other Methods

You should use this pattern whenever you need counts over ranges, not just for 1..N. It’s easy to extend to count ones in a range [L, R] by computing count(R) – count(L – 1).

I do not recommend this approach if:

  • You need to count set bits for only a handful of numbers (just use popcount).
  • You need per-number results rather than a total. This method gives totals, not individual bit counts.
  • You need bit counts under mod 2 or other number-theory constraints; a different bit DP might be clearer.

For CSES Counting Bits, this is the right approach, full stop.

Traditional vs Modern Workflow for This Problem

Here’s how I see the tradeoff in 2026.

| Approach | Traditional | Modern (2026) |
| --- | --- | --- |
| Understanding | Manual derivation on paper | Derive + quick check with a symbolic tool or notebook |
| Implementation | Handwritten recursion or loops | Same core logic, but assisted by unit-test generators |
| Validation | A few hardcoded samples | Brute-force checker + property tests for random N |
| Debugging | Print statements | LLM-assisted explanation + local quick tests |

I still do the derivation myself, but I don’t skip automation anymore. A tiny test harness with a random brute-force check has saved me from subtle off-by-ones more times than I can count.

A Small Test Harness I Actually Use

This is the kind of quick validation I run locally. It’s short, it’s fast, and it catches mistakes before submission.

```python
import random

def countsetbits_bruteforce(n: int) -> int:
    return sum(bin(i).count("1") for i in range(1, n + 1))

def countsetbits_upto(n: int) -> int:
    if n <= 0:
        return 0
    total = 0
    while n > 0:
        m = n.bit_length() - 1
        p = 1 << m
        if m > 0:
            total += m * (p >> 1)
        total += n - p + 1
        n -= p
    return total

for _ in range(2000):
    n = random.randint(1, 100000)
    a = countsetbits_upto(n)
    b = countsetbits_bruteforce(n)
    if a != b:
        print("Mismatch", n, a, b)
        break
else:
    print("All good")
```

That kind of harness also helps you build confidence if you want to adapt the logic to ranges or other variants.

Extending to Range Queries

A lot of problems extend this idea. If you ever need the total set bits in [L, R], use:

count(R) – count(L – 1)

You can wrap that into a function and reuse the core logic. I’ve used this in analytics code that counts bits in ID ranges for compression stats, and it’s the same pattern every time.

Edge Cases Worth Testing

I keep a small set of cases that I run for any implementation:

  • N = 1 → 1
  • N = 2 → 2 (1 + 1)
  • N = 3 → 4 (1 + 1 + 2)
  • N = 7 → 12 (as in the example)
  • N = 8 → 13 (adds one for 8)
  • N = 2^k – 1 (like 31, 63, 255)
  • N = 10^15 (performance and 64-bit safety)

The pattern of 2^k – 1 is especially helpful because you can compute the answer directly: k * 2^(k-1).
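That direct formula makes these edge cases scriptable. A minimal sketch, restating the iterative recurrence so the check is self-contained:

```python
def countsetbits_upto(n: int) -> int:
    # Same iterative recurrence as in the main solution.
    total = 0
    while n > 0:
        m = n.bit_length() - 1
        p = 1 << m
        total += m * (p >> 1) + (n - p + 1)
        n -= p
    return total

for k in (5, 6, 8):               # N = 31, 63, 255
    n = (1 << k) - 1
    assert countsetbits_upto(n) == k * (1 << (k - 1))
print("2^k - 1 cases pass")
```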

A Visual Intuition I Use

If you’re more visual, imagine the numbers from 0 to 15 laid out in binary. Each bit toggles in a regular rhythm: the least significant bit alternates every 1, the next every 2, then every 4, and so on. Counting ones in a full cycle is just half the cycle length. The recurrence takes the biggest full cycle you can fit, counts it, and then repeats for the remaining tail. I find that analogy makes the formula feel obvious instead of magical.

Why This Matters Beyond CSES

Even if you’re solving this for a contest, the skill transfers directly to real engineering problems:

  • Counting set bits quickly appears in bitset analytics, compression statistics, and network telemetry.
  • The idea of “count on full cycles, then handle the tail” shows up in prefix sums, combinatorics, and database statistics.
  • The recurrence pattern is similar to digit DP, which you’ll see in harder tasks.

So I treat this problem as a training ground for a way of thinking rather than a one-off trick.

Deeper Derivation: Counting by Bit Position

Sometimes the recurrence feels like a magic trick until you view it as a sum across bit positions. I like to have both mental models on hand. The recurrence is fast to implement, but the bit-position view is great for explaining why the counts are so structured.

For any bit position k (0-based), the pattern in numbers from 0 to N is periodic:

  • It stays 0 for 2^k numbers, then 1 for 2^k numbers, and repeats every 2^(k+1).

If you define:

  • cycle = 2^(k+1)
  • full_cycles = (N + 1) // cycle
  • remainder = (N + 1) % cycle

Then the count of ones at bit k from 0 to N is:

full_cycles * 2^k + max(0, remainder – 2^k)

This formula is incredibly useful for validating the recurrence. For small N, I sometimes compute the total as the sum across k to verify my loop:

count(N) = sum over k of ones_at_bit_k(N)

It’s O(log N) either way. The recurrence feels simpler to code because you keep shrinking N, but the per-bit formula is sometimes easier to prove on the spot because it’s just integer division and modulus.

Another Implementation: Per-Bit Formula (Python)

I don’t use this for CSES submissions because the recurrence is a few lines shorter, but the per-bit formula is great for teaching and testing. It’s also a good alternative if you want to explicitly talk about cycles.

```python
def countsetbits_upto_bitwise(n: int) -> int:
    if n <= 0:
        return 0
    total = 0
    # We need bits up to the highest set bit
    max_bit = n.bit_length()
    for k in range(max_bit):
        cycle = 1 << (k + 1)
        full_cycles = (n + 1) // cycle
        remainder = (n + 1) % cycle
        total += full_cycles * (1 << k)
        total += max(0, remainder - (1 << k))
    return total
```

This version is still O(log N) and uses only integer operations. For N up to 10^15, the loop is around 50 iterations. I like it as a cross-check against my recurrence implementation.

Practical Scenarios Where This Saves You

I don’t treat this as “just a contest trick.” The same technique can save real time in production data flows when you need fast, aggregated bit counts. A few examples from projects I’ve seen:

  • Compressed analytics: Counting ones across a large integer range gives a quick estimate of density in bitmaps before deciding whether to compress or re-encode them.
  • Monitoring counters: If IDs are assigned sequentially and bit patterns matter (for sharding or partitioning), you can calculate distribution without iterating over huge ranges.
  • Networking telemetry: Some low-level protocols pack flags into bits. Aggregating counts across ranges can be done by formula instead of scanning logs.

These aren’t hypothetical. If you ever build or maintain a system that uses bitsets or packed flags, you eventually need to count distributions fast. That’s where this method earns its keep.

When This Approach Is Not the Best Fit

There are cases where the recurrence is elegant but not the right tool. I learned to spot these early so I don’t force the wrong abstraction.

  • Non-contiguous sets: If the range is not [1..N] or [L..R], this method doesn’t help unless you can decompose into ranges.
  • Query-heavy systems: If you need many repeated queries over changing ranges, a precomputed prefix array or Fenwick tree on smaller bounds might be better.
  • Bitwise constraints: If you need counts only for numbers matching a mask or other condition, digit DP or DP over bits is more direct.

This is a range-aggregate method. Don’t use it to compute something it wasn’t designed for.

Bit-Level Intuition: The “Wave” View

One mental model that made this stick for me is the wave analogy. Each bit is a square wave:

  • The least significant bit is 010101…
  • The next is 00110011…
  • The next is 00001111…

When you count ones in a full period, you always get half the length of the period. So the total across a prefix is just the number of full periods plus whatever part of the wave your prefix cuts off. The recurrence is simply jumping from one cut to the next largest boundary.

Once you internalize the wave view, you can apply it to other problems where binary patterns repeat, such as counting numbers with a given parity pattern or summing values of specific bits in a prefix.

Defensive Implementation Tips (for C++ and Python)

I keep a short checklist in my head when I implement this in a new environment:

  • Use 64-bit or bigger. For N up to 10^15, a 64-bit signed integer is safe, but the totals can be larger than N. The maximum total set bits up to N is roughly N * log2(N), still within 64-bit for 10^15, but do the math and be explicit.
  • Always compute p as 1LL << m in C++ to avoid overflow on 32-bit literals.
  • Avoid negative shifts. If m can be 0, guard the m * 2^(m-1) term.
  • Verify for N = 0 even if the problem says N >= 1. It makes your function reusable and reduces edge-case bugs.

This checklist has saved me from errors in interviews where I’m coding quickly under pressure.

Time Complexity: What “Log N” Actually Feels Like

We toss around O(log N) a lot, but here’s the gut-level reality for this specific problem:

  • N up to 10^15 means at most 50 bits.
  • Each loop iteration is constant time: a couple of shifts, a subtraction, and a multiply.
  • The runtime is so small that the input parsing and output often dominate the total runtime.

So if you implement this and it still feels slow, something else is wrong. The most common culprit is accidentally reintroducing an inner loop or doing floating-point operations instead of bit shifts.

Alternative Approach: Recursive with Memoization (and Why I Rarely Use It)

You might be tempted to write a recursive function that caches results for small n. It works and it can be clean. But for this specific problem, memoization doesn’t buy you much because each call strips off the highest power of two and never repeats the same n. In other words, there’s almost no overlap.

So while I’m not against recursion, I prefer the iterative loop because it is straightforward, non-recursive, and still very readable.

A Bit of Math: Upper Bound on the Answer

Sometimes I need to sanity-check whether my type choice is safe. A quick upper bound helps:

For N with b bits, the total number of set bits from 1 to N is at most b * 2^(b-1) + (N – 2^(b-1)) * 1, but a loose and easy upper bound is b * N. For N = 10^15, b is about 50, so b * N is around 5 * 10^16, well within signed 64-bit (which is about 9 * 10^18). That’s a quick mental check that 64-bit is safe.

I don’t write this in code, but I keep it in mind when I’m deciding types on the fly.

Practical Debugging: How I Isolate Off-By-One Errors

When this logic fails, it almost always fails at boundaries. My debugging process is simple:

  • Check N = 1, 2, 3, 4, 5 manually.
  • Check N = 2^k – 1 and verify the formula k * 2^(k-1).
  • Check N = 2^k and verify the answer is previous + 1.
  • Compare the recurrence and per-bit methods on random values.

If those pass, I’m confident. If any fail, I focus on the highest-bit contribution or the base case.
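Those boundary checks are easy to automate. A minimal sketch, with the recurrence restated so the script runs standalone (the helper names are mine):

```python
def countsetbits_upto(n: int) -> int:
    total = 0
    while n > 0:
        m = n.bit_length() - 1
        p = 1 << m
        total += m * (p >> 1) + (n - p + 1)
        n -= p
    return total

def brute(n: int) -> int:
    return sum(bin(i).count("1") for i in range(1, n + 1))

# Small values by hand
assert [countsetbits_upto(n) for n in range(1, 6)] == [1, 2, 4, 5, 7]
for k in range(1, 20):
    # 2^k - 1 boundary: closed form k * 2^(k-1)
    assert countsetbits_upto((1 << k) - 1) == k * (1 << (k - 1))
    # 2^k itself adds exactly one more set bit
    assert countsetbits_upto(1 << k) == k * (1 << (k - 1)) + 1
print("boundary checks pass")
```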

A Compact C++ Variant Without Builtins

If you’re working in a setting where __builtin_clzll is not available, here’s a simple manual way to find the highest set bit. It’s not as fast as the builtin but still fine for 50 iterations.

```cpp
int64 highest_power_of_two(int64 n) {
    int64 p = 1;
    while ((p << 1) <= n) {
        p <<= 1;
    }
    return p;
}

int64 countsetbits_upto(int64 n) {
    if (n <= 0) return 0;
    int64 total = 0;
    while (n > 0) {
        int64 p = highest_power_of_two(n);
        int64 m = 0;
        while ((1LL << m) < p) m++;
        if (m > 0) total += m * (p >> 1);
        total += n - p + 1;
        n -= p;
    }
    return total;
}
```

This is a bit more verbose, but it demonstrates the idea without relying on compiler intrinsics. I prefer the builtin in contest settings, but the manual approach is a good fallback.

Extending Further: Counting Zero Bits

A common follow-up question is: can we count zeros instead of ones? You can, but you need to define what range of bits you’re counting. A number’s binary representation has no fixed width, so “zero bits” is ambiguous unless you specify a bit-width. If you define a fixed width b (like 64 bits), then zero bits from 1 to N is simply:

b * N – count_ones_in_b_bits(N)

where count_ones_in_b_bits(N) is the total ones in the first b bits across all numbers from 1 to N. You can compute it with the same per-bit formula or by summing the recurrence result across the width. I rarely need this, but it’s useful to know the reasoning is symmetric.
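A hedged sketch of that symmetry for a fixed width (my function names; it assumes the width covers N’s highest bit, since a number’s set-bit count doesn’t depend on width but its zero-bit count does):

```python
def countsetbits_upto(n: int) -> int:
    total = 0
    while n > 0:
        m = n.bit_length() - 1
        p = 1 << m
        total += m * (p >> 1) + (n - p + 1)
        n -= p
    return total

def count_zero_bits_upto(n: int, width: int) -> int:
    # Zero bits from 1 to n, viewing each number as a width-bit value.
    # Set bits are width-independent, so zeros = width * n - ones.
    assert width >= n.bit_length()
    return width * n - countsetbits_upto(n)

print(count_zero_bits_upto(3, 2))  # numbers 01, 10, 11 -> 2 zeros
```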

Extending Further: Count of a Specific Bit in [L, R]

If you need to count how often a particular bit k is set in [L, R], it’s straightforward with the per-bit formula:

count_bit_k(R) – count_bit_k(L – 1)

This is essentially the same as the range sum, but it’s focused on one bit. I’ve used this to analyze how “balanced” a particular bit is across a range of IDs in a data pipeline.
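Using the per-bit cycle formula from the earlier section, a sketch of that helper and its range wrapper (the names count_bit_k and count_bit_k_range are mine):

```python
def count_bit_k(n: int, k: int) -> int:
    # How often bit k is set among 0..n (0 for n < 0).
    if n < 0:
        return 0
    cycle = 1 << (k + 1)
    full_cycles = (n + 1) // cycle
    remainder = (n + 1) % cycle
    return full_cycles * (1 << k) + max(0, remainder - (1 << k))

def count_bit_k_range(l: int, r: int, k: int) -> int:
    return count_bit_k(r, k) - count_bit_k(l - 1, k)

print(count_bit_k_range(1, 5, 0))  # bit 0 is set in 1, 3, 5 -> 3
```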

Production Considerations: Reliability Over Cleverness

In production systems, I prioritize clarity and testability over golfing the solution. That means:

  • I keep the logic in a single function with a docstring explaining the recurrence.
  • I include a brute-force validator behind a debug flag or unit test for small n.
  • I avoid floating-point math to reduce subtle errors.
  • I prefer explicit types so it’s clear where 64-bit is required.

This is still a tiny function, but clarity is how you avoid mistakes months later when you come back to it.

A Minimal Range Function I Actually Use

When I need a [L, R] result, I wrap it cleanly so it’s hard to misuse.

```python
def countsetbits_range(l: int, r: int) -> int:
    if l > r:
        return 0
    return countsetbits_upto(r) - countsetbits_upto(l - 1)
```

This is tiny, but I can’t tell you how many times I’ve seen someone re-derive range logic and get it wrong. I keep it as a helper and move on.

The “Feel” of the Recurrence (How I Explain It Live)

If you ever need to explain this quickly, here’s the short version I use:

  • Find the highest power of two p <= N.
  • Count everything in the block 0..p-1, which is m * 2^(m-1).
  • Count the highest bit in the tail, which is N – p + 1.
  • Recur on the tail size N – p.

That’s it. If the listener understands why each step makes sense, they’ll understand the whole solution.

Common Interview Pitfalls and How I Navigate Them

In interviews, the most common pitfalls aren’t about the math—they’re about communication. I’ve seen strong candidates fail to explain why the recurrence works. Here’s how I avoid that:

  • I start with the full block 0..2^m-1 and explain the half-on pattern.
  • I explicitly define r = N – 2^m so they can visualize the tail.
  • I say “the remaining bits run from 0 to r,” so it’s obvious why the subproblem is identical.
  • I mention the O(log N) time because it’s a natural conclusion from the bit-length shrink.

If you can articulate those points, you’re solid.

Performance Comparison: Brute Force vs Pattern Counting

I don’t need exact timings to show why the pattern wins. It’s more about orders of magnitude.

  • Brute force does N popcounts, so it scales linearly with N.
  • The pattern method does around log2(N) iterations, so it scales with the number of bits.

For N = 10^15, that’s the difference between a trillion operations and about fifty iterations. That gap isn’t just faster—it’s the difference between impossible and instant.

One More Worked Example (N = 13)

I like to include a second example because it helps you internalize the steps.

N = 13 (binary 1101). Highest power of two is 8 (2^3). So m = 3, p = 8, r = 5.

  • Block 0..7: m * 2^(m-1) = 3 * 4 = 12
  • Highest bit in 8..13: N – p + 1 = 6
  • Remainder is 5, so add count(5)

Now count(5): highest power is 4 (2^2). m = 2.

  • Block 0..3: 2 * 2 = 4
  • Highest bit in 4..5: 2
  • Remainder is 1, so add count(1)

count(1): m = 0

  • Block 0..0 has 0 ones
  • Highest bit in 1..1: 1
  • Remainder 0

Total = 12 + 6 + 4 + 2 + 1 = 25

You can quickly brute-force to verify: popcounts of 1..13 sum to 25. That’s the recurrence in action.
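The brute-force check mentioned above is a one-liner:

```python
total = sum(bin(i).count("1") for i in range(1, 14))
print(total)  # 25
```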

Quality Checklist I Run Before Submitting

I have a short mental list I run through before I submit this to CSES:

  • Did I include 1LL in shifts?
  • Is my base case n <= 0 returning 0?
  • Is the highest bit contribution n – p + 1?
  • Does it pass the 2^k – 1 sanity test?
  • Do I keep everything in 64-bit?

It’s boring, but it prevents silly mistakes.

Closing Thoughts and Next Steps

When I solve this problem now, I don’t think about counting bits at all. I think about patterns in ranges. The moment you recognize the full block 0..2^m – 1, the rest of the solution becomes a simple loop: count the full block, count the highest-bit ones in the tail, then repeat on the remainder. That’s a pattern you can reuse over and over.

If you’re preparing for CSES or interviews, I recommend two quick exercises after you finish this: first, implement the range version count(L, R) using the same function; second, write a brute-force checker and run random tests until you trust your math. Those two steps build speed and confidence, and they guard you from the subtle off-by-one error that almost everyone makes once.

If you want to go further, try deriving the total number of zero bits in the same range, or adapt the approach to count occurrences of a specific bit in [L, R]. The same ideas carry you a long way. Once you internalize this, you’ll spot similar structure in other problems and solve them faster, with fewer lines and more certainty.
