Mobile Numeric Keypad Problem — A Deep, Practical Guide

I run into this keypad-counting problem whenever I’m building input logic, testing graph DP skills, or teaching state transitions. You’re given a phone-style numeric keypad and a length n. Starting from any digit 0–9, you can stay on the same key or move up, down, left, or right to a neighbor, and you need the count of all possible sequences of length n. No diagonals, and the * and # positions are blocked. This looks simple, but it’s a perfect example of how to model movement as a graph and then do dynamic programming on top of it.

What I like about this problem is that it connects three practical ideas: grid modeling, adjacency lists, and DP with time/space tradeoffs. I’ll walk you through the model, the recurrence, and a few implementations, then show a space-saver that I use in production when n gets large. I’ll also call out edge cases, common mistakes, and when I would and would not choose this approach in real systems.

Modeling the keypad as a graph

I start by fixing the keypad layout in my head:

1 2 3

4 5 6

7 8 9

* 0 #

The * and # spots are invalid, and moves are allowed up, down, left, right, or staying in place. So each digit is a node in a graph, and each valid movement is an edge. That gives us a tiny, fixed graph of 10 nodes with small degree (max 5: self + up/down/left/right). For DP, I always precompute neighbors for each digit; it makes the code cleaner and faster than repeated coordinate math.

Here’s the neighbor set I use (including the key itself). You can derive these by hand from the grid. I treat each digit as a node and list its allowed next digits:

  • 0 -> 0, 8
  • 1 -> 1, 2, 4
  • 2 -> 2, 1, 3, 5
  • 3 -> 3, 2, 6
  • 4 -> 4, 1, 5, 7
  • 5 -> 5, 2, 4, 6, 8
  • 6 -> 6, 3, 5, 9
  • 7 -> 7, 4, 8
  • 8 -> 8, 5, 7, 9, 0
  • 9 -> 9, 6, 8

Once you see the keypad as a graph, the counting problem becomes: “How many length-n walks exist in this graph if I can start anywhere?” That’s straight DP on paths in a directed graph (the graph is undirected here, but the DP is the same).

Recurrence and the core DP idea

Let dp[len][d] be the number of sequences of length len that end on digit d. The recurrence is:

dp[1][d] = 1 for all digits d (a sequence of length 1 is just the digit itself)

dp[len][d] = sum over neighbors p of d of dp[len-1][p]

In words: to end on digit d with length len, you must have been at any neighbor p with length len-1, then pressed d. Because staying on the same key is allowed, each digit includes itself as a neighbor.

The answer is sum over all digits of dp[n][d].

This DP is O(n · 10 · degree), and the degree is at most 5, so it is linear in n with a tiny constant. The classic recursive version explores up to 5^n paths, which is only good for explanation or tiny n. For real input sizes, I use memoized recursion or an iterative table.

A simple recursive baseline (use only for teaching)

I like to show the recursive form first because it reveals the structure. But I would never ship it in production without memoization.

Mathematically:

getCount(i, j, n) = sum of getCount(x, y, n-1) over all valid moves (including staying)

Base cases:

  • If (i, j) is invalid or is * or #, return 0.
  • If n == 1, return 1.

This version is exponential because it repeats subproblems. If you run n = 30, it becomes unusable. It’s still good to sketch it, then immediately show how memoization converts it to linear time.

Top-down DP (memoization)

In practice, I prefer memoization if I’m prototyping quickly or writing in a language that makes recursion elegant. You still use the same recursive function, but you cache results by (digit, length). This reduces complexity to O(n * 10).

I usually convert the 2D keypad into digit-based neighbors to avoid repeated boundary checks. That also makes memoization keys smaller.

Here is a runnable Python example using digit neighbors and memoization. It’s fully iterative on the data model, but recursive in structure. I’ve kept comments focused on the non-obvious parts.

from functools import lru_cache

NEIGHBORS = {
    0: [0, 8],
    1: [1, 2, 4],
    2: [2, 1, 3, 5],
    3: [3, 2, 6],
    4: [4, 1, 5, 7],
    5: [5, 2, 4, 6, 8],
    6: [6, 3, 5, 9],
    7: [7, 4, 8],
    8: [8, 5, 7, 9, 0],
    9: [9, 6, 8],
}

@lru_cache(maxsize=None)
def count_sequences(length, digit):
    if length == 1:
        return 1
    total = 0
    for prev in NEIGHBORS[digit]:
        total += count_sequences(length - 1, prev)
    return total

def total_sequences(n):
    if n <= 0:
        return 0
    return sum(count_sequences(n, d) for d in range(10))

if __name__ == "__main__":
    print(total_sequences(1))  # 10
    print(total_sequences(2))  # 36

If you prefer bottom-up for production clarity or to avoid recursion limits, the next section is the one I use most often.

Bottom-up DP (tabulation)

Bottom-up is straightforward and avoids recursion depth issues. I build a 2D table where rows are lengths and columns are digits. Because the number of digits is fixed (10), the table is tiny even for large n.

This JavaScript version is runnable in Node and follows the same recurrence. It uses arrays sized 10, which makes it fast and cache-friendly.

const NEIGHBORS = {
  0: [0, 8],
  1: [1, 2, 4],
  2: [2, 1, 3, 5],
  3: [3, 2, 6],
  4: [4, 1, 5, 7],
  5: [5, 2, 4, 6, 8],
  6: [6, 3, 5, 9],
  7: [7, 4, 8],
  8: [8, 5, 7, 9, 0],
  9: [9, 6, 8],
};

function totalSequences(n) {
  if (n <= 0) return 0;
  let dp = new Array(10).fill(1); // length 1
  for (let len = 2; len <= n; len++) {
    const next = new Array(10).fill(0);
    for (let d = 0; d <= 9; d++) {
      let sum = 0;
      for (const prev of NEIGHBORS[d]) {
        sum += dp[prev];
      }
      next[d] = sum;
    }
    dp = next;
  }
  return dp.reduce((a, b) => a + b, 0);
}

console.log(totalSequences(1)); // 10
console.log(totalSequences(2)); // 36

This is O(n) time and O(10), effectively constant, space. That rolling-array version is what I ship for large n.

Space-optimized DP (rolling array)

If you compare the memoized version and the tabulated version, you’ll notice that dp[len] depends only on dp[len-1]. That means you never need the full table. The JavaScript example above already uses a rolling array, which is the practical space-optimized version.

In production code, I typically do a few extra things:

  • Use typed arrays if the language supports them and counts stay within 64-bit limits.
  • Handle large n with modular arithmetic if counts are expected to exceed 64-bit integer range.
  • Precompute neighbors as arrays to avoid map lookups in tight loops.

If you know that n is at most a few thousand, the rolling array is more than enough. If n could be millions, you’ll need to think about performance and integer overflow. I’ll talk about that in a later section.

Traditional vs modern workflows

The algorithm itself is simple, but the workflow around it has changed. Here’s how I compare older habits vs what I do now, especially when I’m iterating in 2026 tooling.

  • Modeling: hard-code the grid with nested loops (traditional) vs. precompute the neighbor list once (modern).
  • Validation: hand-check a few inputs (traditional) vs. property tests plus a quick brute force for small n (modern).
  • Performance: guess the runtime (traditional) vs. add a micro-benchmark for n = 1e6 style loops (modern).
  • Debugging: print tables (traditional) vs. use a small-step visualizer and snapshot tests (modern).
  • Collaboration: share snippets (traditional) vs. share a small module with tests and docs (modern).

The key change is that I don’t just code the DP; I wrap it in tiny tests that verify invariants. For example, I check n=1 gives 10 and n=2 gives 36, and I also verify that dp[len][d] grows monotonically for each digit (it should never shrink when staying in place is allowed). These tests are fast and catch neighbor mistakes early.

Common mistakes and how I avoid them

I’ve seen a few recurring errors in interviews and code reviews. Here’s my short list.

  • Forgetting that staying on the same key is allowed. This changes the count by a lot and is the #1 source of wrong answers.
  • Treating * or # as valid cells. If you compute neighbors by coordinates, make sure to filter those out.
  • Off-by-one errors in length. I always define dp[1] explicitly, then loop from 2 to n. It makes the loop bounds clear.
  • Mis-modeling neighbors, especially for 0 and 8. I write the neighbor list on paper once and keep it as a constant.
  • Using recursion without memoization. That version only works for very small n.

I also recommend a small brute-force checker for n up to 3 or 4, just for self-check. If you generate all sequences by brute force for n=3, the count is still tiny and you can verify the DP output quickly.

When to use this DP and when not to

I use this DP when:

  • n can be large, and you need the exact count.
  • The graph is small and fixed, so precomputation is cheap.
  • You need a clear, deterministic result without randomness.

I don’t use this DP when:

  • You need actual sequences, not just counts (the number explodes quickly).
  • The move rules change per step in a complicated way (then I model it as a matrix and use exponentiation).
  • The graph is huge (millions of nodes). Then I’d look for symmetry, compression, or a completely different model.

If the move rules are static and you only need counts, this DP is the cleanest approach I know.

Performance considerations and scaling

For typical interview constraints, O(n) time is enough. In real systems, I still think about scaling:

  • With 10 digits and at most 5 neighbors each, a loop of length n is extremely fast. For n around 1e7, I expect a few tens of milliseconds to a couple hundred milliseconds in a compiled language, and maybe a few hundred milliseconds to a second in a high-level one, depending on hardware.
  • Counts grow quickly. For large n, values will exceed 64-bit. If you need exact counts, use big integers. If you only need counts modulo some number, apply modulus at each step.
  • If n is huge (think 1e9), you can model it as matrix exponentiation on a 10×10 matrix. That’s a different tool, but it’s the natural next step for very large n.

In practical workflows, I usually do the rolling-array DP and only consider matrix exponentiation if I see n that won’t finish in time. For typical mobile keypad tasks, that’s rarely necessary.

Edge cases and sanity checks

These small checks save me from silly bugs:

  • n <= 0: define it as 0 sequences. If you want to define n=0 as 1 empty sequence, do it explicitly and adjust your recurrence. I usually keep it 0 to avoid confusion.
  • n = 1: answer must be 10.
  • n = 2: answer must be 36.
  • Symmetry check: digits 1 and 3 should always have equal counts, same for 4 and 6, and 7 and 9. If your table breaks that, your neighbors are wrong.

I also validate that dp[len][5] grows faster than most digits, since 5 has the highest degree. It’s not a proof, but it’s a good smell test.

Putting it into practice

When I implement this in a codebase, I wrap it as a small module with two entry points: one for exact counts (big integer) and one for modulo counts (for large n). I keep the neighbor list as a constant, add a couple of tests for n=1 and n=2, and a small property test for symmetry. This gives me confidence that the logic is correct and resilient to future refactors.

If you’re teaching or learning, the recursive version is worth showing first, but I always nudge readers to the DP version for any real inputs. It’s cleaner, faster, and easier to reason about once you’ve seen the recurrence.

Deeper intuition: seeing it as a walk count

Once the neighbor list is fixed, each step is just a move in a small graph. That means you can reframe the problem as: “How many walks of length n-1 exist in this graph, across all starting nodes?” The DP is effectively counting walk totals by endpoint. This perspective is useful because it generalizes immediately. If you ever need to change the keypad or add new allowed moves, you just adjust the adjacency list and your DP still works.

It also helps explain why the counts explode. At each step, many digits have multiple choices (2–5 possible moves). Even if you never branch wildly, the number of paths grows exponentially with n. The DP isn’t making growth smaller; it’s just computing it efficiently.

A clean coordinate-based model (when you want to derive neighbors)

Sometimes I’m working in a language where it’s more convenient to generate neighbors from coordinates, especially if I might switch to a different keypad layout. In those cases, I use a grid model and precompute neighbors once.

Key idea: Keep a 4×3 grid with placeholders for * and #. Then, for each digit, check its potential moves. Here’s a concise Python sketch showing the pattern without bogging down the rest of the article:

  • Build a 4×3 grid like [[1,2,3],[4,5,6],[7,8,9],[None,0,None]].
  • For each digit, look at the four direction deltas plus (0,0) for staying.
  • Only keep positions that are inside the grid and not None.
  • Map digits to their allowed neighbors.

I almost never do this per call; I compute once and reuse. This is a good compromise if you want the neighbor list to be “derived” rather than hard-coded, which helps avoid transcription errors.
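The steps above can be sketched in a few lines of Python. The helper name and delta ordering are my own choices, not a fixed API:

```python
# Derive the neighbor table from the grid once, then reuse it.
GRID = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [None, 0, None]]  # None = * and #
DELTAS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # stay, up, down, left, right

def build_neighbors():
    neighbors = {}
    for i, row in enumerate(GRID):
        for j, digit in enumerate(row):
            if digit is None:
                continue  # skip * and #
            allowed = []
            for di, dj in DELTAS:
                ni, nj = i + di, j + dj
                # Keep only in-bounds, non-placeholder targets.
                if 0 <= ni < 4 and 0 <= nj < 3 and GRID[ni][nj] is not None:
                    allowed.append(GRID[ni][nj])
            neighbors[digit] = allowed
    return neighbors

NEIGHBORS = build_neighbors()
print(NEIGHBORS[0])  # [0, 8]
```

If the derived table matches the hand-written one up to ordering, you have caught any transcription errors for free.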

A production-ready version with modulo support

The main production twist is overflow. Even for modest n, the counts exceed 64-bit. If you need exact numbers, use big integers. If you only need the answer modulo M, apply mod at each step.

Here is a practical JavaScript version with an optional modulus. I keep it simple and deterministic, and I don’t default to BigInt unless I must.

const NEIGHBORS = [
  [0, 8],          // 0
  [1, 2, 4],       // 1
  [2, 1, 3, 5],    // 2
  [3, 2, 6],       // 3
  [4, 1, 5, 7],    // 4
  [5, 2, 4, 6, 8], // 5
  [6, 3, 5, 9],    // 6
  [7, 4, 8],       // 7
  [8, 5, 7, 9, 0], // 8
  [9, 6, 8],       // 9
];

function totalSequences(n, mod = null) {
  if (n <= 0) return 0;
  let dp = new Array(10).fill(1);
  for (let len = 2; len <= n; len++) {
    const next = new Array(10).fill(0);
    for (let d = 0; d < 10; d++) {
      let sum = 0;
      const prevs = NEIGHBORS[d];
      for (let i = 0; i < prevs.length; i++) {
        sum += dp[prevs[i]];
      }
      next[d] = mod ? (sum % mod) : sum;
    }
    dp = next;
  }
  let total = 0;
  for (let d = 0; d < 10; d++) total += dp[d];
  return mod ? (total % mod) : total;
}

console.log(totalSequences(1)); // 10
console.log(totalSequences(2)); // 36
console.log(totalSequences(50, 1_000_000_007));

This is intentionally minimal. It scales well for large n and is easy to port to any language.

Handling big integers in practice

If you need exact counts for large n, you’ll quickly blow past 64-bit limits. In languages with big integers built in (like Python), you’re already safe. In languages without them, you have to decide if you want to implement big integer arithmetic or shift to modular counts.

My typical decision rule:

  • If n is under a couple hundred, exact counts might still fit in 64-bit (depending on growth). I’ll verify.
  • If n is in the thousands or more, I use big integers or modular arithmetic, depending on the task.
  • If this is in a tight loop and I only need ratios or comparisons, I’ll use modulo with a large prime to keep performance high.

For example, in JavaScript, you can rewrite everything with BigInt if you need exactness. But BigInt is slower, so I only do it when the problem statement demands exact counts.

Testing strategy I actually use

Testing matters even for a small problem like this. It’s easy to slip on neighbors or base cases. My typical test plan is simple but effective:

  • n=1 should return 10.
  • n=2 should return 36.
  • Symmetry assertions: dp[len][1] == dp[len][3], dp[len][4] == dp[len][6], dp[len][7] == dp[len][9].
  • Brute force for n=2 or n=3 using a tiny BFS or DFS generator, then compare counts per digit and total.

I also add a regression test for one higher n (like n=3 or n=4) just to lock in behavior. This is cheap and catches issues when refactoring the neighbor list.
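A minimal self-contained version of that test plan might look like this in Python. The function names mirror the earlier memoized example, but the test helpers are my own:

```python
from functools import lru_cache

NEIGHBORS = {0: [0, 8], 1: [1, 2, 4], 2: [2, 1, 3, 5], 3: [3, 2, 6],
             4: [4, 1, 5, 7], 5: [5, 2, 4, 6, 8], 6: [6, 3, 5, 9],
             7: [7, 4, 8], 8: [8, 5, 7, 9, 0], 9: [9, 6, 8]}

@lru_cache(maxsize=None)
def count_sequences(n, d):
    if n == 1:
        return 1
    return sum(count_sequences(n - 1, p) for p in NEIGHBORS[d])

def total_sequences(n):
    return sum(count_sequences(n, d) for d in range(10)) if n > 0 else 0

def test_base_cases():
    assert total_sequences(1) == 10
    assert total_sequences(2) == 36

def test_symmetry():
    # Mirror pairs must agree at every length if the neighbor table is right.
    for n in range(1, 20):
        for a, b in [(1, 3), (4, 6), (7, 9)]:
            assert count_sequences(n, a) == count_sequences(n, b)

test_base_cases()
test_symmetry()
print("all tests passed")
```

These run in milliseconds, so I keep them next to the module and run them on every refactor.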

A tiny brute-force validator (for self-check)

I won’t use this in production, but I keep a tiny brute-force generator handy when I’m teaching or debugging. It builds all sequences of length n by BFS from each digit, which is feasible for n up to 3 or 4.

The idea is simple:

  • Start with a list of all digits as length-1 sequences.
  • For each step, extend each sequence by all allowed moves.
  • Count how many sequences you get at length n.

This is a nice sanity check because it doesn’t share logic with your DP beyond the neighbor list, so it’s more likely to expose mistakes.
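Those three steps translate almost line for line into Python. This is a throwaway sketch for self-checking, not production code:

```python
# Brute-force generator: build every sequence explicitly. Fine for n <= 4.
NEIGHBORS = {0: [0, 8], 1: [1, 2, 4], 2: [2, 1, 3, 5], 3: [3, 2, 6],
             4: [4, 1, 5, 7], 5: [5, 2, 4, 6, 8], 6: [6, 3, 5, 9],
             7: [7, 4, 8], 8: [8, 5, 7, 9, 0], 9: [9, 6, 8]}

def brute_force_count(n):
    if n <= 0:
        return 0
    seqs = [[d] for d in range(10)]  # all length-1 sequences
    for _ in range(n - 1):
        # Extend every sequence by every allowed next digit.
        seqs = [s + [nxt] for s in seqs for nxt in NEIGHBORS[s[-1]]]
    return len(seqs)

print(brute_force_count(2))  # 36
print(brute_force_count(3))  # 138
```

Comparing these counts against the DP output per digit and in total is the cheapest correctness check I know for this problem.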

A “why it works” proof sketch

I like to give a short proof outline, especially if I’m teaching. It’s just a standard DP induction:

  • Base case: For length 1, each digit contributes exactly one sequence (the digit itself). So dp[1][d] = 1 is correct.
  • Inductive step: Assume dp[len-1][p] correctly counts sequences of length len-1 ending at digit p. Any sequence of length len ending at d must come from one of d’s neighbors p at length len-1. Summing those counts yields the correct total. Therefore dp[len][d] is correct for all digits.
  • Finally, summing dp[n][d] over all d yields the total sequences because each valid sequence ends at exactly one digit.

This proof is short, but it’s enough to explain correctness without diving into formal graph theory.

Alternative approach: matrix exponentiation

If n is very large (say 1e9), O(n) time is too slow. You can model the transitions as a 10×10 matrix A where A[i][j] is 1 if you can move from j to i (or from i to j depending on your convention). Then:

dp[len] = A * dp[len-1]

dp[n] = A^(n-1) * dp[1]

You can compute A^(n-1) in O(log n) time using fast exponentiation, and because the matrix is only 10×10, the constant is tiny. This is the natural “next step” if performance really matters.

I don’t reach for this unless n is massive, because the rolling DP is simpler, easier to reason about, and already fast for typical constraints. But it’s worth knowing that this option exists and that the transition matrix is fixed.
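For completeness, here is one way the matrix-power idea could look in plain Python, using exact big integers and no external libraries. The helper names are my own:

```python
# Matrix exponentiation sketch for very large n.
NEIGHBORS = [[0, 8], [1, 2, 4], [2, 1, 3, 5], [3, 2, 6], [4, 1, 5, 7],
             [5, 2, 4, 6, 8], [6, 3, 5, 9], [7, 4, 8], [8, 5, 7, 9, 0],
             [9, 6, 8]]

# A[i][j] = 1 if a sequence ending at j can extend to i.
A = [[1 if i in NEIGHBORS[j] else 0 for j in range(10)] for i in range(10)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(10)) for j in range(10)]
            for i in range(10)]

def mat_pow(M, e):
    R = [[int(i == j) for j in range(10)] for i in range(10)]  # identity
    while e:
        if e & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        e >>= 1
    return R

def total_sequences(n):
    if n <= 0:
        return 0
    P = mat_pow(A, n - 1)
    # dp[1] is the all-ones vector, so the answer is the sum of all entries.
    return sum(sum(row) for row in P)

print(total_sequences(1))  # 10
print(total_sequences(2))  # 36
```

With O(log n) multiplications of a 10×10 matrix, this handles n around 1e9 easily, at the cost of big-integer entries if you need exact counts.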

Another alternative: memoized DFS on coordinates

This is more of a stylistic alternative than a new algorithm. If you want to show the problem in “grid terms,” you can write a DFS on coordinates (i, j, n) with memoization. It’s sometimes more intuitive for beginners. The complexity is the same as the digit-based memoized approach.

The downside is you need extra logic to reject * and #, and you do coordinate bounds checks at each call. That overhead is small, but it’s also extra surface area for bugs. That’s why I stick with digit neighbors after the first explanation.

Practical scenarios where this shows up

This is a toy problem, but the pattern shows up everywhere. Here are a few places I’ve seen similar logic in real systems:

  • Generating possible input sequences for fuzzy matching or brute-force keypad entry simulations.
  • Counting ways to traverse a constrained UI element with fixed navigation rules (think TV remote navigation in a grid).
  • Modeling keyboard adjacency for error correction or auto-complete probabilities.
  • Testing state transitions in small finite-state machines, where you need to count reachability over n steps.

If you squint, this is just “count walks in a small graph.” That’s a good mental pattern to carry around.

Common pitfalls in production, not just interviews

Beyond the standard mistakes, there are a few real-world pitfalls I’ve tripped over:

  • Memory creep from storing full DP tables in languages where arrays aren’t compact. The rolling array avoids this.
  • Implicit integer conversion issues in languages where number types are floating point (like JavaScript). If counts get large, use BigInt or modulo.
  • Hidden assumptions about n=0. Some systems want n=0 to mean “empty sequence,” which would be 1 instead of 0. Decide explicitly.
  • Parallelization overhead. The problem is small; making it parallel often slows it down because overhead dominates.

I keep the implementation tight and predictable. Most bugs come from over-complication, not under-optimization.

Performance notes with actual intuition

Because the neighbor list is fixed and tiny, the loop is basically “sum 2–5 integers for each of 10 digits, for n steps.” That’s 20–50 additions per step. Even for n in the millions, this is trivial on modern hardware in a compiled language. In high-level languages, it’s still fast enough for most inputs unless you’re running it in a very tight request path.

If you want to optimize further:

  • Use arrays instead of maps for neighbors.
  • Hoist neighbor lists into constants.
  • Avoid repeated object allocations inside loops.
  • Keep the dp arrays reused if the language allows it.

That said, I’d rather keep the code readable unless profiling shows a need for micro-optimization.

Clearer explanation of symmetry

The keypad is symmetric in several pairs: 1 and 3, 4 and 6, 7 and 9. That symmetry is preserved by the movement rules. So counts for those pairs should always match at every length. This is more than a nice trick—it’s a test oracle. If those counts diverge, you almost certainly mis-specified neighbors.

It’s also a hint that you could compress the problem by grouping symmetric digits, reducing the state space from 10 to fewer equivalence classes. I rarely do this because the state space is already small, but it’s a neat analytical exercise.

A short C++ version for performance-minded readers

If you’re performance-focused, here’s a compact C++ implementation using 64-bit integers. It’s still the same algorithm, just tuned for speed and clarity:

#include <iostream>
#include <vector>

using namespace std;

long long totalSequences(int n) {
    if (n <= 0) return 0;
    vector<vector<int>> neighbors = {
        {0, 8},          // 0
        {1, 2, 4},       // 1
        {2, 1, 3, 5},    // 2
        {3, 2, 6},       // 3
        {4, 1, 5, 7},    // 4
        {5, 2, 4, 6, 8}, // 5
        {6, 3, 5, 9},    // 6
        {7, 4, 8},       // 7
        {8, 5, 7, 9, 0}, // 8
        {9, 6, 8}        // 9
    };
    long long dp[10];
    for (int i = 0; i < 10; i++) dp[i] = 1;
    for (int len = 2; len <= n; len++) {
        long long next[10] = {0};
        for (int d = 0; d < 10; d++) {
            long long sum = 0;
            for (int p : neighbors[d]) sum += dp[p];
            next[d] = sum;
        }
        for (int i = 0; i < 10; i++) dp[i] = next[i];
    }
    long long total = 0;
    for (int i = 0; i < 10; i++) total += dp[i];
    return total;
}

int main() {
    cout << totalSequences(1) << "\n"; // 10
    cout << totalSequences(2) << "\n"; // 36
    return 0;
}

It’s simple and direct. If you want modulo support, add sum %= mod and keep everything in long long or int64_t.

A note on interpretation: “starting from any digit”

Some variations of this problem specify a fixed starting digit, while others allow starting anywhere. I’m assuming “start anywhere,” which is why dp[1][d] = 1 for all digits and the final answer is the sum over all digits.

If the start digit is fixed, just initialize dp[1][start] = 1 and dp[1][others] = 0. The rest of the recurrence stays the same. This is a common variant and it’s a good reminder to always read the statement carefully.
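A quick sketch of the fixed-start variant, assuming the same neighbor table as before (the function name is my own):

```python
# Fixed-start variant: only the start digit gets dp[1] = 1.
NEIGHBORS = {0: [0, 8], 1: [1, 2, 4], 2: [2, 1, 3, 5], 3: [3, 2, 6],
             4: [4, 1, 5, 7], 5: [5, 2, 4, 6, 8], 6: [6, 3, 5, 9],
             7: [7, 4, 8], 8: [8, 5, 7, 9, 0], 9: [9, 6, 8]}

def sequences_from(start, n):
    if n <= 0:
        return 0
    dp = [1 if d == start else 0 for d in range(10)]
    for _ in range(2, n + 1):
        dp = [sum(dp[p] for p in NEIGHBORS[d]) for d in range(10)]
    return sum(dp)

print(sequences_from(5, 1))  # 1
print(sequences_from(5, 2))  # 5 (5 has five allowed moves, including staying)
```

Summing this over all ten start digits reproduces the start-anywhere answer, which is a handy consistency check between the two variants.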

A mini decision tree I use in real code

When I need to implement this, I almost always decide quickly based on two questions:

  • Do I need exact counts or modulo?
  • How big can n get?

That yields a small choice matrix:

  • Exact + small n: Rolling DP with normal integers.
  • Exact + large n: Rolling DP with big integers.
  • Modulo + any n: Rolling DP with modulo in the loop.
  • Very large n (1e9+): Matrix exponentiation.

It sounds trivial, but it keeps implementations consistent and prevents overengineering.

Extending the model: weighted moves or forbidden digits

Sometimes you need to adapt the model. A few examples:

  • If some digits are forbidden, just remove them from the neighbor list and ignore them in the final sum.
  • If staying is not allowed, remove the digit itself from each neighbor list.
  • If moves have weights (like probabilities), change the recurrence to multiply by weights, and you get a Markov chain.

This is why I like the graph interpretation: it’s naturally extensible. You change the graph, and the DP follows.
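As one concrete example of the weighted extension, here is a small Markov-chain sketch where each allowed move is taken with equal probability. This is my own toy setup, not part of the original counting problem:

```python
# Markov-chain variant: probabilities instead of counts.
NEIGHBORS = {0: [0, 8], 1: [1, 2, 4], 2: [2, 1, 3, 5], 3: [3, 2, 6],
             4: [4, 1, 5, 7], 5: [5, 2, 4, 6, 8], 6: [6, 3, 5, 9],
             7: [7, 4, 8], 8: [8, 5, 7, 9, 0], 9: [9, 6, 8]}

def step_distribution(dist):
    # One Markov step: mass at p spreads evenly over p's allowed moves.
    nxt = [0.0] * 10
    for p in range(10):
        share = dist[p] / len(NEIGHBORS[p])
        for d in NEIGHBORS[p]:
            nxt[d] += share
    return nxt

dist = [0.1] * 10  # uniform start
for _ in range(3):
    dist = step_distribution(dist)
print(round(sum(dist), 6))  # 1.0 (probability mass is conserved)
```

The recurrence shape is identical to the counting DP; only the edge weights changed from 1 to probabilities.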

Mini checklist for correctness

Here’s a tiny checklist I keep in my head before I ship:

  • Neighbor list correct for all digits, especially 0 and 8.
  • Base case clear: dp[1] initialized correctly.
  • Loop boundaries: starts at len = 2 and ends at len = n.
  • Result is sum of dp[n][d] across all digits.
  • Overflow behavior defined (BigInt or modulo).

That’s it. If I check those five, I rarely have bugs.

A small worked example (n=2)

I like to sanity-check with n=2 because it’s small and confirms the neighbor list. Each digit contributes its degree (including itself) sequences of length 2:

  • 0 has 2 neighbors
  • 1 has 3 neighbors
  • 2 has 4 neighbors
  • 3 has 3 neighbors
  • 4 has 4 neighbors
  • 5 has 5 neighbors
  • 6 has 4 neighbors
  • 7 has 3 neighbors
  • 8 has 5 neighbors
  • 9 has 3 neighbors

Sum: 2 + 3 + 4 + 3 + 4 + 5 + 4 + 3 + 5 + 3 = 36. That matches the DP output and is a great quick check.
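That degree sum can be double-checked mechanically against the neighbor table:

```python
# n=2 answer equals the total degree (including self-loops) of the keypad graph.
NEIGHBORS = {0: [0, 8], 1: [1, 2, 4], 2: [2, 1, 3, 5], 3: [3, 2, 6],
             4: [4, 1, 5, 7], 5: [5, 2, 4, 6, 8], 6: [6, 3, 5, 9],
             7: [7, 4, 8], 8: [8, 5, 7, 9, 0], 9: [9, 6, 8]}
print(sum(len(v) for v in NEIGHBORS.values()))  # 36
```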

Key takeaways and next steps

The keypad problem is a compact way to practice graph DP with a clear real-world feel. I recommend you model it as a graph, build a neighbor list once, and use a rolling DP array to keep space tiny. You’ll get O(n) time with minimal code and a solution that scales to large n without fuss.

If you want to go further, try these next steps. First, add a modulo parameter and see how it changes your implementation. Second, write a brute-force generator for n up to 3 and compare counts, which is a great exercise in test-driven validation. Third, if you’re curious about algorithmic extensions, model the keypad as a 10×10 adjacency matrix and implement fast exponentiation—this gives you O(log n) time at the cost of more math, and it’s a solid way to practice linear algebra techniques in code.

From a practical angle, I’d also encourage you to build a small benchmark harness. Even a basic timer around the loop for a few n values will teach you how these loops behave on your hardware. That kind of intuition is useful far beyond this single problem.
