Implementation of a Dynamic Array in Python (From Scratch)

You have probably hit this moment: you start with a small list, you keep appending, and suddenly you wonder why appends still feel fast even though the list keeps growing. Under the hood, a plain Python list does not reallocate one slot at a time. It grows in chunks, copying items only occasionally. That behavior is the core idea behind a dynamic array.

When I implement a dynamic array from scratch, I am not trying to replace Python's list. I am trying to (1) make the memory model real, (2) learn where the time goes when you insert or delete, and (3) build intuition for amortized cost, which shows up everywhere from vectors in systems languages to buffer growth in networking code.

By the end of this post, you will have a runnable DynamicArray class backed by a low-level contiguous buffer via ctypes. You will see how append stays O(1) on average, why insert is expensive, how to pick a growth rule, and which edge cases bite people (including the classic mistake of returning an IndexError instead of raising it).

What a dynamic array really is (memory first)

A dynamic array is an array-like container with a backing store that lives in a contiguous block of memory, plus a policy for what happens when the block fills up.

Two separate numbers matter:

  • length (sometimes called n): how many real elements you have stored
  • capacity: how many elements you can store before you must allocate a new block

Think of capacity as the number of seats in a theater and length as how many seats are currently occupied. You can sell tickets (append items) until you run out of seats. When you do, you do not attach one more seat to the side. You move to a bigger theater and re-seat everyone.

The expensive part is the move: allocate a new block, copy elements, then swap references so the new block becomes the backing store.

The benefit is that you do not move every time you append. You move only when you hit capacity, which lets most appends be cheap.

The three invariants I always keep in my head

When I write (or review) a dynamic array implementation, I constantly check these invariants:

1) 0 <= n <= capacity

2) For 0 <= i < n, slot A[i] contains a valid element reference

3) For n <= i < capacity, slot A[i] is either uninitialized or intentionally cleared (I prefer clearing to None after deletes)

If any of these invariants drift, the bugs become confusing fast: duplicates after insert, missing elements after pop, or memory that never gets released because you forgot to drop references.

Why “contiguous” matters even in Python

In Python, objects (like strings, dicts, instances) live on the heap. Our dynamic array (and Python's list) stores references (pointers) to those objects. The objects themselves are not packed tightly next to each other.

Even so, contiguity still matters because the references are laid out in a tight block. That gives you better cache locality when you iterate, index, or scan. You are not chasing a chain of pointers scattered all over memory the way you would in a linked list.

The growth rule and why append is still fast

If you grew the backing store by exactly +1 each time it filled, your total work across n appends would be roughly:

1 + 2 + 3 + ... + n = O(n^2)

That feels bad quickly.

Instead, a classic rule is to grow by a constant factor (often 2x). The copying work becomes geometric:

  • copy 1 item when capacity goes 1 -> 2
  • copy 2 items when capacity goes 2 -> 4
  • copy 4 items when capacity goes 4 -> 8

The total amount copied up to n items is O(n). Spread over n appends, the average (amortized) cost per append is O(1).
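You can check this claim with a short simulation. This is a sketch of the bookkeeping only (CPython's real list uses a more intricate over-allocation formula), counting how many element copies a grow-on-full doubling policy performs over n appends:

```python
def total_copies(n: int, growth: int = 2) -> int:
    """Count element copies performed by a grow-on-full policy over n appends."""
    capacity, length, copies = 1, 0, 0
    for _ in range(n):
        if length == capacity:
            copies += length          # a resize copies every existing element
            capacity *= growth
        length += 1
    return copies

for n in (10, 1_000, 100_000):
    print(n, total_copies(n), total_copies(n) / n)
```

The ratio in the last column stays below 2 no matter how large n gets, which is the amortized O(1) claim in numeric form.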

I like to explain amortized cost with a simple analogy: you sometimes pay a big moving bill, but you do not pay it on every grocery trip. Over a month, your average daily cost is steady.

One more detail: a factor of 2 is simple and makes the math easy, but it can waste memory right after a resize (almost half the allocated slots may be empty). Some runtimes use smaller growth factors or over-allocation formulas to balance copying frequency against memory overhead.

A concrete “budget” view of amortized analysis

Here is an intuition I use when teaching or sanity-checking:

  • Suppose resizing doubles capacity.
  • Every element you append will be copied forward some number of times as the array grows.

How many times can a single element get copied? Not that many.

If an element was appended when capacity was 8, it will only be copied during resizes to 16, 32, 64, … until you stop growing. That is at most log2(n) copies for that element, and across all elements the total copy work stays proportional to n.

This is why I always separate:

  • Worst-case per operation: one append can be O(n) if it triggers a resize
  • Average over a long sequence: lots of appends in a row average out to O(1)

Picking a growth factor: my practical heuristic

When I build a teaching implementation, I default to doubling because:

  • it is easy to reason about
  • it minimizes the number of resizes
  • it makes amortized analysis clean

If I were tuning for a memory-sensitive workload, I would consider:

  • 1.5x growth (less slack, more frequent resizes)
  • a hybrid rule (e.g., double at small sizes, then grow by a smaller factor at large sizes)

The “right” factor is a trade-off:

  • bigger factor = fewer copies, more unused capacity
  • smaller factor = more copies, less unused capacity
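A quick way to feel that trade-off is to count resizes directly. This sketch assumes grow-on-full with a multiplicative factor (the exact numbers are illustrative, not how any particular runtime allocates):

```python
import math

def resize_count(n: int, factor: float, start: int = 1) -> tuple[int, int]:
    """Return (number of resizes, final capacity) needed to hold n elements."""
    capacity, resizes = start, 0
    while capacity < n:
        # max(..., capacity + 1) guarantees progress for small capacities
        capacity = max(capacity + 1, math.floor(capacity * factor))
        resizes += 1
    return resizes, capacity

for factor in (2.0, 1.5, 1.25):
    resizes, cap = resize_count(1_000_000, factor)
    print(f'factor={factor}: {resizes} resizes, final capacity {cap}')
```

Smaller factors do more resizes (and therefore more copying) but finish with less unused capacity.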

Backing storage with ctypes: a raw contiguous buffer of Python objects

Python's built-in list already does this work in C. When we build our own, we need a contiguous buffer that can hold object references.

The ctypes module can allocate a C-style array of pointers to Python objects. This is not about storing raw integers like C; instead, we store references to Python objects (ctypes.py_object). That makes the implementation flexible (you can append strings, dicts, custom classes), and it keeps the demo focused on resizing logic.

Here is the basic idea for a new backing array of a given capacity:

import ctypes

def make_array(capacity: int):
    # Allocate a contiguous block of 'capacity' slots.
    # Each slot can hold a reference to a Python object.
    return (capacity * ctypes.py_object)()

A few practical notes from experience:

  • This is educational code. It will be slower than built-in list for most workloads.
  • Because we store Python object references, the memory behavior is similar to list (a list stores pointers too), not like NumPy arrays where numeric values sit densely.
  • ctypes arrays do not automatically grow. That is the point: we must implement growth.
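One edge case worth seeing for yourself: a py_object slot that has never been assigned holds a NULL pointer, and reading it raises a ValueError rather than returning some default. That is why the invariants about which slots are "live" matter:

```python
import ctypes

buf = (4 * ctypes.py_object)()  # four slots, all uninitialized

buf[0] = 'hello'
print(buf[0])        # a real reference: prints 'hello'

try:
    buf[1]           # never assigned: the slot is NULL
except ValueError as e:
    print('reading an empty slot raised:', e)
```

Our DynamicArray avoids this by only ever reading slots below index n.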

A quick note on safety and “realism”

Using ctypes.py_object keeps things safe in the sense that reference counting and GC work as expected. We are not doing unsafe pointer arithmetic. But it is still low-level enough to make resizing and shifting visible.

If you want to go deeper (not required for this post), you can also explore:

  • how Python lists over-allocate internally
  • how reference counting interacts with deleting slots
  • why clearing a slot to None can matter for memory

A Pythonic DynamicArray implementation (complete and runnable)

I am going to implement a small API surface that mirrors what you expect from a list:

  • len(x)
  • x[i]
  • x.append(value)
  • x.insert(index, value)
  • x.pop() / x.pop(index)
  • iteration

I also keep helper methods private using a leading underscore. That does not enforce privacy, but it communicates intent and keeps the public API clean.

import ctypes
from typing import Any, Iterator


class DynamicArray:
    """A minimal dynamic array backed by a contiguous ctypes buffer.

    This stores Python object references (like Python's list), not raw
    numeric values.
    """

    def __init__(self, initial_capacity: int = 1) -> None:
        if initial_capacity < 1:
            raise ValueError('initial_capacity must be >= 1')
        self._n = 0
        self._capacity = initial_capacity
        self._A = self._make_array(self._capacity)

    def __len__(self) -> int:
        return self._n

    def __iter__(self) -> Iterator[Any]:
        for i in range(self._n):
            yield self._A[i]

    def __repr__(self) -> str:
        items = ', '.join(repr(self._A[i]) for i in range(self._n))
        return f'DynamicArray([{items}])'

    def __getitem__(self, index: int) -> Any:
        index = self._normalize_index(index)
        return self._A[index]

    def __setitem__(self, index: int, value: Any) -> None:
        index = self._normalize_index(index)
        self._A[index] = value

    def append(self, value: Any) -> None:
        if self._n == self._capacity:
            self._resize(self._capacity * 2)
        self._A[self._n] = value
        self._n += 1

    def insert(self, index: int, value: Any) -> None:
        # Match list.insert: offset negative indices from the end,
        # then clamp the result into [0, n].
        if index < 0:
            index += self._n
        if index < 0:
            index = 0
        if index > self._n:
            index = self._n
        if self._n == self._capacity:
            self._resize(self._capacity * 2)
        # Shift right to open a slot.
        for i in range(self._n, index, -1):
            self._A[i] = self._A[i - 1]
        self._A[index] = value
        self._n += 1

    def pop(self, index: int = -1) -> Any:
        if self._n == 0:
            raise IndexError('pop from empty DynamicArray')
        index = self._normalize_index(index)
        value = self._A[index]
        # Shift left to fill the hole.
        for i in range(index, self._n - 1):
            self._A[i] = self._A[i + 1]
        # Help GC by dropping the reference in the now-unused slot.
        self._A[self._n - 1] = None
        self._n -= 1
        # Optional: shrink if we are very under-full.
        # This keeps memory from staying huge after a big transient spike.
        if self._capacity > 1 and self._n <= self._capacity // 4:
            new_cap = max(1, self._capacity // 2)
            if new_cap != self._capacity:
                self._resize(new_cap)
        return value

    def to_list(self) -> list[Any]:
        return [self._A[i] for i in range(self._n)]

    def capacity(self) -> int:
        """Expose current capacity for learning/debugging."""
        return self._capacity

    def _normalize_index(self, index: int) -> int:
        # Support negative indices like Python sequences.
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError('index out of range')
        return index

    def _resize(self, new_capacity: int) -> None:
        if new_capacity < self._n:
            raise ValueError('new_capacity must be >= current length')
        B = self._make_array(new_capacity)
        for i in range(self._n):
            B[i] = self._A[i]
        self._A = B
        self._capacity = new_capacity

    @staticmethod
    def _make_array(capacity: int):
        return (capacity * ctypes.py_object)()


if __name__ == '__main__':
    arr = DynamicArray()
    arr.append('auth-service')
    arr.append('payments')
    arr.append({'team': 'infra', 'on_call': True})
    print(arr)
    print('len=', len(arr), 'cap=', arr.capacity())
    arr.insert(1, 'search')
    print('after insert:', arr.to_list(), 'cap=', arr.capacity())
    last = arr.pop()
    print('popped:', last)
    print('after pop:', arr.to_list(), 'cap=', arr.capacity())
    print('arr[0]=', arr[0])
    arr[0] = 'identity'
    print('after setitem:', arr.to_list())

If you run that file, you should see the array grow automatically as you append, shift elements during insert, and shift back during pop(index).

Two details I consider non-negotiable in real code:

  • Raise exceptions, do not return exception objects. Returning IndexError instead of raising it creates silent failures and weird downstream behavior.
  • When deleting, clear the slot (set it to None). That prevents accidentally keeping objects alive longer than needed.

What I intentionally left out (and why)

A full list replacement is a huge surface area: slicing, rich comparisons, extend, remove, index, count, sorting, reversing, multiplication, etc.

I keep this implementation small on purpose, because I want the core mechanisms to stay obvious:

  • the length/capacity split
  • resize and copy
  • shift for insert/delete

Once those are solid, you can add features safely.

Inserts and deletes: shifting is the real cost

Dynamic arrays are great at:

  • append at the end (amortized O(1))
  • random access by index (O(1))

They are not great at:

  • inserting near the front or middle (O(n) because of shifting)
  • deleting near the front or middle (O(n) because of shifting)

That O(n) is not a theoretical nit. In practice, shifting can dominate runtime when you treat a dynamic array like a queue.

If you need fast operations at both ends, I recommend reaching for collections.deque in Python rather than forcing a dynamic array to do a job it is not designed for.

A useful mental model:

  • dynamic array: good for “mostly append, lots of reads”
  • linked list: good for “lots of inserts/deletes with stable references” (but poor cache locality)
  • deque: good for “push/pop at both ends”

Cache locality matters here. A dynamic array keeps references contiguous, which plays nicely with CPU caches. That is why it often beats pointer-heavy structures even when Big-O looks similar.

A “cost map” I use when choosing operations

If I am designing an algorithm, I like to explicitly label the hot operations:

  • arr[i]: O(1)
  • arr.append(x): amortized O(1)
  • arr.pop(): amortized O(1) (plus possible shrink)
  • arr.insert(0, x): O(n)
  • arr.pop(0): O(n)

That last pair is the trap. It is very easy to accidentally write O(n^2) code by repeatedly doing work at the front of a dynamic array.

Practical scenario: why pop(0) melts performance

Imagine you are implementing a BFS queue and you do:

  • append new nodes to the end
  • pop from the front

If you implement that with a dynamic array using pop(0), every dequeue shifts everything left. Do that n times and you get ~O(n^2) shifts.

This is exactly the kind of bug that looks fine in small tests and becomes painful in production.
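A quick experiment makes the asymmetry visible. Timings vary by machine, so treat the printed numbers as relative, not absolute:

```python
from collections import deque
from time import perf_counter

N = 50_000

lst = list(range(N))
start = perf_counter()
while lst:
    lst.pop(0)            # shifts every remaining element left: O(n) per pop
list_time = perf_counter() - start

dq = deque(range(N))
start = perf_counter()
while dq:
    dq.popleft()          # O(1) per pop
deque_time = perf_counter() - start

print(f'list.pop(0): {list_time:.3f}s  deque.popleft(): {deque_time:.3f}s')
```

On any reasonable machine the deque version finishes orders of magnitude faster, and the gap widens as N grows.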

Resizing strategy: doubling vs smaller growth, and when to shrink

The doubling rule is popular because it is simple:

  • fewer resizes
  • amortized O(1) append

But it can overshoot memory usage, especially for spiky workloads.

In production systems, the right strategy depends on what you are building:

  • If you ingest batches and then keep the container around, consider a shrink policy.
  • If you append continuously and rarely delete, shrinking may cause thrashing (grow, shrink, grow again).

In the code above, I used a conservative shrink rule:

  • shrink when length <= capacity/4
  • shrink to capacity/2

That introduces hysteresis: you need a big change in occupancy before resizing again, which reduces resize ping-pong.

I also recommend choosing a clear minimum capacity (I used 1). Without it, repeated pops can resize down to 0, and then the next append becomes awkward.
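Here is a tiny standalone simulation of that policy (capacity bookkeeping only, no real buffer) that shows the hysteresis in action: sixteen appends followed by thirteen pops trigger four grows but only one shrink.

```python
def trace(ops):
    """Track (length, capacity) under grow-at-full / shrink-at-quarter rules."""
    n, cap, events = 0, 1, []
    for op in ops:
        if op == 'append':
            if n == cap:
                cap *= 2
                events.append(f'grow to {cap}')
            n += 1
        elif op == 'pop':
            n -= 1
            if cap > 1 and n <= cap // 4:
                cap = max(1, cap // 2)
                events.append(f'shrink to {cap}')
    return n, cap, events

n, cap, events = trace(['append'] * 16 + ['pop'] * 13)
print(n, cap, events)
```

Notice that after the shrink to 8, further pops do not immediately shrink again: occupancy has to fall to a quarter of the new capacity first.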

The subtle cost of shrinking: not just time, but churn

I think of shrink policies in two dimensions:

  • time cost: shrinking requires allocation + copying, just like growing
  • allocation churn: in some real workloads, memory usage oscillates (batch in, batch out)

If you shrink too eagerly, you can end up doing a lot of unnecessary copying.

My rule of thumb:

  • If deletes are rare, do not shrink at all (or shrink only when the container becomes very small).
  • If deletes happen in bulk and then you stay small for a while, a shrink rule is worth it.

Capacity planning: a simple way to avoid resizes

One of the most practical tricks (even when using Python's built-in list) is to pre-allocate when you roughly know the final size.

In our implementation, that means choosing a higher initial_capacity.

If I know I am about to append ~10,000 items, I would rather start with initial_capacity=16384 (or something nearby) than pay for many growth steps.

Even if my estimate is off, I can often reduce the number of resizes drastically.
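You can estimate the savings directly. This sketch counts how many doublings happen while appending a given number of items under the grow-on-full rule from our implementation:

```python
def resizes_needed(n_items: int, initial_capacity: int = 1) -> int:
    """How many doublings happen while appending n_items, grow-on-full."""
    capacity, count = initial_capacity, 0
    for length in range(n_items):
        if length == capacity:
            capacity *= 2
            count += 1
    return count

print(resizes_needed(10_000))          # starting tiny: 14 resizes
print(resizes_needed(10_000, 16_384))  # pre-allocated: 0 resizes
```

Fourteen resizes versus zero, and each avoided resize is an allocation plus a full copy.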

Traditional vs modern choices (what I recommend in 2026)

When you are choosing a container in Python, I recommend starting from the built-ins and only writing custom storage when you have a clear need.

  • General-purpose growable sequence. Traditional approach: list. What I recommend now: list (still the right default).
  • Many appends, many pops from the left. Traditional approach: list + pop(0). What I recommend now: collections.deque.
  • Dense numeric data and vector math. Traditional approach: list of ints/floats. What I recommend now: NumPy arrays (or the array module for simple cases).
  • Learning / interviews / teaching. Traditional approach: hand-rolled dynamic array. What I recommend now: a hand-rolled dynamic array backed by ctypes.

I still build a custom dynamic array occasionally, but it is usually for learning, instrumentation, or embedding custom behavior (like strict typing, bounds checks, or special shrink policies).

A deeper version: adding extend, clear, and remove safely

Once I have append, insert, and pop correct, the next methods I like to add are the ones people reach for constantly:

  • extend(iterable)
  • clear()
  • remove(value) (remove first match)

The main risk is that each new operation must preserve the invariants from earlier. My approach is:

  • implement the simplest correct version
  • reuse existing building blocks (append, shifting loops)
  • add tests that compare behavior to Python list

Here is an example of what those additions can look like, still keeping the class compact:

from collections.abc import Iterable


class DynamicArray(DynamicArray):  # extend the class defined earlier in the post

    def clear(self) -> None:
        # Drop references to help GC.
        for i in range(self._n):
            self._A[i] = None
        self._n = 0
        # Optional: shrink back to minimum.
        if self._capacity > 1:
            self._resize(1)

    def extend(self, values: Iterable[object]) -> None:
        for v in values:
            self.append(v)

    def index(self, value: object) -> int:
        for i in range(self._n):
            if self._A[i] == value:
                return i
        raise ValueError('value not found')

    def remove(self, value: object) -> None:
        i = self.index(value)
        self.pop(i)

A few notes:

  • I purposely implement extend by repeatedly calling append. It is not the fastest possible approach, but it is correct and leverages our existing resize logic.
  • If I cared about speed, I would first count incoming values (when possible) and pre-reserve capacity. That is a more advanced optimization.
  • remove uses index + pop, which is easy to reason about.
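If you want to try the reservation idea, `operator.length_hint` is a handy hedge: it returns `len(values)` when the iterable is sized and 0 otherwise, so generators safely fall back to plain appends. Here is a sketch against a minimal list-backed stand-in; the class and the `_reserve` helper are my own for this example, not part of the implementation above:

```python
from operator import length_hint

class MiniArray:
    """List-backed stand-in that tracks capacity like the ctypes version."""
    def __init__(self):
        self._buf, self._n, self._capacity = [None], 0, 1

    def _reserve(self, wanted: int) -> None:
        # Grow once to at least `wanted` total slots, doubling as usual.
        cap = self._capacity
        while cap < wanted:
            cap *= 2
        if cap != self._capacity:
            self._buf = self._buf[:self._n] + [None] * (cap - self._n)
            self._capacity = cap

    def append(self, value) -> None:
        if self._n == self._capacity:
            self._reserve(self._capacity + 1)
        self._buf[self._n] = value
        self._n += 1

    def extend(self, values) -> None:
        hint = length_hint(values)      # len() when sized, else 0
        if hint:
            self._reserve(self._n + hint)
        for v in values:
            self.append(v)

arr = MiniArray()
arr.extend(range(1000))
print(arr._n, arr._capacity)   # one up-front reservation instead of ~10 doublings
```

The same shape ports directly to the ctypes version: pre-grow once, then run the plain append loop.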

Common mistakes I see when people implement this

I have reviewed a lot of versions of this code over the years. The same bugs show up repeatedly.

1) Returning exceptions instead of raising them

Bad pattern:

if index >= self._n:
    return IndexError('out of range')

Good pattern:

if index >= self._n:
    raise IndexError('out of range')

2) Forgetting to clear deleted slots

If you shift elements left and do not clear the last slot, your backing store still holds a reference to the deleted object. That can keep memory alive longer than you think.

3) Off-by-one errors during shifting

  • Insert shifts from n down to index+1
  • Delete shifts from index up to n-2

If your loops are wrong by one, you will duplicate elements or lose them.
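You can sanity-check those loop bounds with a plain Python list standing in for the buffer:

```python
# Buffer with n = 4 live elements and spare capacity at the end.
A = ['a', 'b', 'c', 'd', None, None]
n = 4

# Insert at index 1: shift right from n down to index + 1.
index = 1
for i in range(n, index, -1):
    A[i] = A[i - 1]
A[index] = 'X'
n += 1
print(A[:n])   # ['a', 'X', 'b', 'c', 'd']

# Delete at index 2: shift left from index up to n - 2.
index = 2
for i in range(index, n - 1):
    A[i] = A[i + 1]
A[n - 1] = None  # clear the vacated slot
n -= 1
print(A[:n])   # ['a', 'X', 'c', 'd']
```

If you nudge either range by one and rerun, you will see exactly the duplicated or lost elements described above.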

4) Ignoring negative indices or clamping behavior

Python sequences treat -1 as the last element. For insert, Python offsets negative indices from the end and then clamps: on a short list, insert(-10, x) inserts at 0, and insert(10**9, x) inserts at the end. If you are trying to feel "Pythonic," copy those behaviors.

5) Shrink thrashing

If you shrink too aggressively (for example, shrink whenever length < capacity/2), then alternating append and pop can cause repeated resizes. Use a gap (like 1/4) to avoid it.

6) Confusing contiguous references with contiguous object data

Our ctypes backing store is contiguous for references, not for the objects themselves. Objects still live elsewhere on the heap. This matters if you are chasing performance for numeric workloads; that is where NumPy shines.

7) Not thinking about “loitering” references

Even if your DynamicArray length is correct, it can still keep objects alive if the backing store holds references past n. Clearing those slots on delete avoids that subtle memory retention.

8) Mixing “normalize index” with “insert clamp” semantics

These are different operations:

  • For __getitem__ and pop, negative indexing is meaningful and out-of-range should raise.
  • For insert, negative indices are offset from the end and then clamped to 0, and overly large indices clamp to n.

I often see implementations accidentally call _normalize_index inside insert, which changes behavior in surprising ways.

Debugging and instrumentation: how I prove my implementation is correct

When I am learning (or teaching), I like to make the data structure talk back. Two small tools help a lot:

1) A method to print internal state (n, capacity) alongside elements

2) An invariant checker that I can run after every operation

Here is a simple invariant checker that catches many bugs early:

def _check_invariants(self) -> None:
    if not (0 <= self._n <= self._capacity):
        raise AssertionError('n/capacity invariant broken')
    # Elements beyond n are allowed to be None; we just ensure we can read.
    for i in range(self._n):
        _ = self._A[i]

In a teaching environment, I will literally call _check_invariants() at the end of every public method while iterating on the implementation. Then I remove those calls once I am confident.

A practical “fuzzer” approach without fancy tooling

If I want high confidence quickly, I do this:

  • keep a Python list as the reference implementation
  • perform random operations (append, insert, pop) on both
  • assert that they match after each step

You do not need sophisticated libraries to get value here. Even a few hundred random steps will shake out most off-by-one bugs.
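Here is that idea as runnable code. To keep the example self-contained I fuzz a condensed list-backed stand-in (TinyArray below is mine, not the ctypes class); in practice you would point the same loop at your DynamicArray:

```python
import random

class TinyArray:
    """Condensed stand-in; swap in your DynamicArray to fuzz it instead."""
    def __init__(self):
        self._buf, self._n, self._cap = [None], 0, 1
    def append(self, v):
        if self._n == self._cap:
            self._cap *= 2
            self._buf += [None] * (self._cap - len(self._buf))
        self._buf[self._n] = v
        self._n += 1
    def insert(self, i, v):
        # list.insert semantics: offset negatives from the end, then clamp.
        i = max(0, min(i + self._n if i < 0 else i, self._n))
        self.append(None)                 # reuse append to grow if needed
        for j in range(self._n - 1, i, -1):
            self._buf[j] = self._buf[j - 1]
        self._buf[i] = v
    def pop(self, i=-1):
        if i < 0:
            i += self._n
        v = self._buf[i]
        for j in range(i, self._n - 1):
            self._buf[j] = self._buf[j + 1]
        self._buf[self._n - 1] = None     # clear the vacated slot
        self._n -= 1
        return v
    def to_list(self):
        return self._buf[:self._n]

random.seed(0)  # deterministic run; vary the seed for more coverage
ref, arr = [], TinyArray()
for step in range(500):
    op = random.choice(['append', 'insert', 'pop'])
    if op == 'append':
        ref.append(step); arr.append(step)
    elif op == 'insert':
        i = random.randint(-3, len(ref) + 3)
        ref.insert(i, step); arr.insert(i, step)
    elif op == 'pop' and ref:
        i = random.randint(-len(ref), len(ref) - 1)
        assert ref.pop(i) == arr.pop(i)
    assert ref == arr.to_list(), f'diverged at step {step}'
print('500 random steps matched the reference list')
```

The reference list is the oracle: any off-by-one in shifting or clamping shows up as a divergence within a few dozen steps.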

Performance considerations (with realistic expectations)

I do not build this to beat list. In CPython, list is implemented in C and heavily optimized.

What I do get from building this myself is a clear mental model:

  • why append is usually cheap
  • why insert and pop(0) are expensive
  • why growth policies matter
  • how “hidden” copying shows up as occasional spikes

What “fast” actually means in practice

In real applications, “fast enough” often comes from choosing the right algorithm and the right container, not from micro-optimizing Python code.

If you are doing:

  • millions of appends and reads: list wins
  • queue semantics: deque wins
  • numeric heavy lifting: NumPy wins

My dynamic array implementation is for understanding and for customization, not for raw throughput.

The spike profile I look for

If you plot operation time for repeated appends, the pattern looks like:

  • many tiny operations
  • occasional big spikes when resizing happens

That is the practical footprint of amortization. The average is smooth, but the worst-case moments exist.

If you are building latency-sensitive systems (say, real-time-ish workloads), those spikes can matter more than the average.

In that world, I start thinking about:

  • preallocation
  • chunked buffers
  • bounded queues

Alternative approaches that solve similar problems

Dynamic arrays are a great general-purpose tool, but I like having a menu of alternatives because it prevents me from forcing one structure into every shape.

collections.deque for queues and double-ended work

If I need fast appends and pops at both ends, I use deque.

  • append and appendleft are O(1)
  • pop and popleft are O(1)

That is exactly what you want for BFS, sliding windows, and producer/consumer buffering.
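For instance, a BFS frontier fits deque naturally (the graph below is a hypothetical adjacency dict, just for illustration):

```python
from collections import deque

def bfs_order(graph: dict, start) -> list:
    """Visit nodes in breadth-first order using an O(1)-per-op deque."""
    seen = {start}
    frontier = deque([start])
    order = []
    while frontier:
        node = frontier.popleft()      # O(1); list.pop(0) here would be O(n)
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)   # O(1) at the other end
    return order

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(bfs_order(graph, 'a'))   # ['a', 'b', 'c', 'd']
```

Same algorithm, same code shape as the list version, but without the hidden quadratic shift cost.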

The array module for typed numeric storage

If I want a compact typed sequence (e.g., many integers) and I do not need NumPy, Python's array module is a practical middle ground.

It stores values densely (not references), which can reduce memory usage and improve cache behavior for numeric loops.
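A quick size comparison makes the density difference concrete. Exact byte counts vary by Python version and platform, so compare the ratio rather than the absolute numbers:

```python
import sys
from array import array

n = 100_000
as_list = list(range(n))
as_array = array('q', range(n))   # 'q' = signed 64-bit ints, stored densely

# list cost = the pointer array plus every boxed int object it points at
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(f'list:  ~{list_bytes / 1e6:.1f} MB (pointers + int objects)')
print(f'array: ~{array_bytes / 1e6:.1f} MB (raw 8-byte values)')
```

The list pays twice: once for the pointer slots and again for each boxed integer object on the heap.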

NumPy for serious numeric workloads

If you are doing vector math, linear algebra, image processing, or large numeric data manipulation, use NumPy.

The core win is not just that it is “faster,” but that it stores numeric data densely and runs operations in optimized C loops.

Linked lists (rarely) when stable node references matter

I almost never reach for a linked list in Python for performance reasons. But there are niche cases where you want:

  • stable references to nodes
  • lots of splicing operations

Even then, I usually try to model the problem differently before reaching for a linked list.

When you should build this yourself (and when you should not)

I recommend building a dynamic array yourself when:

  • you want to understand why list append is fast in practice
  • you are teaching data structures and you want a concrete, runnable implementation
  • you need custom behavior (growth/shrink policy, instrumentation, strict API)
  • you are writing a constrained runtime where you control allocation decisions (rare in Python, but it happens in embedded or sandboxed contexts)

I do not recommend building this yourself when:

  • you need raw speed: the built-in list is implemented in C and is extremely tuned
  • you want a production container with every list feature (slicing, sort, rich comparisons, etc.)
  • you need numeric density: use NumPy or array

If you are reading this in 2026 and you are using AI assistants in your workflow (I do), the implementation still matters. Tools can generate code quickly, but the subtle bugs above are exactly what slip into generated snippets. Knowing the invariants (length vs capacity, shifting rules, when copying happens) is what lets you review and correct those outputs confidently.

Key takeaways and practical next steps

The core trick behind a dynamic array is simple: keep a contiguous backing store, track length and capacity separately, and resize by allocating a bigger block and copying when you run out of space. Once you internalize that, a lot of everyday performance behavior in Python stops feeling magical. Appends stay fast because copying happens only occasionally, and the total copying work grows linearly with the number of appends.

If you want to keep going, here are practical next steps I recommend:

  • Add a small test suite that checks invariants after each operation: length is correct, elements match a reference list, negative indices behave properly, and the structure raises (not returns) exceptions.
  • Write a tiny benchmark that compares your DynamicArray against list for (1) append-heavy workloads and (2) insert-heavy workloads. Use ranges and relative comparisons instead of obsessing over exact numbers.
  • Add extend with a reservation optimization: if you can estimate how many items you will add, pre-grow capacity once rather than resizing repeatedly.
  • Implement contains, index, and remove carefully, then confirm your semantics match what you expect.
  • Experiment with growth factors (2.0x vs 1.5x) and shrink thresholds (1/3 vs 1/4) and observe how capacity changes under different workloads.

The biggest payoff is not the class itself. The payoff is the mental model: once you understand dynamic arrays, you start seeing them everywhere—any time a system amortizes expensive reallocation by growing in chunks.
