How to Handle MemoryError in Python (Without Guessing)

The first time I hit MemoryError on a production Python service, it wasn't during a big batch job. It was a "simple" endpoint that aggregated data for a dashboard. The payload grew over a few weeks, a new customer uploaded a larger dataset than expected, and suddenly the process tried to allocate one more list and fell over. What made it frustrating wasn't the exception itself – it was how little the stack trace told me about why memory spiked.

If you build anything that reads files, processes API responses, trains models, parses JSON, or does analytics, you'll run into memory pressure sooner than you think. And the fix is rarely "add more RAM" (even when that's part of the answer).

Here's the playbook I use now: understand what Python means by MemoryError, quickly confirm whether you're dealing with a one-time spike or a slow leak, and then apply a small set of patterns that keep memory bounded: streaming, chunking, data-structure choices, and guardrails around inputs. I'll show runnable examples, profiling tactics that work in 2026 workflows, and the gotchas that keep teams stuck in the same incident loop.

## What MemoryError Really Means (and What It Doesn't)

MemoryError is the exception Python raises when the interpreter can't allocate the memory it needs for an operation. That allocation might be obvious (creating a huge list) or indirect (a string concatenation that doubles repeatedly, a cache that never evicts, a JSON parser materializing an entire document).

A few details that matter in practice:

- MemoryError is about allocation failure, not about "your code is slow." You can have a fast program that still dies because it attempts one oversized allocation.
- It can happen even when your machine has free memory, because allocation can fail due to fragmentation, address-space limits, container limits, or per-process limits.
- In containerized environments, you might not get MemoryError at all.
  The OS (or container runtime) can kill the process with an OOM kill, which looks like a crash with no Python traceback.

I keep this quick checklist when triaging:

- Do you see a Python traceback ending with MemoryError? That's a direct allocation failure.
- Do you see the process just disappear (exit code 137 is common in containers)? That's likely an OOM kill.
- Are you actually recursing infinitely? In modern Python, deep recursion usually hits RecursionError first, unless recursion limits have been raised or the stack is otherwise constrained.

### A subtle but important nuance: big allocations often require contiguous memory

Some allocations fail because the interpreter (or the underlying allocator) can't find a big enough contiguous block, even if total free memory looks decent. This shows up in real life with:

- building one massive bytes/str
- creating gigantic lists of pointers
- decoding huge JSON into nested Python objects

The practical implication: splitting work into chunks isn't just "nice for memory." It can be the difference between "works reliably" and "fails unpredictably."

### Minimal reproduction of the exception

This is the kind of operation that can trigger it on many machines:

```python
# memory_error_demo.py

def allocate_too_big():
    # This tries to allocate a very large list. Depending on your machine,
    # it may crash with MemoryError quickly.
    payload = [0] * (10 ** 10)
    return len(payload)


if __name__ == "__main__":
    print(allocate_too_big())
```

The fix is never "wrap it in try/except and continue." Your goal is to avoid attempting allocations that can't succeed.

## The Three Usual Culprits: Unbounded Growth

In my experience, MemoryError almost always comes from unbounded growth in one of three forms:

1) An unbounded loop that keeps appending
2) A data structure that's much larger than you think
3) Recursion (or repeated call stacks) that never terminates

### 1) Unbounded loops: lists that only grow

```python
def create_large_list_forever():
    rows = []
    while True:
        rows.append("event")


create_large_list_forever()
```

If you genuinely need to "process forever," the fix is not "stop forever." The fix is to stop storing everything.

Here's the bounded-memory version: process one item at a time and write results somewhere durable.

```python
from pathlib import Path


def read_events():
    # Stand-in for Kafka / a socket / a long poll.
    counter = 0
    while True:
        counter += 1
        yield {"event_id": counter, "type": "click"}


def process_forever(output_path: Path) -> None:
    with output_path.open("a", encoding="utf-8") as f:
        for event in read_events():
            # You only hold one event at a time.
            f.write(f"{event['event_id']},{event['type']}\n")
            # If you need batching, batch a fixed window (see later section).


if __name__ == "__main__":
    process_forever(Path("events.csv"))
```

Notice what changed: I didn't "fix memory." I changed the shape of the program so memory can't grow without bound.

### 2) Large data structures: the hidden multipliers

A classic trap is thinking "it's just a million strings." In Python, a million strings can cost far more than you expect because:

- Each object has overhead (headers, refcounts, pointers)
- Containers store references
- Duplicated text might not be shared

This is why storing raw rows as dictionaries is comfortable but expensive.

Another multiplier I see constantly: the same dataset existing in multiple forms at the same time. For example:

- you read a file into bytes
- decode it to str
- parse it into a Python dict/list tree
- then create a second transformed structure

Any one representation might be okay; holding two or three at once is what pushes you over the edge. The fix is often: convert in a streaming way, and delete intermediate objects early (or better, never create them).

### 3) Recursion without a base case (or with a base case you never reach)

```python
def recursive_function(n: int) -> int:
    return n + recursive_function(n + 1)


recursive_function(1)
```

Even with a base case, recursion in Python doesn't get tail-call elimination. If you can write it iteratively, you usually should.

```python
def sum_up_to(limit: int) -> int:
    total = 0
    n = 1
    while n <= limit:
        total += n
        n += 1
    return total


print(sum_up_to(1_000_000))
```

If you must use recursion (tree walks, parsing, graph traversal), add explicit depth limits and consider iterative approaches with an explicit stack (list.append / list.pop) so you control memory more predictably.

## First Response: Confirm Whether It's a Spike, a Leak, or a Limit

When I'm on-call, my first goal is not elegance – it's to stop guessing. You fix a one-time spike by chunking. You fix a leak by finding the object graph that keeps growing. You fix a limit by changing container settings or process architecture.

### Quick questions I ask (and why)

- Did memory climb steadily over minutes/hours? That suggests a leak or unbounded caching.
- Did memory jump during one request/job and then crash? That suggests a large allocation spike.
- Did the crash correlate with one tenant, one file, one report type? That suggests input-driven growth.
- Did anything change recently (dependency, feature flag, new customer data shape)?
  Memory regressions are often one small change with a big multiplier.

### Instrument memory in-process (simple and effective)

For long-running services, add lightweight RSS logging. This won't tell you which objects are growing, but it tells you whether growth is real.

```python
import os
import time


def get_rss_megabytes() -> float:
    # Linux: read RSS from /proc. In containers, this is still usually available.
    # For macOS/Windows you can use psutil instead.
    with open(f"/proc/{os.getpid()}/status", "r", encoding="utf-8") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                parts = line.split()
                kb = int(parts[1])
                return kb / 1024
    return 0.0


def log_memory_forever() -> None:
    while True:
        print(f"rss_mb={get_rss_megabytes():.1f}")
        time.sleep(2)


if __name__ == "__main__":
    log_memory_forever()
```

In 2026, I often wire this into structured logs and dashboards. The point is not precision – the point is trend.

### Catch MemoryError only to add context, then fail fast

If you continue after MemoryError, you risk running in a partially broken state (failed allocation, half-built objects, inconsistent caches). My rule: catch it to log context, then abort the unit of work.

```python
import logging

logger = logging.getLogger(__name__)


def build_report(customer_id: str) -> bytes:
    try:
        # Replace with real report building.
        rows = ["row"] * 50_000_000
        return "\n".join(rows).encode("utf-8")
    except MemoryError:
        logger.exception("MemoryError while building report", extra={"customer_id": customer_id})
        # Re-raise (or raise a domain-specific error) so the request fails cleanly.
        raise
```

Catching is not the solution. Better design is the solution.

### Distinguish Python-level failures from container/system-level OOM kills

This distinction changes your debugging approach:

| Symptom | Most likely cause | What I do next |
| --- | --- | --- |
| Python traceback ends with MemoryError | Allocation failed inside Python | Find the exact allocation site; reduce peak allocations; chunk/stream |
| Process exits with no traceback, often exit code 137 | OOM kill by OS / container runtime | Check container memory limit, node pressure, cgroup OOM events; reduce footprint or increase limit |
| Memory climbs steadily across requests | Leak or cache growth | Heap profiling; find growing types/retainers; add eviction, fix references |

You can burn hours if you treat an OOM kill like a Python exception. They feel similar in production, but the fix path is different.

## Understand Your Runtime Limits Before You Change Code

Before rewriting anything, I like to verify the actual constraints in the environment. It's amazing how often the "mystery" is just a limit you didn't know was there.

### Common limits that trigger MemoryError (or make it happen sooner)

- 32-bit Python or a 32-bit process in a legacy environment (address-space ceiling)
- Container memory limits (cgroups)
- Per-process limits (ulimit / resource constraints)
- Memory-hungry neighbors on the same node (your process loses the fight)

### Programmatically check some limits (Linux-friendly)

```python
import resource


def show_limits():
    # RLIMIT_AS is virtual memory; RLIMIT_DATA is the data segment (varies by OS);
    # RLIMIT_RSS is often ignored on Linux but may exist.
    for name in ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS"]:
        if hasattr(resource, name):
            res = getattr(resource, name)
            soft, hard = resource.getrlimit(res)
            print(f"{name}: soft={soft} hard={hard}")


if __name__ == "__main__":
    show_limits()
```

Even if you never change these limits in code, it's useful to log them at startup for production services. It turns "maybe it's a limit" into "yes, the soft limit is X."

### A practical rule I use: size inputs against available memory, not file size

Developers often say "the file is only 500MB." That doesn't mean parsing it is safe.
A 500MB JSON file can expand into multiple gigabytes of Python objects.

So I ask a different question: "What is the worst-case expanded in-memory representation?" If we don't know, we should not materialize it.

## Make Memory Bounded: Streaming, Chunking, and Backpressure

If you remember one idea from this post, make it this: the safest memory is memory you never allocate.

Most MemoryError incidents are caused by a "materialize everything" approach:

- read the whole file
- parse the whole JSON
- load every row
- build one huge string

Instead, I design for bounded working sets.

### Read files in chunks

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    # Reads line by line; does not load the file into memory.
    total = 0
    with path.open("r", encoding="utf-8") as f:
        for _ in f:
            total += 1
    return total


if __name__ == "__main__":
    print(count_lines(Path("server.log")))
```

For CSV-like files, I usually go one step further: parse and aggregate as I stream, so I never store rows at all.

### Stream HTTP responses instead of buffering

```python
import requests


def download_large_file(url: str, out_path: str) -> None:
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:  # filters out keep-alive chunks
                    f.write(chunk)


if __name__ == "__main__":
    download_large_file("https://example.com/big-export.csv", "big-export.csv")
```

This pattern shows up everywhere: download, decrypt, decompress, parse, write. If you do it in streaming steps, your peak memory stays small.

### Decompress streams instead of gzip.decompress(...)

A common spike is decompressing a large payload in one go. If you call gzip.decompress(big_bytes), your process temporarily holds both the compressed and the decompressed forms.
Streaming avoids that.

```python
import gzip
import shutil


def gunzip_file(src_gz_path: str, dst_path: str) -> None:
    with gzip.open(src_gz_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
```

### Parse large JSON without building the full object

If you call json.loads() on a multi-GB document, you're asking for trouble. In services I maintain, I prefer newline-delimited JSON (NDJSON) so we can process object by object.

```python
import json
from pathlib import Path


def process_ndjson(path: Path) -> int:
    processed = 0
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            # Do work here.
            processed += 1
    return processed


if __name__ == "__main__":
    print(process_ndjson(Path("events.ndjson")))
```

If you don't control the format and must parse a single JSON array, consider a streaming parser library (there are good ones) or change the upstream contract. In my experience, negotiating NDJSON or chunked pagination upstream saves far more engineering time than heroic memory optimizations downstream.

### Batch with fixed windows

Batching is fine. Unbounded batching is not.

```python
from collections.abc import Iterable, Iterator


def batched(items: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Now your memory use stays proportional to batch_size, not to runtime.

A small practical tip: size batch_size by bytes, not by count, if items vary a lot. For example, you can estimate item size with len(json.dumps(item)) during development, or by sampling in production (with caution). Count-based batching fails when a "small number of huge items" arrives.

### Add backpressure (don't out-run your sink)

If you read faster than you write (common with queues), memory becomes the buffer.
I prefer explicit queues with max sizes.

```python
import queue
import threading


def producer(q: queue.Queue) -> None:
    for i in range(10_000_000):
        q.put({"event_id": i})  # blocks when the queue is full
    q.put(None)


def consumer(q: queue.Queue) -> None:
    while True:
        item = q.get()
        if item is None:
            return
        # Process item here.


def run_pipeline() -> None:
    q: queue.Queue = queue.Queue(maxsize=10_000)
    t1 = threading.Thread(target=producer, args=(q,), daemon=True)
    t2 = threading.Thread(target=consumer, args=(q,), daemon=True)
    t1.start()
    t2.start()
    t1.join()
    t2.join()


if __name__ == "__main__":
    run_pipeline()
```

The queue's max size forces bounded memory.

In async code, the same idea applies: don't schedule infinite tasks and keep references to them all. Use bounded semaphores, bounded queues, and streaming responses.

## Choose Data Structures That Don't Inflate Your RAM Bill

Python's default structures are general-purpose and easy to use. They're not always memory-friendly. Here are swaps I make routinely.

### Prefer iterators/generators over lists

Bad (materializes everything):

```python
squares = [n * n for n in range(50_000_000)]
```

Better (lazy):

```python
squares = (n * n for n in range(50_000_000))
```

If you only need to loop once, the generator often wins outright.

A pattern I like in production is to treat "returning a list" as an explicit decision. If a function's job is "produce items," returning an iterator communicates bounded-memory intent better than returning a list.

### Store rows as tuples or arrays when the schema is fixed

If you have millions of rows with a fixed schema, dictionaries are expensive.

```python
# dict-heavy approach (easy, costly)
row = {"customer_id": 123, "country": "US", "spend_cents": 2599}

# tuple approach (cheaper)
row = (123, "US", 2599)
```

You trade readability for memory.
I usually wrap tuples in a namedtuple or a dataclass(slots=True) to recover clarity.

```python
from dataclasses import dataclass


@dataclass(slots=True)
class Purchase:
    customer_id: int
    country: str
    spend_cents: int
```

Slots remove the per-instance `__dict__`, which can be a big deal at scale.

### Use `__slots__` (or dataclass(slots=True)) for high-volume objects

When I see code that creates millions of instances of a plain class, I assume it will be memory-expensive until proven otherwise. Switching to `__slots__` is one of the highest-ROI changes you can make, because it's local and predictable.

One gotcha: slots=True can break code that assumes it can attach arbitrary attributes dynamically. If you rely on that behavior, slots will surface it quickly.

### Use array (or NumPy) for numeric data instead of Python ints

If you're storing large numeric arrays, Python integers have a lot of overhead. Use array, NumPy, or another typed container.

```python
from array import array


def million_ints_typed() -> array:
    values = array("I")  # unsigned int
    values.extend(range(1_000_000))
    return values


if __name__ == "__main__":
    print(len(million_ints_typed()))
```

Even if you aren't doing scientific computing, typed arrays can be a clean fix for "I only need numbers, but I'm paying for Python objects."

### Watch out for "helpful" caches

I've seen production MemoryError caused by:

- functools.lru_cache with an unbounded maxsize=None
- global dictionaries keyed by user input
- memoization maps that never evict
- per-request caches accidentally stored in global state

Be explicit:

```python
from functools import lru_cache


@lru_cache(maxsize=10_000)
def lookup_exchange_rate(currency: str) -> float:
    # Pretend this calls a remote service.
    return 1.0
```

That maxsize is not a micro-detail. It's your memory ceiling.

A trick I use: if a cache key comes from user input, I assume it can be unbounded unless we validate it.
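One way to make that concrete (a sketch with a hypothetical currency allow-list, not code from any specific service): validate user input before it ever becomes a cache key, so the key space, and therefore the cache, is bounded by construction.

```python
from functools import lru_cache

# Hypothetical allow-list; in a real service this might come from config.
SUPPORTED_CURRENCIES = frozenset({"USD", "EUR", "GBP", "JPY"})


@lru_cache(maxsize=len(SUPPORTED_CURRENCIES))
def lookup_exchange_rate(currency: str) -> float:
    # Pretend this calls a remote service.
    return 1.0


def exchange_rate_for(user_input: str) -> float:
    # Normalize and validate BEFORE caching, so unbounded user input
    # can never grow the cache past the allow-list.
    code = user_input.strip().upper()
    if code not in SUPPORTED_CURRENCIES:
        raise ValueError(f"Unsupported currency: {user_input!r}")
    return lookup_exchange_rate(code)
```

The normalization step matters as much as the check: without it, "usd", "USD ", and "Usd" would each occupy their own cache slot.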
"Users won't do that" is not a strategy.

## Prevent the Common "String Explosion" Patterns

Strings are sneaky because they feel small – until concatenation patterns multiply them.

This pattern is dangerous:

```python
def consume_memory():
    data = "a" * (10 ** 8)
    while True:
        data += data


consume_memory()
```

You're doubling the string repeatedly. Even if you start smaller, it grows exponentially.

### If you must build large text, build it incrementally

Use list-append plus join, or stream to a file.

```python
from pathlib import Path


def write_report_lines(path: Path, total_lines: int) -> None:
    with path.open("w", encoding="utf-8") as f:
        for i in range(total_lines):
            f.write(f"line={i}\n")


if __name__ == "__main__":
    write_report_lines(Path("report.txt"), total_lines=5_000_000)
```

If you truly need the final string in memory (often you don't), use chunked joins so you don't keep too many intermediates alive.

### Prefer bytearray for incremental binary buffers

```python
import os


def read_binary_in_chunks(path: str) -> bytes:
    buf = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1024 * 1024)
            if not chunk:
                break
            buf.extend(chunk)
    return bytes(buf)


if __name__ == "__main__":
    if os.path.exists("video.bin"):
        payload = read_binary_in_chunks("video.bin")
        print(len(payload))
```

This still loads everything eventually, so it's only appropriate when you know the file is bounded. If it isn't, stream to a destination.

### Beware accidental string duplication

I see this a lot in "simple" code:

- reading a giant file with f.read() (string/bytes)
- then calling .splitlines() (a list of strings)
- then calling "\n".join(...) (another giant string)

That pattern can create multiple large copies.
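As a sketch of the alternative (hypothetical file names): transform one line at a time and write it straight out, so no full copy of the file ever exists in memory, let alone three.

```python
from pathlib import Path


def upcase_lines(src: Path, dst: Path) -> None:
    # One line in memory at a time: no full read(), no splitlines() list,
    # and no giant join() at the end.
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.upper())
```

The transformation here (upper-casing) is a stand-in; the shape is what matters, since peak memory stays at roughly one line regardless of file size.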
The streaming approach avoids it.

## Profiling Memory in 2026: Practical Tools and a Workflow That Doesn't Waste Your Day

You don't fix memory issues by staring at code and "thinking harder." You fix them by measuring.

Here's a workflow that works well for me on Python 3.12+ codebases.

### Start with tracemalloc for allocation hotspots

tracemalloc is built in and good at answering: "Where are the allocations coming from?" It's not perfect (it tracks Python-level allocations, not everything the process might allocate through native extensions), but it's often enough to find the smoking gun.

A simple pattern I use is "snapshot before and after a suspicious operation":

```python
# tracemalloc_snapshot_demo.py

import tracemalloc


def build_big_structure():
    return [{"i": i, "s": str(i)} for i in range(200_000)]


def main():
    tracemalloc.start(25)
    before = tracemalloc.take_snapshot()

    data = build_big_structure()

    after = tracemalloc.take_snapshot()
    top = after.compare_to(before, "lineno")

    print("Top allocation diffs:")
    for stat in top[:10]:
        print(stat)

    # Make sure data isn't optimized away.
    print(len(data))


if __name__ == "__main__":
    main()
```

What I look for:

- a small number of lines responsible for most allocated bytes
- allocation sites inside hot paths (called per row / per request)
- unexpected allocations caused by convenience code (string formatting, dict construction, copies)

If you add this pattern around the code that runs right before the crash, you often get a direct pointer to the allocation site that pushes you over the edge.

### Use a line-by-line memory profiler when the code is pure Python

When the culprit is pure Python, line profilers can be excellent. They answer: "Which line increases memory and never releases it?"

I typically use them to confirm suspicions, not as the first tool.
The workflow I like is:

1) Find the high-level suspect with RSS trends and request correlation
2) Narrow down with tracemalloc
3) Confirm with a line-by-line tool or targeted snapshots

### Use a native-aware profiler when you rely on extensions (NumPy/pandas/etc.)

If your workload uses native extensions heavily, Python-level allocation tracking can miss the big picture, because the memory is allocated outside Python's object allocator. In those cases, I reach for tools designed to capture native allocations too.

The practical approach is still the same: measure before/after around one operation and compare. If you can't see it in tracemalloc, assume it's in native allocations or in memory that stays mapped.

### The single most important measurement: peak memory

Average memory doesn't usually kill you. Peak memory does.

When I profile a memory issue, I care about:

- peak RSS during the request/job
- peak heap allocations during the request/job
- how much of that peak is avoidable (extra copies, materialization)

The fixes that matter reduce peak, not just total.
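Two cheap ways to observe peak around one operation (a sketch; `resource` is Unix-only, and ru_maxrss units are platform-specific: kilobytes on Linux, bytes on macOS):

```python
import resource
import tracemalloc


def peak_during(fn):
    # Python-heap peak via tracemalloc; process-level peak via getrusage.
    tracemalloc.start()
    try:
        fn()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # Note: ru_maxrss is the high-water mark for the whole process lifetime,
    # so it only isolates `fn` if you call this early in the process.
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak, ru_maxrss


def build_big_list() -> int:
    data = [str(i) for i in range(200_000)]
    return len(data)


if __name__ == "__main__":
    peak_bytes, maxrss = peak_during(build_big_list)
    print(f"python-heap peak ~= {peak_bytes / 1e6:.1f} MB, ru_maxrss={maxrss}")
```

The tracemalloc number is the one that responds to chunking fixes: if you split `build_big_list` into streamed batches, the reported peak drops even when total work is identical.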
Chunking and streaming reduce peak.

## Finding Leaks in Long-Running Services (The "Why Does It Keep Growing?" Problem)

If memory grows steadily across many requests, you have one of these situations:

- a cache is growing without bound
- a global container retains per-request objects
- a background task is accumulating results
- a queue is unbounded (memory is your queue)
- references keep objects alive (common with closures, callbacks, or accidental global state)

### My leak-hunting loop

1) Identify the smallest reproducer: a single endpoint, a single job type, a single dataset
2) Run it repeatedly in a loop and log RSS every N iterations
3) Take tracemalloc snapshots every N iterations and compare them
4) Once I see a growing type or allocation site, inspect what's retaining it

### A practical "repeat the request" harness

```python
import time


def handler_simulation():
    # Replace with the endpoint logic you suspect.
    payload = {"rows": [str(i) for i in range(100_000)]}
    return payload


def main():
    for i in range(1, 501):
        handler_simulation()
        if i % 25 == 0:
            print(f"iteration={i}")
        time.sleep(0.01)


if __name__ == "__main__":
    main()
```

This doesn't prove a leak by itself, but it gives you a controlled loop to attach profilers to.
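One way to attach tracemalloc to such a loop (a sketch; the leaky `handler_simulation` here is a deliberately contrived stand-in for whatever unit of work you suspect): snapshot every N iterations and diff against a baseline, so a genuinely growing allocation site shows up as a steadily increasing size and count.

```python
import tracemalloc


def handler_simulation(sink: list) -> None:
    # Deliberately leaky stand-in: the caller-owned list keeps growing.
    sink.append([str(i) for i in range(1_000)])


def hunt_leak(iterations: int = 100, every: int = 25) -> None:
    tracemalloc.start(10)
    try:
        sink: list = []
        baseline = tracemalloc.take_snapshot()
        for i in range(1, iterations + 1):
            handler_simulation(sink)
            if i % every == 0:
                snap = tracemalloc.take_snapshot()
                print(f"--- iteration {i} ---")
                # A real leak keeps climbing in these diffs; a spike does not.
                for stat in snap.compare_to(baseline, "lineno")[:3]:
                    print(stat)
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    hunt_leak()
```

Reading the output is the skill: the allocation site that grows between every snapshot is your leak; sites that appear once and stay flat are just working-set noise.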
The real win is removing external variability (traffic patterns, noisy neighbors, unrelated background tasks).

### Common retention bugs I see

- Storing request objects in globals "for debugging"
- Appending errors/exceptions to a global list
- Logging frameworks buffering too much (especially if handlers queue logs)
- A per-tenant cache keyed by tenant id that never evicts
- "Just for metrics" labels containing unbounded user input (causing large in-memory series)

If you're debugging a leak, always ask: "What object is growing, and who is holding a reference to it?"

## Patterns for Pandas / NumPy / Data Science Workloads

If your MemoryError comes from analytics code, you often have different failure modes than typical web services. Two patterns show up constantly:

1) Loading too much into a DataFrame at once
2) Creating many intermediate arrays/frames in transformations

### Chunked reads and incremental aggregation

If you're reading a large CSV, prefer chunked reading and aggregate incrementally. Even if you don't use a dedicated chunk API, you can stream rows and maintain only the aggregates you need.

The guiding question: do you need the raw rows at the end, or just a derived result? If you only need aggregates, don't keep rows.

### Reduce intermediate copies

Many "clean-looking" transformations create extra copies. A common example is chaining operations that each produce a new object. The fix is not to make the code ugly; it's to be deliberate about which intermediate forms you actually need and when you can drop them.

A practical trick: after a large intermediate is no longer needed, explicitly delete it and let the GC reclaim the Python objects. This doesn't always reduce RSS immediately (allocators may keep arenas mapped), but it often prevents further growth.

### Use smaller dtypes intentionally

Even outside pandas, numeric dtypes matter.
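A quick way to see what dtype width costs (a sketch using the stdlib array module; the 32-bit itemsize is "typically 4 bytes" because C int width is platform-defined):

```python
from array import array

n = 1_000_000
as_32bit = array("i", range(n))  # C signed int: typically 4 bytes per item
as_64bit = array("q", range(n))  # C signed long long: 8 bytes per item

print(f"32-bit itemsize={as_32bit.itemsize}, 64-bit itemsize={as_64bit.itemsize}")
print(f"~{as_32bit.itemsize * n / 1e6:.0f} MB vs "
      f"~{as_64bit.itemsize * n / 1e6:.0f} MB for the same million values")
```

The same arithmetic applies to NumPy dtypes (int32 vs int64, float32 vs float64); with several intermediate arrays alive at once, halving the width halves each of them.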
If you're storing values that fit in 32-bit ints but you use 64-bit by default, you can easily double memory usage. Multiply that by several intermediate arrays and you get a crash.

I treat dtype selection as a memory-budget decision, not a micro-optimization.

## Database and ORM Gotchas (Accidentally Loading the World)

A shockingly common MemoryError is caused by an ORM query that materializes everything into Python objects. The query might be fast, and the database might handle it fine, but Python can't hold the result set.

### Safer patterns I aim for

- Use server-side cursors / streaming fetch when supported
- Paginate by primary key rather than offset (offset pagination can get slow and encourages huge pages)
- Select only the necessary columns (fetching wide rows multiplies memory)
- Convert to a compact representation early (tuples, arrays, dicts with only the required keys)

Even if you don't change the DB layer, you can often fix the memory spike by changing the contract: "return a summary" rather than "return raw rows."

## Async and Concurrency: How Good Intentions Create Memory Spikes

Concurrency is a multiplier. If one request takes 300MB of peak memory and you allow 10 concurrent requests of that type, you just designed a 3GB peak.

### Common concurrency-driven memory failures

- asyncio.gather over a huge list of coroutines (you create and retain them all)
- downloading many files concurrently and buffering them in memory
- reading from a queue faster than downstream processing

### A bounded concurrency pattern

If I need concurrency, I cap it.
The simplest form is a semaphore.

```python
import asyncio


async def fetch_one(i: int) -> int:
    await asyncio.sleep(0.01)
    return i


async def bounded_fetch(n: int, limit: int = 50) -> list[int]:
    sem = asyncio.Semaphore(limit)

    async def run(i: int) -> int:
        async with sem:
            return await fetch_one(i)

    # The semaphore caps in-flight work at `limit`. For a very large n,
    # batch the task creation too, so you don't hold n coroutine objects at once.
    return await asyncio.gather(*(run(i) for i in range(n)))
```

This example is intentionally simple, but the principle holds: control the number of in-flight tasks and the size of in-memory buffers.

## Guardrails That Prevent Incidents (Input Limits, Circuit Breakers, and "No Surprises" Contracts)

Most MemoryError incidents aren't just coding mistakes – they're missing guardrails. The system allowed an input that violated an implicit assumption.

### Guardrails I put in front of memory-heavy endpoints

- Maximum rows / maximum bytes / maximum time windows
- Tenant-level quotas
- Explicit pagination and caps (hard caps, not "recommended" ones)
- Refuse to build "full export in memory"; generate asynchronously and store the result
- Validate content types and compression ratios (zip bombs are real)

A key mindset shift: if a request can generate an unbounded amount of data, it should probably not be a synchronous request.

### A simple "fail fast" size check

If you know the output could be huge, check early. Don't do 90% of the work and then crash.

```python
def enforce_max_rows(row_count: int, max_rows: int = 1_000_000) -> None:
    if row_count > max_rows:
        raise ValueError(f"Too many rows: {row_count} > {max_rows}")
```

This isn't sophisticated, but it prevents the worst-case scenario: a single request taking down the process.

## Operational Playbook: How I Keep Teams Out of the Same Incident Loop

Fixing the code is half the work.
The other half is making memory problems visible and hard to reintroduce.

### What I monitor

- RSS by process (and by pod/container if relevant)
- restart counts (a memory bug often shows up as repeated restarts)
- request/job types correlated with memory spikes
- queue depth (unbounded queue depth often turns into unbounded memory)
- latency plus memory (they often move together once swapping starts)

### What I log when memory errors happen

If a unit of work fails with MemoryError, I want enough context to reproduce it:

- tenant/customer id
- input size in bytes (or the best available estimate)
- row counts
- feature flags and code version
- a correlation id

Then I fail the request/job cleanly. I do not try to limp onward.

### What I change in the architecture when the problem is structural

Sometimes the right fix is not a micro-optimization. It's a boundary change:

- move large report generation to an async job
- store intermediate results on disk/object storage
- use a separate worker pool with a strict memory limit
- split workloads per tenant to avoid one tenant taking down the fleet

If the business requirement is "export a lot of data," the technical requirement becomes "do it without materializing it in one process."

## Testing for Memory Regressions (So It Doesn't Come Back Next Month)

I like adding at least one automated check for memory-heavy paths. It doesn't need to be perfect; it needs to catch big regressions.

### What I test

- A worst-case-ish input (within reason) that used to cause spikes
- A repeated loop (e.g., run the same handler 200 times) to detect leaks
- Peak memory staying under a threshold range (not a single exact number)

### Why ranges matter

Memory measurements vary across OS versions, allocator behavior, and dependency changes. If you set an unrealistically tight threshold, the test becomes noise and gets disabled.
I prefer to catch regressions like "we doubled peak memory" rather than "we increased by 3MB."

### A pragmatic approach

Even without a fancy memory harness, you can do this:

- run the heavy function N times
- log RSS at the start, middle, and end
- fail if the trend is clearly upward

It won't catch every leak, but it catches the class of leak that grows linearly with requests – the one that wakes you up at 3am.

## A Practical Checklist I Use When MemoryError Shows Up

When I'm tired and in incident mode, I want a short checklist that forces the right questions. This is mine:

1) Is it a Python MemoryError or an OOM kill?
2) Is the pattern a spike (one request/job) or a leak (steady growth)?
3) What input triggered it (tenant, dataset, endpoint, job type)?
4) Where is the big allocation happening (use tracemalloc snapshots)?
5) Can I reduce peak memory by streaming, chunking, or eliminating intermediates?
6) Do I have bounded concurrency and bounded queues?
7) Are caches bounded and keyed safely?
8) Do we need guardrails (limits, async jobs, quotas) to prevent recurrence?

## Closing Thought: Make the Safe Path the Default

The most reliable way I've found to handle MemoryError isn't a clever trick. It's designing code so it cannot accidentally allocate unbounded memory. Streaming reads, chunked processing, bounded concurrency, bounded caches, and explicit input contracts turn memory from a constant fear into an engineering parameter you can reason about.

When you do still hit a memory incident (and you will), the difference between a one-hour fix and a week-long mystery is measurement: know whether it's a spike, a leak, or a limit; log enough context to reproduce; and use profiling tools that show you where the allocations come from. Once you build that muscle, MemoryError stops being a scary surprise and becomes just another bug class you know how to eliminate.
