You notice it the first time you profile a Python script that should be fast: one CPU core is pegged at 100%, the other cores are mostly idle, and your “parallel” attempt with threads barely changes the runtime. I run into this pattern constantly in real systems: data science feature builders, ETL transforms with heavy parsing, media processing, and batch scoring jobs. The code is correct, but it is stuck doing CPU work in a single interpreter process.

Multiprocessing is the simplest way to turn that single-core ceiling into real parallel execution on multi-core machines. Instead of asking one Python process to juggle everything, you start multiple OS processes, each with its own Python interpreter and its own Global Interpreter Lock (GIL). That means CPU-bound Python code can finally run in parallel.

I’m going to walk you through the first set of skills that actually matters in day-to-day work: what multiprocessing is, when it helps (and when it doesn’t), and how to write your first correct, cross-platform Process examples. You’ll leave with a mental model you can apply immediately, plus runnable code you can paste into a file and execute.

This is “multiprocessing python set 1” in the most practical sense: enough foundation to make correct programs, measure whether they help, and avoid the classic traps.

## What “multiprocessing” really means (and what it does not)
When I say “multiprocessing” in Python, I mean: running multiple operating-system processes that cooperate on a task. Each process has its own memory space and typically its own Python interpreter instance.
That is the big difference from threads, which share one process and one interpreter.

A common misunderstanding is to mix up these layers:

- CPU cores and processors are hardware.
- Processes and threads are OS scheduling units.
- Python multiprocessing is a library that helps you spawn and coordinate processes.

If your machine has multiple cores (almost every developer laptop and cloud VM does in 2026), the OS can schedule different processes on different cores at the same time. That’s real parallelism.

The flip side is also important: multiple processes do not share memory by default. That is a feature (safety and isolation) and a cost (you need explicit data sharing or message passing).

If you keep just one sentence in your head, make it this: multiprocessing is parallel work through multiple processes, and processes are isolated unless you deliberately connect them.

### A simple mental model I use
I picture each process as a small, independent Python program:

- It has its own globals.
- It has its own imported modules.
- It has its own heap and garbage collector.
- It can crash without taking the parent down (unless you depend on it).

The parent process becomes more like an orchestrator: it spawns workers, gives them inputs, receives outputs, and decides what to do if something fails.

That last part is the hidden power: once you are comfortable with that orchestrator mindset, multiprocessing stops feeling like a trick and starts feeling like an architecture tool.

## Why you reach for multiprocessing in Python
I reach for multiprocessing when the work is CPU-bound and the runtime is dominated by Python bytecode execution: numeric transforms, parsing, pure-Python loops, hashing, text processing, compression, and many “data pipeline” steps that feel like I/O but actually spend most time in CPU.

Here’s the intuition I use:

- If your program mostly waits (network calls, disk I/O, database queries), multiprocessing can help,
but it’s often not my first choice.
- If your program mostly computes (tight loops, parsing, encryption, heavy transforms), multiprocessing is usually the first tool I try.

### The GIL is the reason CPU-bound threads disappoint
In CPython (the standard Python implementation), the GIL ensures only one thread executes Python bytecode at a time within a single process. Threads still make sense for I/O-heavy tasks, but if you’re burning CPU in Python code, threads often don’t speed things up.

Multiprocessing sidesteps that because each process has its own interpreter and its own GIL.

A subtle nuance I keep in mind: some C-extensions (NumPy, parts of compression libraries, image libraries) release the GIL while doing heavy work in C. In those cases, threads can sometimes scale even for “CPU-heavy” workloads because the CPU-heavy portion isn’t running as Python bytecode. If you’re not sure, measure.

### A quick chef analogy (because it sticks)
If you’re alone in a kitchen doing five CPU-heavy tasks (kneading, chopping, mixing, timing), you can switch between them quickly, but you still only have one pair of hands. Threads are like trying to be faster at switching between bowls.

Multiprocessing is hiring assistants.
Now there are multiple pairs of hands (cores) and multiple people (processes) working at the same time.

### A quick profiling checklist before you parallelize
Before I jump to multiprocessing, I do three small checks because they save me time:

1) Is the workload actually CPU-bound?
   – If overall CPU usage is low and the program is slow, you’re probably waiting on I/O or locks.

2) Is the hot code in Python or in native code?
   – If the hotspot is a C-extension that releases the GIL, threads might be enough.

3) Is the task “embarrassingly parallel”?
   – If the work splits into independent chunks (files, rows, documents, frames), multiprocessing is a natural fit.
   – If every chunk needs shared mutable state, it’s still possible, but you’ll spend most of your effort on coordination rather than speed.

## Choosing between asyncio, threads, and processes
When someone asks me “what should I use,” I give a direct guideline:

- asyncio: best when you have thousands of concurrent I/O waits (HTTP calls, sockets) and you want low overhead.
- threads: best when you have blocking I/O or libraries that don’t expose async APIs.
- processes (multiprocessing): best when you need real parallel CPU execution for Python code.

Here’s a practical comparison you can keep nearby.
| Tool | What runs in parallel? | What I watch out for |
| --- | --- | --- |
| asyncio | I/O waits overlap; CPU does not run in parallel | CPU-heavy callbacks block the event loop |
| threading | Blocking I/O, C extensions that release the GIL | The GIL: CPU-bound Python bytecode still runs one thread at a time |
| multiprocessing | True CPU parallelism via multiple processes | Data copying overhead, pickling limits |

If you’re unsure, I recommend you do one measurement: run a representative workload and watch CPU usage. If you’re stuck at ~100% on an N-core box and you expected more, multiprocessing is usually the next move.

### When multiprocessing is not the right answer
This is the part I wish more tutorials emphasized early, because it prevents disappointment:

- Your task is dominated by network I/O: spawning processes won’t make the network faster. Use async or threads (or better batching/backpressure).
- Your task is dominated by disk I/O: processes can help if parsing/decoding is CPU-heavy, but if you are truly disk-bound you may just create contention.
- Your tasks are tiny: if each unit of work is a few milliseconds, overhead can erase any speedup. You need chunking or a different approach.
- You need shared in-memory state: you can share data, but it becomes an engineering problem (locks, managers, shared memory). Sometimes a single-process optimized algorithm is faster and simpler.

## Your first correct Process: target, args, start, join
The core API in multiprocessing is intentionally small. The first thing I teach is how to start two independent processes and then wait for them.

Save this as mp_set1_basic.py and run it with python mp_set1_basic.py.

```python
import multiprocessing


def print_square(n: int) -> None:
    # This runs in a child process.
    print(f'square({n}) = {n * n}')


def print_cube(n: int) -> None:
    # This runs in a child process.
    print(f'cube({n}) = {n * n * n}')


if __name__ == '__main__':
    # Create process objects.
    p1 = multiprocessing.Process(target=print_square, args=(10,))
    p2 = multiprocessing.Process(target=print_cube, args=(10,))

    # Start both processes.
    p1.start()
    p2.start()

    # Wait for both processes to finish.
    p1.join()
    p2.join()

    print('Done!')
```

A few details matter more than they look:

- target=... is the callable the process will run.
- args=(10,) is a tuple.
The trailing comma is required for a one-item tuple.
- start() actually launches the child process.
- join() blocks until that process finishes.

### Why the `if __name__ == '__main__':` guard is not optional
On macOS and Windows (and on Linux when you choose certain start methods), process creation uses a “spawn” style: it starts a fresh Python interpreter and imports your module. If you create processes at import time, you can accidentally create infinite child spawning.

I treat the main guard as a hard rule:

- Put process creation and start() calls behind it.
- Put top-level configuration and constants outside it.

If you follow that habit from day one, you avoid a whole class of bugs.

### What output ordering should you expect?
Notice that two processes run concurrently. That means their print() output can appear in different orders between runs. Don’t write tests that assume a fixed order unless you add synchronization.

### A tiny upgrade: naming your processes
When I’m debugging, names matter. You can add name='...' and then log current_process().name inside the worker. It’s one of those “small now, huge later” habits.

```python
import multiprocessing as mp


def worker(n: int) -> None:
    me = mp.current_process()
    print(f'[{me.name}] got n={n}')


if __name__ == '__main__':
    p = mp.Process(target=worker, args=(123,), name='parser-0')
    p.start()
    p.join()
```

## Seeing what runs where: PIDs, names, liveness, exit codes
A second program I always run early is one that prints process IDs.
This makes multiprocessing feel concrete.

Save as mp_set1_pids.py:

```python
import multiprocessing
import os
import time


def worker(label: str) -> None:
    print(f'[{label}] pid={os.getpid()} parent={os.getppid()}')
    time.sleep(0.2)


if __name__ == '__main__':
    print(f'[main] pid={os.getpid()}')

    p1 = multiprocessing.Process(target=worker, args=('worker-1',), name='worker-1')
    p2 = multiprocessing.Process(target=worker, args=('worker-2',), name='worker-2')

    p1.start()
    p2.start()

    print(f'[main] started p1 pid={p1.pid} name={p1.name}')
    print(f'[main] started p2 pid={p2.pid} name={p2.name}')

    # While children are sleeping, they are usually alive.
    print(f'[main] p1 alive? {p1.is_alive()}')
    print(f'[main] p2 alive? {p2.is_alive()}')

    p1.join()
    p2.join()

    print(f'[main] p1 exitcode={p1.exitcode} alive? {p1.is_alive()}')
    print(f'[main] p2 exitcode={p2.exitcode} alive? {p2.is_alive()}')
    print('[main] done')
```

What I want you to notice:

- os.getpid() is different in each process.
- p1.pid is set after the process starts.
- is_alive() is a cheap way to check process state.
- exitcode is 0 on a normal success. If your worker crashes, you’ll often see a non-zero exit code.

### A practical debugging habit
When a multiprocessing job “hangs,” I print:

- PIDs (so I can inspect them in the OS if needed)
- is_alive()
- exitcode

This alone usually tells me whether I’m stuck waiting on .join(), stuck in a deadlock in IPC, or crashing in a child process.

### One more thing: failures don’t automatically raise in the parent
This surprises people: if a child process throws an exception, the parent won’t automatically re-raise it just because you called join(). You’ll get a non-zero exitcode, and any traceback is printed to that child’s stderr.

For manual Process workflows, that’s why I always check exitcode and treat non-zero as failure.
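Here’s a minimal sketch of that failure mode (the `crasher` worker and its error are invented for illustration): the child raises, join() returns normally, and only the exit code tells the parent something went wrong.

```python
import multiprocessing as mp


def crasher() -> None:
    # The traceback prints on the CHILD's stderr, not the parent's.
    raise ValueError('bad input')


if __name__ == '__main__':
    p = mp.Process(target=crasher)
    p.start()
    p.join()  # returns normally; the child's exception is NOT re-raised here

    # An uncaught exception in the child surfaces as a non-zero exitcode.
    print('exitcode:', p.exitcode)
    if p.exitcode != 0:
        print('worker failed; handle it explicitly')
```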
With Pool and ProcessPoolExecutor, exceptions can be propagated back to the parent more conveniently (we’ll get there later).

## The hidden cost: process startup and data transfer
Multiprocessing gives you parallel CPU, but there are two costs you should always think about.

### 1) Startup overhead
Starting a process is not free. On modern machines, it’s often in the few-milliseconds to few-tens-of-milliseconds range per process, depending on OS, Python version, imports, and start method.

That means:

- If each task is tiny (like 1–5 ms), spawning a process per task can be slower than a simple loop.
- If each task is heavy (like 200 ms+ or seconds), process overhead becomes noise.

In practice, I batch small tasks into chunks or use a worker pool rather than spawning thousands of short-lived processes.

### 2) Data passing overhead (pickling)
When you pass arguments to a child process, Python typically serializes them (pickles them). That has implications:

- Objects must be picklable.
- Large objects can be expensive to serialize and copy.

If you try to pass a huge pandas DataFrame to every worker, you can spend more time copying than computing. For a first pass, keep arguments simple: numbers, strings, small dicts, and file paths.

Later, if you need shared memory or zero-copy patterns, you can reach for multiprocessing.shared_memory, NumPy shared buffers, or external storage, but I don’t start there.

### A rule of thumb I actually use in production
If I can describe the task input as “a handful of scalars plus a filename,” multiprocessing tends to be pleasant. If the task input is “a huge Python object graph,” multiprocessing tends to be painful unless I redesign the data flow.

## Start methods: spawn, fork, forkserver (and why you should care)
Even in a “basic” multiprocessing workflow, the process start method shapes behavior.

### The three start methods
- spawn: starts a fresh interpreter process and imports your module.
Most predictable across platforms.
- fork: copies the parent process memory (copy-on-write). Fast startup, but can interact badly with threads and some native libraries.
- forkserver: starts a server process and forks from that server. A middle ground for certain workloads.

In 2026, I default mentally to: assume spawn-like behavior unless you know you’re on Linux and you intentionally choose fork. That mindset keeps your code portable.

### Checking (and setting) the method
You can inspect the start method like this:

```python
import multiprocessing as mp


if __name__ == '__main__':
    print(mp.get_start_method())
```

And you can set it early (before creating any processes):

```python
import multiprocessing as mp


if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    print(mp.get_start_method())
```

I’m not telling you to set it in every script. I’m telling you to know it exists, because it explains a lot of “works on Linux, breaks on macOS” stories.

### What breaks most often under spawn
- Defining worker functions inside other functions (they can become unpicklable in some cases).
- Capturing non-picklable state in closures.
- Starting processes from module import code.

A simple rule that saves time: define worker callables at module top-level and keep the arguments plain.

### A note about fork and “it was faster on my Linux box”
If you benchmark on Linux with fork, you might see impressive startup speed and lower memory usage because of copy-on-write behavior. That can be real. But I treat it as an optimization step, not my starting point, because fork can surface weird issues when the parent process has threads, open network sockets, or certain native runtime state.

When I want code that’s portable and predictable, I design for spawn first and then optionally tune start methods per environment later.

## Scaling beyond two processes: Pool for batches of work
Creating Process objects manually is great for learning and for small fixed fan-outs.
For “do this 10,000 times” jobs, I reach for a pool.

Here’s a runnable example that counts primes in multiple ranges. It’s intentionally CPU-heavy so you can observe real speedups on a multi-core machine.

Save as mp_set1_pool_primes.py:

```python
import multiprocessing as mp
import os
from dataclasses import dataclass


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    limit = int(n ** 0.5)
    d = 3
    while d <= limit:
        if n % d == 0:
            return False
        d += 2
    return True


@dataclass(frozen=True)
class PrimeJob:
    start: int
    end: int


def count_primes(job: PrimeJob) -> tuple[PrimeJob, int, int]:
    # Return job + count + pid so you can see distribution.
    cnt = 0
    for x in range(job.start, job.end):
        if is_prime(x):
            cnt += 1
    return job, cnt, os.getpid()


if __name__ == '__main__':
    # Split work into coarse chunks so per-task overhead stays small.
    jobs = [
        PrimeJob(10000000, 10020000),
        PrimeJob(10020000, 10040000),
        PrimeJob(10040000, 10060000),
        PrimeJob(10060000, 10080000),
    ]

    workers = min(4, os.cpu_count() or 1)
    print(f'[main] cpu_count={os.cpu_count()} workers={workers}')

    with mp.Pool(processes=workers) as pool:
        results = pool.map(count_primes, jobs)

    total = 0
    for job, cnt, pid in results:
        print(f'[worker pid={pid}] primes in [{job.start}, {job.end}) = {cnt}')
        total += cnt

    print(f'[main] total primes found across jobs = {total}')
```

A few practical notes:

- Pool.map() is the easiest entry point: one function, one iterable of inputs.
- I keep each job big enough that overhead is not dominating.
- Returning the PID helps you confirm that multiple processes were used.

### Why Pool is the “default” for batch compute
With a pool, you pay startup overhead once for N workers, then feed them many jobs.
That’s the core pattern behind many production batch systems.

If you later need more control, Pool.imap_unordered() can start yielding results earlier, which is great for long-running batches.

### Chunking: the lever that decides whether multiprocessing helps
One of the most practical skills in multiprocessing set 1 is understanding chunking. If you have 100,000 small tasks, handing them to a process pool one-by-one can drown you in overhead.

What I do instead:

- Combine many small items into one “work unit” (a chunk).
- Each process handles a chunk and returns aggregated results.

For example, if you’re parsing 1,000,000 lines of text, don’t send each line as a separate job. Send “a list of 10,000 lines” as a job.

It’s not glamorous, but chunking is often the difference between “multiprocessing was slower” and “multiprocessing cut runtime in half.”

## Common mistakes I see (and how I avoid them)
Most multiprocessing pain is not “hard,” it’s just unfamiliar. Here are the issues I see most often and the rule I follow to prevent each one.

### Mistake 1: Forgetting the main guard
Symptom: infinite child spawning on macOS/Windows, or a script that re-runs setup code in every worker.

Fix: process creation goes inside:

```python
if __name__ == '__main__':
    ...
```

### Mistake 2: Passing unpicklable objects
Symptom: errors like AttributeError: Can't pickle local object ....

Fix: keep worker functions at module top-level, and pass simple data (ints, strings, dataclasses, dicts with simple values). If you need to share complex state, pass an ID and load the resource inside the worker.

### Mistake 3: Expecting shared state to “just work”
Symptom: you mutate a global list in a worker and the parent never sees it.

Fix: assume isolation.
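A tiny sketch of that symptom (names are mine; under any start method the child mutates only its own copy of the global):

```python
import multiprocessing as mp

results: list[int] = []  # lives in the parent's memory


def worker() -> None:
    # This appends to the CHILD's copy of the list, not the parent's.
    results.append(42)


if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print(results)  # still [] in the parent
```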
Use explicit channels: return values, queues, pipes, or shared memory primitives.

### Mistake 4: Printing from many processes and assuming it will be clean
Symptom: logs look jumbled, lines interleave, or output appears “missing” when buffering is involved.

Fix: treat multiprocessing output like distributed logs. Options that work well:

- Keep prints minimal, especially inside tight loops.
- Add a clear prefix per process ([worker-3 pid=...]).
- For serious jobs, have workers send log messages to the parent via a queue and let the parent print them in one place.

### Mistake 5: Creating too many processes
Symptom: high memory usage, thrashing, poor performance, or the OS spends more time scheduling than computing.

Fix: start with processes=os.cpu_count() (or slightly less if you need headroom) and only go beyond that if you have a specific reason. For CPU-bound work, “more processes than cores” rarely helps.

### Mistake 6: Not measuring, then guessing
Symptom: you add multiprocessing, complexity increases, but runtime doesn’t improve (or gets worse).

Fix: I build a tiny measurement harness and compare:

- single-process baseline
- multiprocessing version
- different worker counts
- different chunk sizes

I don’t chase perfect benchmarking, just directionally correct data.

## A minimal timing harness you can reuse
When I’m experimenting, I like a small pattern that prints elapsed time without introducing a full benchmarking framework.

```python
import time


class Timer:
    def __init__(self, label: str):
        self.label = label
        self.t0 = 0.0

    def __enter__(self):
        self.t0 = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        dt = time.perf_counter() - self.t0
        print(f'[{self.label}] {dt:.3f}s')
```

Then I can do:

```python
with Timer('single-process'):
    run_single()

with Timer('multiprocessing'):
    run_mp()
```

The goal in set 1 is not micro-optimizing; it’s answering: “Did my parallel version actually use multiple cores,
and did it actually finish faster?”

## Getting results back: return values vs queues (the first real design choice)
Once you go beyond toy examples, you quickly ask: “How do I get outputs back from workers?”

In multiprocessing, I think of two broad patterns:

1) Return values (map-style)
   – You give each job to a pool function and it returns a value.
   – This is simplest when results are not huge.

2) Message passing (queue/pipe-style)
   – Workers push results (and/or logs) to a queue; the parent consumes them.
   – This is better when you want streaming behavior, progress reporting, or early cancellation.

### Map-style return values (simplest)
You already saw pool.map(count_primes, jobs). This scales nicely when each job has a clear input and output.

Where it becomes less ideal:

- results are enormous (you’ll serialize a lot back to the parent)
- you want to emit partial results while a long job is running
- you need dynamic work generation

### Queue-style results (more control)
Here’s the smallest “queue” example I consider useful, because it shows the pattern without too much ceremony.

Save as mp_set1_queue_results.py:

```python
import multiprocessing as mp
import os


def worker(n: int, out_q: 'mp.Queue[tuple[int, int]]') -> None:
    # Put (n * n, pid) so the parent can confirm parallelism.
    out_q.put((n * n, os.getpid()))


if __name__ == '__main__':
    out_q: 'mp.Queue[tuple[int, int]]' = mp.Queue()

    procs = [mp.Process(target=worker, args=(i, out_q)) for i in range(5)]
    for p in procs:
        p.start()

    # Collect exactly N results (one per worker).
    results = [out_q.get() for _ in procs]

    for p in procs:
        p.join()

    print('results:', results)
```

Why I like queues in real scripts:

- they decouple producers from consumers
- the parent can keep running while workers work
- you can add a “poison pill” sentinel to signal shutdown

But I also treat queues as a coordination tool that can deadlock if misused.
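Here’s a hedged sketch of that sentinel idea (names are my own): each worker stops when it receives None, and the parent sends exactly one None per worker. The parent also drains the result queue before joining, which sidesteps a common queue/join deadlock:

```python
import multiprocessing as mp


def worker(jobs: mp.Queue, out: mp.Queue) -> None:
    while True:
        item = jobs.get()
        if item is None:  # poison pill: stop cleanly
            break
        out.put(item * item)


if __name__ == '__main__':
    jobs: mp.Queue = mp.Queue()
    out: mp.Queue = mp.Queue()

    n_workers = 2
    procs = [mp.Process(target=worker, args=(jobs, out)) for _ in range(n_workers)]
    for p in procs:
        p.start()

    for i in range(10):
        jobs.put(i)
    for _ in range(n_workers):
        jobs.put(None)  # exactly one pill per worker

    # Drain results BEFORE joining, so workers never block on a full queue.
    results = sorted(out.get() for _ in range(10))
    for p in procs:
        p.join()

    print(results)
```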
In set 1, the safe pattern is: always know who is responsible for putting how many items, and always know who is responsible for consuming them.

## Process lifecycle basics: stopping, timeouts, and cleanup
In the real world, processes don’t always finish nicely. Maybe a worker gets stuck, or an input file is corrupted and triggers a slow path, or you need to enforce an SLA.

A few lifecycle concepts matter early:

- join(timeout=...): wait up to some time
- terminate(): ask the OS to stop the process
- kill(): forcefully stop (available on many platforms; behavior varies)
- daemon=True: the process is tied to parent lifetime (not a general solution, but useful to know)

### A simple timeout pattern
I use this when a worker might hang and I’d rather fail fast than wait forever.

```python
import multiprocessing as mp
import time


def slow() -> None:
    time.sleep(10)


if __name__ == '__main__':
    p = mp.Process(target=slow)
    p.start()

    p.join(timeout=0.5)
    if p.is_alive():
        print('worker timed out; terminating')
        p.terminate()
        p.join()

    print('exitcode:', p.exitcode)
```

This is not “elegant cancellation,” but it is often the difference between a stuck batch job and a job that fails predictably.

## Practical scenarios: where multiprocessing shines (and how I structure it)
Here are patterns I repeatedly use in production-style scripts.

### Scenario 1: Many independent files
If the work is “do CPU-heavy parsing on each file,” multiprocessing is almost perfect.
The input can be file paths (small, picklable) and each worker loads and processes its own file.

The structure I like:

- parent enumerates file paths
- pool maps process_file(path)
- worker reads file, parses, returns small summary (counts, metrics, output path)

This avoids shipping big in-memory objects across process boundaries.

### Scenario 2: CPU-heavy transforms on a big list
If the work is “transform each item,” I do one more step: chunking.

- parent chunks list into batches of size 500–5000 (depends on cost per item)
- pool maps process_batch(batch)
- worker loops in-process over that batch and returns aggregate results

This reduces scheduling overhead dramatically.

### Scenario 3: Producer-consumer pipelines
If the work is “read inputs (I/O) then compute (CPU) then write outputs (I/O),” I sometimes use processes to separate stages, but I’m careful. Pipeline parallelism is real, but coordination complexity can explode.

For set 1, my advice is: start with a single pool that does “read + compute + write” per unit (usually per file). Only split into a pipeline when measurement proves it’s necessary.

## Performance considerations you can act on immediately
You don’t need a PhD in performance tuning to get value out of multiprocessing, but you do need to watch a few knobs.

### Worker count
For CPU-bound tasks, I usually start with:

- workers = os.cpu_count()
- or workers = os.cpu_count() - 1 to leave headroom

Then I test a couple of values around it. If performance is flat after a point, stop increasing.

### Task size (again: chunking)
If each task is too small, overhead dominates.
If each task is too large, you can get poor load balancing (one worker gets the “hard chunk” and everyone else finishes early).

The sweet spot is when:

- each task takes long enough to amortize overhead
- but short enough that work distributes evenly

I often aim for “hundreds of milliseconds to a few seconds per task” as a starting point, then adjust.

### Serialization costs
If the data you’re passing around is large, you can lose your speedup to pickling and copying. If I see that happening, I redesign the boundary:

- pass paths/IDs instead of objects
- read data inside the worker
- write output to disk and return only metadata

This can feel old-school, but it works.

## Alternative approach in the same spirit: concurrent.futures
Even though this is focused on multiprocessing, it’s worth knowing there’s another standard-library interface that many people find cleaner: concurrent.futures.ProcessPoolExecutor.

Why I mention it in set 1:

- it has a simple “submit / futures” model
- exceptions are naturally propagated when you call future.result()
- it composes nicely with timeouts

Conceptually it’s the same idea (multiple processes), just a different API. If you understand the basics in this article, you can use either tool effectively.

## Production considerations (lightweight, but real)
If you take multiprocessing from “toy script” to “daily job,” a few operational details matter.

### Logging
I try not to let every process write to the same log file without a plan. For simple cases, stdout is fine. For serious jobs:

- workers push log messages to a queue
- parent writes logs (or uses a logging handler)

This avoids interleaving and makes log ordering easier to reason about.

### Monitoring and progress
People love progress bars; multiprocessing makes them trickier. In set 1, the simplest progress indicator is counting completed results in the parent.
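A minimal sketch of that counting approach (the `square` worker and the chunk size are invented for illustration):

```python
import multiprocessing as mp


def square(n: int) -> int:
    return n * n


if __name__ == '__main__':
    items = list(range(100))
    done = 0

    with mp.Pool(processes=4) as pool:
        # Results arrive as workers finish them, so the parent
        # can report progress incrementally.
        for _ in pool.imap_unordered(square, items, chunksize=10):
            done += 1
            if done % 25 == 0:
                print(f'progress: {done}/{len(items)}')
```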
If you use imap_unordered, you can increment a counter as results arrive.

### Memory usage
Multiple processes multiply baseline memory. If your parent imports a large stack of libraries, each spawned worker may also import them. On some platforms and start methods you can share pages copy-on-write, but I never assume it. If memory is tight, reduce worker count or reduce imports in the worker path.

## My closing checklist for multiprocessing set 1
When I’m teaching someone (or reminding myself) how to use multiprocessing correctly, I end with this checklist:

- Is the workload truly CPU-bound?
- Did I keep worker functions top-level and inputs picklable?
- Did I put process creation behind `if __name__ == '__main__':`?
- Did I choose a reasonable worker count (near core count)?
- Did I choose task sizes that amortize overhead (chunking)?
- Did I measure a baseline and confirm multi-core usage?

If you can answer “yes” to those, you’re already in the top tier of people using multiprocessing effectively, because most failures come from skipping one of those fundamentals.

From here, the next natural steps are learning structured ways to pass results (queues, managers), using pools for large batches with good chunk sizing, and handling exceptions/timeouts in a way that makes
your batch jobs robust. That’s where multiprocessing stops being a neat trick and becomes a tool you can trust.


