Python Multiprocessing Example: Process, Pool & Queue

Updated on March 31, 2026

Introduction

This tutorial walks through the multiprocessing module with runnable patterns for the Process class, worker pools, queues, locks, shared primitives, and inter-process pipes. You build a mental model of the CPython global interpreter lock, process start methods, and where parallelism trades memory for wall-clock time. After working through the examples, you can split CPU-heavy Python across cores, coordinate workers without data races, and interpret timing results to choose the right pool size for your workload. If you still need a baseline Python install on Linux, start with Install Python 3 on Ubuntu.

Key Takeaways

  • multiprocessing sidesteps the Python GIL for CPU-bound numeric work by running separate interpreters that execute Python bytecode in parallel.
  • A Process wraps one callable in its own interpreter lifecycle: start() schedules it, join() waits for a clean exit, and terminate() forces shutdown when cooperative exit is not an option.
  • Pool plus map, starmap, or apply_async distributes a function over many inputs; pick starmap when each task supplies multiple positional arguments.
  • Queue and Pipe move pickled messages between processes; Value, Array, and Manager expose shared state with different safety and complexity trade-offs.
  • Lock serializes critical sections so shared counters and logs stay consistent when several processes update the same memory proxy.
  • Start method (spawn, fork, forkserver) decides how interpreters boot; spawn is the default on Windows and recent macOS, which makes the if __name__ == "__main__": guard mandatory for runnable entry points.
  • Performance is workload-specific: pools speed up CPU-bound loops until you saturate cores, then overhead and memory pressure flatten gains.

What Is Python Multiprocessing and When Should You Use It

Python multiprocessing runs separate operating-system processes, each with its own Python interpreter, then coordinates them with primitives that look and feel like threading but cross process boundaries instead of sharing one interpreter. That layout matters because CPython serializes execution of Python bytecode in a single process with the global interpreter lock, which limits the benefit of threads for CPU-heavy Python code.

| Workload type | multiprocessing | threading | asyncio |
| --- | --- | --- | --- |
| CPU-bound Python loops | Strong fit: true multi-core parallelism | Weak: GIL blocks parallel bytecode | Weak for CPU: one thread, cooperative tasks |
| I/O-bound waits (network, disk) | Possible, higher start-up cost | Good fit: blocking waits release the GIL | Often the lowest overhead fit |
| Mixed pipelines | Profile first; combine models if needed | Combine threads with I/O-bound waits | Use tasks for waits, processes for hot loops |

Official reference: multiprocessing in the Python standard library.

How Python Multiprocessing Differs from Threading

The core difference is memory: threads share it, processes do not. That single fact drives every other trade-off between the two models. Processes duplicate interpreter state at start-up (depending on start method), isolate memory by default, and communicate by pickling messages through Queue, Pipe, or Manager proxies. For a thread-centric API tour, read the Python threading tutorial; multiprocessing-versus-threading debates usually hinge on whether the bottleneck is CPU time in Python bytecode or time spent waiting on external systems.
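A minimal sketch of that isolation (the worker and run_demo names are illustrative): a child rebinding a module-level global leaves the parent's copy untouched, so the value has to travel back through an explicit channel such as a Queue.

```python
from multiprocessing import Process, Queue

counter = 0


def worker(q):
    global counter
    counter = 100      # rebinding happens only inside the child's interpreter
    q.put(counter)     # explicit IPC: the value is pickled back to the parent


def run_demo():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    received = q.get() # read before join so the child's feeder thread can flush
    p.join()
    return counter, received


if __name__ == "__main__":
    parent_copy, from_child = run_demo()
    print("parent counter:", parent_copy)   # still 0
    print("value via Queue:", from_child)   # 100
```

Under fork the child starts from a copy of counter; under spawn it re-imports the module from scratch. Either way, the parent's binding never changes.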

The Global Interpreter Lock (GIL) and Why It Matters

The GIL is a mutex inside CPython that allows only one thread to execute Python bytecode at a time, even on a machine with many cores. That means two threads in the same process cannot run Python instructions simultaneously. The practical consequence: a CPU-bound loop that stays in Python gets no speed benefit from threading, because the second thread waits while the first holds the lock.

The script below demonstrates the ceiling. Two threads split a counting loop; two processes split the same loop. Compare the wall-clock times.

# Tested on Python 3.11
import time
import threading
from multiprocessing import Process


def count(n):
    total = 0
    for i in range(n):
        total += i
    return total


def run_threads(n):
    t1 = threading.Thread(target=count, args=(n,))
    t2 = threading.Thread(target=count, args=(n,))
    t0 = time.perf_counter()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return time.perf_counter() - t0


def run_processes(n):
    p1 = Process(target=count, args=(n,))
    p2 = Process(target=count, args=(n,))
    t0 = time.perf_counter()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return time.perf_counter() - t0


if __name__ == "__main__":
    n = 5_000_000
    print(f"threads:   {run_threads(n):.3f}s")
    print(f"processes: {run_processes(n):.3f}s")
threads:   0.847s
processes: 0.461s

The thread version is slower than a single-threaded run on most hardware because threads fight over the GIL and pay context-switch overhead without gaining parallelism. The process version runs both halves simultaneously on separate cores. C extensions such as NumPy release the GIL during heavy array operations, so the ceiling does not apply uniformly to all Python code, but it applies to everything written in pure Python.

When Multiprocessing Is the Right Choice

Reach for multiprocessing when profiling shows CPU time dominated by Python-level computation that does not release the GIL, or when you must isolate failure (one crashing worker does not tear down siblings). Skip it when tasks are tiny relative to process start cost, when you already saturate the box with one process, or when shared mutable Python objects would force complex synchronization: in those cases threading or asyncio may spend less time on orchestration. For running non-Python programs or shell pipelines, compare this module with subprocess.
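That failure isolation can be sketched directly (crasher, survivor, and run_pair are illustrative names; this assumes a POSIX host, where an unhandled exception in a child yields a nonzero exit code):

```python
from multiprocessing import Process


def crasher():
    raise RuntimeError("worker failure")  # unhandled: kills only this child


def survivor():
    pass                                  # completes normally


def run_pair():
    bad = Process(target=crasher)
    good = Process(target=survivor)
    bad.start()
    good.start()
    bad.join()
    good.join()
    return bad.exitcode, good.exitcode


if __name__ == "__main__":
    bad_code, good_code = run_pair()
    print("crasher exitcode:", bad_code)    # nonzero: the child died alone
    print("survivor exitcode:", good_code)  # 0: the sibling was unaffected
```

A thread raising the same exception would have shared the interpreter with its siblings; a process takes only itself down.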

Python Multiprocessing Process Class Example

Use Process directly when you need exact control over a single worker’s lifecycle: starting it, waiting for it, or forcing it to stop. A pool abstracts that lifecycle away, which is convenient for batch jobs but makes bespoke supervision harder.

Creating and Starting a Process

You construct Process with target plus optional args and kwargs, then call start() to fork or spawn the worker. Nothing runs until start() schedules the new interpreter entry point.

multiprocessing.cpu_count() reports how many logical CPUs the operating system exposes to Python, which helps size pools; the integer differs by hardware.

# Tested on Python 3.11
import multiprocessing


def main():
    print("Number of cpu :", multiprocessing.cpu_count())


if __name__ == "__main__":
    main()
Number of cpu : 8
# Tested on Python 3.11
from multiprocessing import Process


def print_func(continent="Asia"):
    print("The name of continent is : ", continent)


if __name__ == "__main__":
    names = ["America", "Europe", "Africa"]
    procs = []
    proc = Process(target=print_func)
    procs.append(proc)
    proc.start()

    for name in names:
        proc = Process(target=print_func, args=(name,))
        procs.append(proc)
        proc.start()

    for proc in procs:
        proc.join()
The name of continent is :  Asia
The name of continent is :  America
The name of continent is :  Europe
The name of continent is :  Africa

Joining and Terminating Processes

join() blocks until the child exits cleanly, which is how you avoid exiting the parent while workers still hold resources. terminate() sends SIGTERM on POSIX (the Windows implementation uses TerminateProcess), which helps when code ignores cooperative shutdown; expect arbitrary interruption, so prefer join() whenever workers can finish their own loops.

# Tested on Python 3.11
import time
from multiprocessing import Process


def slow_worker():
    time.sleep(60)


if __name__ == "__main__":
    p = Process(target=slow_worker)
    p.start()
    time.sleep(0.1)
    p.terminate()
    p.join()
    print("exitcode after terminate:", p.exitcode)
exitcode after terminate: -15

POSIX systems often report -15 (SIGTERM). Windows exit codes differ; treat nonzero values as forced shutdown rather than success.
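A small helper makes the convention concrete (describe_exitcode is illustrative, not part of the module): on POSIX, a negative exitcode encodes the signal that killed the process.

```python
import signal


def describe_exitcode(code):
    # Process.exitcode semantics: None while running, negative = killed by signal
    if code is None:
        return "still running"
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exitcode(-15))  # killed by SIGTERM
    print(describe_exitcode(0))    # exited with status 0
```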

Passing Arguments to a Process

Use args for positional parameters (remember the trailing comma for a single argument) and kwargs for keyword parameters. Pickle still applies, so targets should stay picklable under your selected start method.

# Tested on Python 3.11
from multiprocessing import Process


def greet(name, punctuation="."):
    msg = f"hello {name}{punctuation}"
    print(msg)


if __name__ == "__main__":
    p = Process(target=greet, args=("sam",), kwargs={"punctuation": "!"})
    p.start()
    p.join()
hello sam!

Python Multiprocessing Pool Example

A Pool maintains a fixed set of worker processes and recycles them across tasks, so you pay the process start cost once rather than once per job. Use it when you have many independent units of work and want ordered results back without managing individual process lifecycles.

Using Pool.map() for Parallel Iteration

Pool.map applies one callable to every element of an iterable, preserving result order while work runs in parallel across workers. Note that map materializes the entire input before dispatching, so if your inputs come from a generator pipeline on the parent side, batch the work before handing it to the pool, as described in How To Use Generators in Python 3.

# Tested on Python 3.11
from multiprocessing import Pool


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(4) as pool:
        out = pool.map(square, range(6))
    print(out)
[0, 1, 4, 9, 16, 25]

By default pool.map sends one item at a time to each worker. For large iterables this creates overhead because the parent dispatches thousands of individual tasks. Pass chunksize to batch items into groups: each worker receives chunksize elements per dispatch call rather than one, which reduces scheduling overhead at the cost of less even load distribution when task durations vary.

# Tested on Python 3.11
# Uses square() defined in the example above
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(4) as pool:
        out = pool.map(square, range(12), chunksize=3)
    print(out)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

Python Multiprocessing For Loop Example with Pool

To parallelize a for loop, build the input list with a comprehension or range and pass it to pool.map. The parent iterates once to construct the list; workers process elements in parallel.

# Tested on Python 3.11
from multiprocessing import Pool


def job(n):
    return n, n ** 3


if __name__ == "__main__":
    inputs = [k for k in range(4)]
    with Pool(2) as pool:
        results = pool.map(job, inputs)
    print(results)
[(0, 0), (1, 1), (2, 8), (3, 27)]

Using Pool.apply_async() for Non-Blocking Execution

apply_async schedules one call without blocking the parent and returns an AsyncResult. Call .get() when you need the result. Pass a timeout argument to .get() to avoid hanging indefinitely if a worker stalls.

# Tested on Python 3.11
from multiprocessing import Pool


def add(a, b):
    return a + b


if __name__ == "__main__":
    with Pool(2) as pool:
        res = pool.apply_async(add, (2, 3))
        other = pool.apply_async(add, (40, 2))
        print("first:", res.get())
        print("second:", other.get())
first: 5
second: 42

Use timeout on .get() to cap how long the parent waits for a result.

# Tested on Python 3.11
import time
from multiprocessing import Pool


def slow_task(x):
    time.sleep(x)
    return x


if __name__ == "__main__":
    with Pool(2) as pool:
        res = pool.apply_async(slow_task, (5,))
        try:
            print(res.get(timeout=1))
        except Exception as e:
            print("timed out:", type(e).__name__)
timed out: TimeoutError

Python Multiprocessing Starmap Example

starmap unpacks each tuple in the iterable as positional arguments, so a function that takes multiple parameters does not need a wrapper. Use it instead of map when each task naturally carries more than one argument.

# Tested on Python 3.11
from multiprocessing import Pool


def label_power(label, exponent):
    return f"{label}: 2 ** {exponent} = {2 ** exponent}"


if __name__ == "__main__":
    work = [("A", 1), ("B", 2), ("C", 3), ("D", 4)]
    with Pool(2) as pool:
        out = pool.starmap(label_power, work)
    print(out)
['A: 2 ** 1 = 2', 'B: 2 ** 2 = 4', 'C: 2 ** 3 = 8', 'D: 2 ** 4 = 16']

Python Multiprocessing Queue Example

A multiprocessing.Queue is not the same as queue.Queue from the standard library. The standard library version is an in-memory data structure safe only for threads in the same process. The multiprocessing version uses an OS pipe and a background feeder thread to move pickled objects across process boundaries. Use multiprocessing.Queue whenever you need to pass data between separate processes.

Passing Data Between Processes with Queue

Same-process usage mirrors the standard library queue API: put enqueues, get dequeues.

# Tested on Python 3.11
from multiprocessing import Queue


def main():
    colors = ["red", "green", "blue", "black"]
    cnt = 1
    queue = Queue()
    print("pushing items to queue:")
    for color in colors:
        print("item no: ", cnt, " ", color)
        queue.put(color)
        cnt += 1

    print("\npopping items from queue:")
    cnt = 0
    while not queue.empty():
        print("item no: ", cnt, " ", queue.get())
        cnt += 1


if __name__ == "__main__":
    main()
pushing items to queue:
item no:  1   red
item no:  2   green
item no:  3   blue
item no:  4   black

popping items from queue:
item no:  0   red
item no:  1   green
item no:  2   blue
item no:  3   black

queue.empty() is unreliable when multiple processes share the same queue. The Python documentation states that the result is not reliable under multiprocessing semantics because another process may add or remove an item between the check and the next call. Use the sentinel pattern in the next section for any real producer-consumer implementation.

Producer-Consumer Pattern Using Queue

Use this pattern when a producer generates work at its own pace and one or more consumers should process items as they arrive, without either side polling or busy-waiting. The producer signals completion by enqueuing a sentinel value, None in this case, that the consumer recognizes as a stop signal. If you have multiple consumers, enqueue one sentinel per consumer so each worker gets its own stop signal and does not exit early.

# Tested on Python 3.11
import time
from multiprocessing import Process, Queue


def producer(q, items):
    for it in items:
        q.put(it)
        time.sleep(0.05)
    q.put(None)


def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print("consumed:", item)


if __name__ == "__main__":
    q = Queue()
    data = ["alpha", "beta", "gamma"]
    p = Process(target=producer, args=(q, data))
    c = Process(target=consumer, args=(q,))
    p.start()
    c.start()
    p.join()
    c.join()
    print("done")
consumed: alpha
consumed: beta
consumed: gamma
done
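The multi-consumer variant mentioned above can be sketched like this (run_pipeline and the second results queue are illustrative additions): the producer enqueues one sentinel per consumer, so every worker receives its own stop signal.

```python
from multiprocessing import Process, Queue


def producer(q, items, n_consumers):
    for it in items:
        q.put(it)
    for _ in range(n_consumers):  # one sentinel per consumer
        q.put(None)


def consumer(q, out):
    while True:
        item = q.get()
        if item is None:          # this worker's personal stop signal
            break
        out.put(item.upper())


def run_pipeline(items, n_consumers=2):
    q, out = Queue(), Queue()
    workers = [Process(target=consumer, args=(q, out)) for _ in range(n_consumers)]
    for w in workers:
        w.start()
    producer(q, items, n_consumers)  # the parent plays producer here
    for w in workers:
        w.join()
    return sorted(out.get() for _ in items)


if __name__ == "__main__":
    print(run_pipeline(["alpha", "beta", "gamma"]))
```

With fewer sentinels than consumers, at least one worker would block in q.get() forever; with more, the extra sentinels simply remain in the queue.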

Python Multiprocessing Lock Example

A Lock lets you mark a block of code as a critical section so only one process runs it at a time. Without one, two processes that read, modify, and write a shared value can interleave those steps and corrupt the result.

Preventing Race Conditions with Lock

Without synchronization, two processes that read/modify/write a shared Value from Python lose updates because the operations are not atomic across processes.

# Tested on Python 3.11
from multiprocessing import Process, Value


def unsafe_increment(counter, n):
    for _ in range(n):
        counter.value += 1


if __name__ == "__main__":
    n = 200_000
    c = Value("i", 0)
    p1 = Process(target=unsafe_increment, args=(c, n))
    p2 = Process(target=unsafe_increment, args=(c, n))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("final without extra sync:", c.value)
    print("expected:", 2 * n)
final without extra sync: 203240
expected: 400000

The exact value printed for final without extra sync is non-deterministic and changes between runs. On some hardware or low-contention configurations, the value may equal expected, meaning no race was observed during that run. Execute the script multiple times to see the inconsistency.

Using Lock with Shared Counters

Acquire the lock around the increment so the read/modify/write happens as one critical section.

# Tested on Python 3.11
from multiprocessing import Process, Value, Lock


def safe_increment(counter, lock, n):
    for _ in range(n):
        with lock:
            counter.value += 1


if __name__ == "__main__":
    n = 200_000
    c = Value("i", 0)
    lock = Lock()
    p1 = Process(target=safe_increment, args=(c, lock, n))
    p2 = Process(target=safe_increment, args=(c, lock, n))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("final with Lock:", c.value)
    print("expected:", 2 * n)
final with Lock: 400000
expected: 400000

The example above passes a Lock to manually created Process objects. With Pool, you cannot pass a Lock directly as a task argument because Pool pickles arguments on every call and locks are not picklable that way. Use an initializer function to inject a shared Lock into each worker at start-up instead.

# Tested on Python 3.11
from multiprocessing import Pool, Lock, Value

lock = None
counter = None


def init_worker(shared_lock, shared_counter):
    global lock, counter
    lock = shared_lock
    counter = shared_counter


def increment_with_pool(_):
    with lock:
        counter.value += 1


if __name__ == "__main__":
    shared_lock = Lock()
    shared_counter = Value("i", 0)
    n = 1000
    with Pool(
        processes=4,
        initializer=init_worker,
        initargs=(shared_lock, shared_counter),
    ) as pool:
        pool.map(increment_with_pool, range(n))
    print("final:", shared_counter.value)
    print("expected:", n)
final: 1000
expected: 1000

Shared State Between Processes

Processes start with copied memory snapshots under fork, or clean imports under spawn. Either way, arbitrary Python objects are not shared across process boundaries by default. To share state, you must explicitly opt into one of the primitives covered in this section.

Using multiprocessing.Value and multiprocessing.Array

Value wraps a single ctypes scalar in shared memory; Array stores a fixed-length buffer of the same type. The type code ("i" for signed int, "d" for double) follows the ctypes convention. In the example below, the child writes into index 1 of a three-element double array. The Value is intentionally left untouched by the child to show that it retains the parent’s initial value across the process boundary.

# Tested on Python 3.11
from multiprocessing import Process, Value, Array


def modify_shared(n, arr, idx):
    arr[idx] = n


if __name__ == "__main__":
    num = Value("i", 42)
    buf = Array("d", [0.0, 0.0, 0.0])
    p = Process(target=modify_shared, args=(3.14, buf, 1))
    p.start()
    p.join()
    print("Value:", num.value)
    print("Array:", list(buf))
Value: 42
Array: [0.0, 3.14, 0.0]

Using multiprocessing.Manager for Complex Shared Objects

Manager runs a dedicated server process and returns proxies that worker processes access over an internal connection. Every read and write goes through that server, which makes Manager significantly slower than Value or Array for tight loops. Use it when you need a shared dict, list, or other rich object and the access frequency is low enough that the round-trip cost does not dominate.

# Tested on Python 3.11
from multiprocessing import Process, Manager


def f(shared_dict, shared_list, n):
    shared_dict[n] = n * n
    shared_list.append(n)


if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        l = manager.list()
        p1 = Process(target=f, args=(d, l, 2))
        p2 = Process(target=f, args=(d, l, 3))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("shared dict:", dict(d))
        print("shared list:", list(l))
shared dict: {3: 9, 2: 4}
shared list: [3, 2]

List ordering and dict key ordering in the output above reflect one possible completion sequence. Because two processes append and write concurrently, values appear in whichever order workers finish. Your output may differ between runs.

Inter-Process Communication with Pipe

Use Pipe when exactly two processes need to talk directly and the overhead of a Queue server process is not worth it. Pipe(duplex=True) gives each end a send() and recv(), making it a lightweight point-to-point channel.

# Tested on Python 3.11
from multiprocessing import Process, Pipe


def child(conn):
    conn.send({"pid": "child", "msg": "ping"})
    print("child received:", conn.recv())
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    print("parent received:", parent_conn.recv())
    parent_conn.send({"pid": "parent", "msg": "pong"})
    p.join()
parent received: {'pid': 'child', 'msg': 'ping'}
child received: {'pid': 'parent', 'msg': 'pong'}

Process Start Methods: spawn, fork, and forkserver

The start method controls how Python creates a new interpreter for each child process. spawn starts a clean interpreter and re-imports the module from scratch. fork copies the parent’s memory at the OS level. forkserver uses a pre-imported server process to fork workers on demand. Which method your code uses by default depends on the operating system.

| Start method | Default on | Behavior | When to use |
| --- | --- | --- | --- |
| spawn | Windows; macOS with recent CPython | Starts a clean interpreter, re-imports modules, pickles arguments | Maximum safety across platforms, slower cold start |
| fork | Many Linux installs when available | Child inherits parent memory at the syscall | Fast fork of large parents; inherited state can surprise you |
| forkserver | Opt-in on POSIX | Starts one server process that pre-imports the module, then forks workers from that server on demand | Middle ground when fork semantics are acceptable but import cost hurts |

Default Start Methods by Operating System

CPython picks defaults per OS: Windows always uses spawn. macOS moved to spawn as the default for recent releases to avoid unsafe inherited state after fork. Many Linux distributions still default to fork when the platform supports it, but consult multiprocessing.get_start_method() on your build.

How to Set the Start Method Explicitly

Call multiprocessing.set_start_method() once near program entry, before you create Process or Pool objects, and keep that call inside if __name__ == "__main__":. If the interpreter already locked a default (common on Windows and macOS under spawn), the call raises RuntimeError; catch it when you paste the pattern into exploratory sessions. For a dedicated spawn pool without touching the global default, use get_context.

# Tested on Python 3.11
import multiprocessing as mp

if __name__ == "__main__":
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        pass
    print("global start method:", mp.get_start_method())
global start method: spawn
# Tested on Python 3.11
import multiprocessing as mp

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print("mapped:", pool.map(abs, [-1, -2, -3]))
    print("context start method:", ctx.get_start_method())
mapped: [1, 2, 3]
context start method: spawn

On Windows and macOS defaults, child processes import main modules from disk. Guard module-level side effects with if __name__ == "__main__": or you risk recursive process creation and confusing pickling errors when the start method is spawn.

When spawn re-imports the module to boot the child, any code outside if __name__ == "__main__": runs again in the child. On Windows and macOS this causes infinite process creation, and you see the error below before the OS kills the runaway.

# Tested on Python 3.11
# broken pattern: no main guard, triggers recursive spawn on Windows/macOS
from multiprocessing import Process


def worker():
    print("working")


p = Process(target=worker)
p.start()
p.join()
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        ...
        if __name__ == '__main__':
            freeze_support()
            ...

The fix is always the same: wrap the launch code.

# Tested on Python 3.11
# corrected pattern
from multiprocessing import Process


def worker():
    print("working")


if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()
working

A second common failure is passing a lambda or a locally defined function under spawn. Because spawn pickles the target by name and re-imports it, the child cannot locate anything that is not defined at module top level.

# Tested on Python 3.11
# broken pattern: lambda is not picklable under spawn
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(2) as pool:
        pool.map(lambda x: x * 2, range(4))
AttributeError: Can't pickle local object '<lambda>'

The exact error message varies by Python version and platform. On some builds you see _pickle.PicklingError instead of AttributeError. Both indicate the same root cause: the target function is not importable by name in the child interpreter.

Move the function to module level to fix it.
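A corrected version of the broken example, with the work promoted to a module-level function that the child can import by name (double is an illustrative name):

```python
# corrected pattern: module-level functions pickle by reference
from multiprocessing import Pool


def double(x):
    return x * 2


if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(double, range(4)))  # [0, 2, 4, 6]
```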

Handling Exceptions Raised Inside Workers

Exceptions in worker processes do not automatically surface in the parent. The behavior differs between Pool.map and Pool.apply_async, and missing this distinction causes silent data loss in production.

Pool.map re-raises the worker exception in the parent when you call it, so wrapping the call in try/except works as expected.

# Tested on Python 3.11
from multiprocessing import Pool


def risky(x):
    if x == 2:
        raise ValueError("bad input: 2")
    return x * 10


if __name__ == "__main__":
    with Pool(2) as pool:
        try:
            results = pool.map(risky, range(4))
        except ValueError as e:
            print("caught:", e)
caught: bad input: 2

Pool.apply_async is different: exceptions are stored on the AsyncResult object and only raised when you call .get(). If you never call .get(), the exception disappears silently.

# Tested on Python 3.11
# Uses risky() defined in the example above
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(2) as pool:
        results = [pool.apply_async(risky, (i,)) for i in range(4)]
        for r in results:
            try:
                print(r.get())
            except ValueError as e:
                print("caught:", e)
0
10
caught: bad input: 2
30

Always call .get() on every AsyncResult before the pool closes. Uncollected results silently discard worker exceptions.

Performance Benchmarks: Serial vs. Multiprocessing

The benchmark script below measures wall-clock time for a CPU-bound loop run serially and across multiple pool sizes, so you can see exactly where gains flatten on your hardware.

Benchmarking a CPU-Bound Task Across Process Counts

The benchmark below answers one question: at what pool size does the overhead of spawning workers stop paying off? It divides a fixed sum-of-squares computation across one, two, four, and eight processes and prints elapsed seconds for each.

# Tested on Python 3.11
import time
from multiprocessing import Pool


def sum_squares_chunk(args):
    start, end = args
    return sum(i * i for i in range(start, end))


def work_ranges(n_workers, total):
    chunk = total // n_workers
    ranges = []
    s = 0
    for w in range(n_workers):
        e = s + chunk if w < n_workers - 1 else total
        ranges.append((s, e))
        s = e
    return ranges


def main():
    total = 5_000_000

    t0 = time.perf_counter()
    sum(i * i for i in range(total))
    t_serial = time.perf_counter() - t0
    print(f"serial\t1\t{t_serial:.4f}")

    for n in (2, 4, 8):
        t0 = time.perf_counter()
        with Pool(n) as pool:
            parts = pool.map(sum_squares_chunk, work_ranges(n, total))
            sum(parts)
        t_pool = time.perf_counter() - t0
        print(f"pool\t{n}\t{t_pool:.4f}")


if __name__ == "__main__":
    main()
serial 1 0.1722
pool 2 0.1394
pool 4 0.0855
pool 8 0.0830

Absolute seconds change with CPU model, governor settings, and background load; the comparison remains valid on any Python 3.8+ install because the workload is deterministic.

Stop raising worker counts when timing flattens or memory pressure grows; oversized pools thrash caches and spend time scheduling.
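A starting heuristic consistent with that advice (pick_pool_size is an illustrative helper, not a library function): cap workers at the logical CPU count and never exceed the task count.

```python
import multiprocessing as mp


def pick_pool_size(n_tasks):
    # never more workers than tasks, never more than logical CPUs
    return max(1, min(mp.cpu_count(), n_tasks))


if __name__ == "__main__":
    print("pool size for 3 tasks:", pick_pool_size(3))
    print("pool size for 10_000 tasks:", pick_pool_size(10_000))
```

Treat the result as a first guess to benchmark around, not a guarantee; hyperthreaded cores and memory-bound workloads often favor smaller pools.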

Python Multiprocessing vs. Threading vs. asyncio

The right concurrency model depends on what your code is waiting on. Use the table below to decide, then read the guidance underneath it.

Comparison Table

| Criterion | multiprocessing | threading | asyncio |
| --- | --- | --- | --- |
| Parallelism type | Multi-process | Multi-thread (same interpreter) | Single-thread cooperative |
| GIL impact | Separate GIL per process | One GIL per process | One GIL for event loop |
| Best for | CPU-bound Python without C extensions releasing the GIL | Blocking I/O that can wait in C or releases GIL | Many network or disk waits with async-capable libraries |
| Memory model | Isolated by default; explicit IPC | Shared interpreter memory | Shared interpreter memory |
| Overhead | Higher: process start, pickling IPC | Lower: threads are lighter | Lowest when an async stack covers the full I/O path end-to-end |
| Typical use cases | Numeric transforms, transcoding shells | Legacy blocking libraries | Web scrapers, network gateways |

Choosing the Right Concurrency Model

Profile first: hot loops in Python point to processes, long waits on sockets or disks point to asyncio, and mixed workloads may split stages. Prefer threading when libraries are already thread-safe and spend most time outside Python bytecode. For additional background, revisit the Python threading tutorial and the standard library asyncio documentation.

Common Errors and How to Fix Them

Structured debugging pays off because many failures are pickling or lifecycle mistakes; cross-check stack traces with How To Debug Python Errors.

“Can’t pickle” Errors

Problem: Pool or Process raises _pickle.PicklingError: Can't pickle ... when spawning workers.

Cause: The start method serializes targets and arguments; lambdas, nested functions, instance methods without care, or non-importable contexts break pickling.

Fix: Move workers to top-level functions in importable modules, pass picklable data only, and keep launch code inside if __name__ == "__main__":.

# Tested on Python 3.11
# broken pattern (nested functions confuse pickle under spawn):

# def main():
#     def work(x):
#         return x + 1
#     with Pool(2) as pool:
#         pool.map(work, range(3))

# corrected pattern:
from multiprocessing import Pool


def work(x):
    return x + 1


def main():
    with Pool(2) as pool:
        print(pool.map(work, range(3)))


if __name__ == "__main__":
    main()
[1, 2, 3]

Deadlocks and How to Avoid Them

Problem: Processes hang forever with idle CPUs.

Cause: Typical patterns include a parent blocking on join() while a child waits on data from the parent, two locks taken in opposite order, or draining a Queue while the producer never finishes.

Fix: Order lock acquisition consistently, bound queues with timeouts during debugging, and avoid calling join on the same process from multiple threads without external coordination.

The example below reproduces one common pattern: a parent joins a child while the child is blocked trying to put into a full queue that the parent never drains.

# Tested on Python 3.11
# broken pattern: parent joins before draining the queue
import multiprocessing


def fill_queue(q, n):
    for i in range(n):
        q.put(i)  # blocks once the queue already holds maxsize items


if __name__ == "__main__":
    # maxsize=5 so the queue fills quickly
    q = multiprocessing.Queue(maxsize=5)
    p = multiprocessing.Process(target=fill_queue, args=(q, 100))
    p.start()
    p.join()          # deadlock: parent waits for child to finish,
                      # child waits for parent to read from the queue
    while not q.empty():
        print(q.get())
# This script hangs. Kill it with Ctrl+C.

Fix the ordering: drain the queue before joining, use a larger maxsize, or switch to Pool, which manages this internally.

# Tested on Python 3.11
# corrected pattern: drain before join
import multiprocessing


def fill_queue(q, n):
    for i in range(n):
        q.put(i)


if __name__ == "__main__":
    q = multiprocessing.Queue(maxsize=5)
    p = multiprocessing.Process(target=fill_queue, args=(q, 10))
    p.start()
    results = [q.get() for _ in range(10)]  # drain while child runs
    p.join()
    print(results)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Zombie Processes and Proper Cleanup

Problem: Defunct child entries linger in ps output or handles leak on Windows.

Cause: Parents never join() children, or they terminate abruptly without closing pools.

Fix: Always join managed Process objects, use context managers with Pool (with Pool(...) as pool:), and shut down managers explicitly when tests create many short-lived pools.

The example below shows a process that becomes a zombie because the parent never joins it, followed by the correct pattern.

# Tested on Python 3.11
# broken pattern: no join, child becomes zombie on POSIX
import time
from multiprocessing import Process


def short_task():
    time.sleep(0.1)


if __name__ == "__main__":
    p = Process(target=short_task)
    p.start()
    # parent exits without joining; child entry lingers in process table
    print("parent done, child exitcode:", p.exitcode)
parent done, child exitcode: None

An exitcode of None means the child has not yet been reaped. On POSIX systems the defunct entry remains in the process table until the parent joins the child or exits; Python normally joins non-daemon children at interpreter exit, but abrupt exits skip that cleanup.

# Tested on Python 3.11
# corrected pattern: always join before parent exits
import time
from multiprocessing import Process


def short_task():
    time.sleep(0.1)


if __name__ == "__main__":
    p = Process(target=short_task)
    p.start()
    p.join()
    print("child exitcode after join:", p.exitcode)
child exitcode after join: 0

Frequently Asked Questions

Q: What is the difference between multiprocessing.Process and multiprocessing.Pool in Python?

A: Process gives you exact control over the lifecycle of a single worker, which fits custom supervision, explicit signals, or long-running services. Pool recycles a fixed number of interpreter processes and schedules many short tasks across them, which removes boilerplate for data-parallel maps. Pick Process when you need bespoke orchestration; pick Pool when you batch independent jobs of the same shape.

Q: How do I share data between processes in Python multiprocessing?

A: Use Queue or Pipe for message passing, Value and Array for compact numeric buffers, and Manager for richer shared dicts and lists served by a dedicated process. Each option trades latency for flexibility: queues minimize coupling, Manager adds RPC-like overhead, and sharedctypes keeps memory tight. Pick one pattern per resource so ownership stays obvious.
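As a minimal sketch of the sharedctypes route (the worker and iteration counts here are arbitrary), several processes increment one Value under its built-in lock so no updates are lost:

```python
# Tested on Python 3.11
from multiprocessing import Process, Value


def bump(counter, times):
    for _ in range(times):
        with counter.get_lock():  # serialize the read-modify-write
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)  # shared 32-bit signed int
    workers = [Process(target=bump, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000: the lock prevents lost updates
```

Dropping the `get_lock()` context manager makes `counter.value += 1` a non-atomic read-modify-write, and the final count can come up short.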

Q: Why does Python multiprocessing behave differently on Windows and macOS compared to Linux?

A: Defaults follow platform capabilities: Windows relies on spawn, recent macOS releases default to spawn for safety after fork, while many Linux systems still prefer fork. spawn re-imports your module in the child, which makes the if __name__ == "__main__": guard mandatory and changes which objects remain picklable. Linux fork inherits file descriptors and interpreter state, which speeds start-up but can surprise code that assumed a clean interpreter.

Q: What does “can’t pickle” mean in Python multiprocessing and how do I fix it?

A: The error means the start method could not serialize part of your task for the child interpreter. Common culprits are lambdas, nested functions, open sockets, or closures that capture unpicklable locals. Move the target to a top-level function in an importable module, replace lambdas with named functions, and trim closure captures down to plain data.

Q: How does Pool.starmap() differ from Pool.map() in Python?

A: map calls func(x) once per element x of the iterable. starmap unpacks each element and calls func(*x), which lets you pass tuples that expand into multiple parameters without wrapping arguments in another object. Use starmap when each task naturally carries several positional arguments.
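A minimal side-by-side sketch (the pool size and inputs are arbitrary) makes the unpacking difference concrete:

```python
# Tested on Python 3.11
from multiprocessing import Pool


def square(x):
    return x * x


def add(a, b):
    return a + b


if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))          # func(x) per element -> [1, 4, 9]
        print(pool.starmap(add, [(1, 2), (3, 4)]))  # func(*x) per tuple  -> [3, 7]
```

Passing the tuple list to plain `map` would instead call `add((1, 2))` with one argument and raise a TypeError.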

Q: Is Python multiprocessing faster than threading for all tasks?

A: No. CPU-bound Python that stays under the GIL favors multiprocessing because separate interpreters truly run in parallel. I/O-bound workloads often run faster on asyncio or threads because they avoid process creation, pickling overhead, and the extra RAM of separate interpreters. Measure wall-clock time and CPU utilization instead of assuming one model always wins.

Q: How many processes should I use in a multiprocessing Pool?

A: Start near os.cpu_count() for CPU-bound tasks, then sweep a few values up and down while watching timing and resident memory. Oversized pools oversubscribe CPU and may page; undersized pools leave hardware idle. Memory-heavy tasks may require fewer workers than the core count to remain stable.
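One way to run that sweep, as a sketch (the task count and sizes are arbitrary, and absolute timings depend on your machine):

```python
# Tested on Python 3.11
import os
import time
from multiprocessing import Pool


def burn(n):
    # CPU-bound busywork to give the pool something to parallelize.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_pool(workers, tasks):
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(burn, tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    tasks = [200_000] * 16
    for workers in (1, cores, cores * 2):
        print(f"{workers:2d} workers: {time_pool(workers, tasks):.3f}s")
```

Typically the timing improves from 1 worker up to the core count, then flattens or regresses at 2x cores as scheduling overhead takes over; watch resident memory as well as time when the tasks are large.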

Q: What happens if I do not call join() after starting a process in Python multiprocessing?

A: The parent can finish while children still run, which risks zombies on POSIX and abandoned handles on Windows. Even when parents outlive children, skipping join makes exit codes and exceptions harder to observe. Call join after start (or use Pool context managers) so cleanup runs deterministically.

Conclusion

Python’s multiprocessing module sidesteps the GIL by running each worker in a separate interpreter, which gives CPU-bound Python code access to every core on the machine. The right primitive depends on the shape of your work: use Process when you need direct lifecycle control over individual workers, Pool when you have many independent tasks of the same shape, Queue or Pipe when workers need to exchange data, and Value, Array, or Manager when they need to share state.

A few rules hold across every pattern. Always wrap process-spawning code in if __name__ == "__main__":, particularly on Windows and macOS where the default start method is spawn. Always call join() after start() to avoid zombie processes. Always call .get() on every AsyncResult before the pool closes so worker exceptions are not silently discarded.
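The AsyncResult rule is easy to see in a short sketch (the `risky` function here is illustrative): the worker's exception is stored, not raised, until .get() is called.

```python
# Tested on Python 3.11
from multiprocessing import Pool


def risky(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2


if __name__ == "__main__":
    with Pool(2) as pool:
        ok = pool.apply_async(risky, (5,))
        bad = pool.apply_async(risky, (-1,))
        print(ok.get())  # 10
        try:
            bad.get()    # the worker's ValueError is re-raised here
        except ValueError as exc:
            print("worker failed:", exc)
```

Skipping `bad.get()` would let the pool close with the failure never reported anywhere.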

For next steps, read the Python threading tutorial to understand when threads outperform processes for I/O-bound workloads, and the official multiprocessing documentation for the full API reference.

About the author(s)

Anish Singh Walia
Author, Sr Technical Writer and Team Lead

Pankaj Kumar
Author

Vinayak Baranwal
Editor, Technical Writer II



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.