`await` in Python: how asynchronous code really pauses (and keeps moving)

Most performance problems I see in Python services aren’t “Python is slow” problems—they’re “we’re waiting on the network” problems. You call an HTTP API, the database, an object store, a message broker, and suddenly your request handler spends most of its life doing nothing while sockets and kernels do the real work. The await keyword is Python’s way of turning that waiting time into something useful: while one operation is waiting for I/O, your program can move on and make progress on other tasks.

I’ve also watched teams misuse await and accidentally make systems slower (blocking the event loop with CPU work, mixing sync libraries into async paths, or awaiting the wrong thing). The good news: once you understand what await truly means—"pause this coroutine here, let the event loop run something else"—the rest becomes a set of practical patterns.

You’ll leave with a mental model for how await interacts with the event loop, what you’re allowed to await, how concurrency differs from parallelism, and how to write async code that behaves well under load.

What await actually does

await is not “run this later.” It’s not a thread, and it’s not magic parallelism. I think of it like pulling into a turnout on a one-lane road: your coroutine temporarily steps aside at a point where it cannot proceed (because it’s waiting for I/O), and the event loop uses that time to let other coroutines drive forward.

The syntax is simple:

  • await <expression>

But the expression has a strict requirement: it must be awaitable (more on that soon). When Python executes await some_awaitable, it:

  • Suspends the current coroutine at that line.
  • Hands control back to the event loop.
  • Registers interest in “resume me when the awaited operation finishes.”
  • Later, resumes the coroutine and produces the awaited result (or raises an exception).

A tiny example that shows the “pause here, but don’t block everything” behavior:

import asyncio

async def greeting():
    print("Hello")
    await asyncio.sleep(1)  # yields control; the loop can run other tasks
    print("World")

asyncio.run(greeting())

Typical output:

Hello

World

asyncio.sleep(1) doesn’t block the process for one second. It schedules a wake-up and yields. That difference—yielding to the event loop instead of blocking the OS thread—is the core promise of await.

If you remember only one rule, make it this: await pauses one coroutine, not the whole program.
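A quick way to convince yourself of that rule is to time two one-second waits that overlap. This is a self-contained sketch using only the standard library; the `worker` coroutine is a stand-in for real I/O:

```python
import asyncio
import time

async def worker(name: str) -> str:
    # Each worker "waits" one second, but the waits overlap on one loop.
    await asyncio.sleep(1)
    return name

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(worker("a"), worker("b"))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.1f}s")  # roughly 1 second, not 2
```

Two sequential awaits would take about two seconds; because each `await asyncio.sleep(1)` yields, the loop runs both waits concurrently.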

Awaitables: coroutines, Tasks, Futures, and “async-aware” objects

When someone tells me “await is broken,” the bug is often simpler: they tried to await something that isn’t awaitable, or they forgot to await something that is.

Here’s what you can await in modern Python:

  • Coroutine objects (created by calling an async def function)
  • asyncio.Task instances (scheduled coroutines)
  • asyncio.Future instances (a lower-level promise-like object)
  • Objects that implement the await protocol (they define an __await__ method)

A quick way to tell if you’ve got a coroutine: calling an async function doesn’t execute it immediately—it returns a coroutine object.

import asyncio

async def fetch_profile(user_id: int) -> dict:
    await asyncio.sleep(0.1)
    return {"user_id": user_id, "plan": "pro"}

async def main():
    coro = fetch_profile(42)  # coroutine object; nothing has run yet
    profile = await coro      # now it runs (driven by the event loop)
    print(profile)

asyncio.run(main())

In code review, I look for these two failure modes:

  • Forgotten await (you return a coroutine instead of its result)
  • Double scheduling (you wrap a coroutine in a task and also await it in a confusing way)
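Both rules are easy to see in a few lines. In this sketch (the `lookup` coroutine is a stand-in), note that a Task tolerates repeated awaits, but a bare coroutine does not:

```python
import asyncio

async def lookup() -> int:
    await asyncio.sleep(0)
    return 42

async def main() -> None:
    task = asyncio.create_task(lookup())
    print(await task)  # 42
    print(await task)  # a Task can be awaited again; it returns the cached result
    coro = lookup()
    print(await coro)  # 42
    try:
        await coro     # a bare coroutine can only be awaited once
    except RuntimeError as exc:
        print(f"error: {exc}")

asyncio.run(main())
```

This is one reason wrapping coroutines in Tasks "in a confusing way" bites: whether a second await works depends on what you're awaiting.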

Coroutines vs Tasks (the practical difference)

A coroutine is like an unstarted recipe card. A Task is like putting that recipe on the stove: it’s been handed to the event loop to run.

If you want concurrency, you typically create tasks and then await their completion.

import asyncio

async def load_customer(customer_id: int) -> dict:
    await asyncio.sleep(0.2)
    return {"customer_id": customer_id}

async def load_orders(customer_id: int) -> list[dict]:
    await asyncio.sleep(0.3)
    return [{"order_id": "A100"}, {"order_id": "A101"}]

async def main():
    customer_task = asyncio.create_task(load_customer(7))
    orders_task = asyncio.create_task(load_orders(7))

    # Both tasks can run while we await their results.
    customer = await customer_task
    orders = await orders_task
    print(customer, orders)

asyncio.run(main())

I recommend: use create_task() when you intentionally want overlap; otherwise just await coroutines directly for sequential flow.

Futures: mostly for integration points

In 2026, most app-level Python code shouldn’t create raw Future objects by hand. I still see Futures when integrating with callback-style libraries, low-level protocols, or custom event-loop plumbing. If you’re writing web services, your daily tools are coroutines and Tasks.
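As a sketch of what such an integration point looks like, here a callback-style API is bridged into await via a Future. The `fire_callback_later` helper is invented for illustration; it stands in for any library that reports results through callbacks:

```python
import asyncio

def fire_callback_later(loop: asyncio.AbstractEventLoop, callback) -> None:
    # Stand-in for a callback-style library: delivers a result via `callback`.
    loop.call_later(0.01, callback, "payload")

async def wait_for_callback() -> str:
    loop = asyncio.get_running_loop()
    future: asyncio.Future[str] = loop.create_future()
    # The bridge: the callback fulfils the Future, which makes it awaitable.
    fire_callback_later(loop, future.set_result)
    return await future

print(asyncio.run(wait_for_callback()))  # payload
```

The Future is the adapter: the callback side calls set_result(), and the async side simply awaits.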

The event loop: why await feels like yielding

The event loop is the coordinator that decides what runs next. It watches file descriptors (sockets), timers, subprocess pipes, and scheduled callbacks. When an awaited I/O operation can’t proceed, the coroutine yields, and the loop runs another ready task.

This is why async code can scale so well for I/O-heavy workloads: you can keep thousands of operations “in flight” without needing thousands of OS threads.

Concurrency vs parallelism (a table I use with teams)

  Goal                        Traditional sync approach   Modern async approach                 What you actually get
  Handle many network waits   Threads / thread pool       asyncio + await                       Concurrency with low overhead
  Use multiple CPU cores      Multiprocessing             Multiprocessing / native extensions   Parallelism
  Mix CPU work with I/O       Thread pool executor        asyncio.to_thread() or process pool   Concurrency + limited parallelism

If your job is mostly “wait for remote systems,” await shines. If your job is “compute a million hashes,” await won’t help; you need parallelism or faster algorithms.

What happens when you await

When I’m teaching this, I draw three states:

  • Running: the coroutine is executing Python code.
  • Suspended: it hit an await and yielded to the loop.
  • Ready: its awaited thing finished; the loop can resume it.

The key is that await only helps if the awaited operation is truly non-blocking. If you call a blocking function inside an async def—like time.sleep(1) or a sync HTTP client—you stop the whole event loop and lose the main benefit of async.
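You can see the difference by starting a background ticker and then waiting both ways. A self-contained sketch, standard library only:

```python
import asyncio
import time

async def ticker(ticks: list[int]) -> None:
    for i in range(5):
        ticks.append(i)
        await asyncio.sleep(0.05)

async def good() -> int:
    ticks: list[int] = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0.3)  # yields: the ticker runs during the wait
    progress = len(ticks)
    await task
    return progress

async def bad() -> int:
    ticks: list[int] = []
    task = asyncio.create_task(ticker(ticks))
    time.sleep(0.3)           # blocks the loop: the ticker is starved
    progress = len(ticks)     # 0 -- nothing ran while we blocked
    await task
    return progress

print(asyncio.run(good()), asyncio.run(bad()))  # 5 0
```

Same 0.3-second wait, but `time.sleep` froze every other task on the loop for its entire duration.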

Patterns you’ll use weekly: sequential, gather, TaskGroup, pipelines

Most real async programs are built from a few repeatable shapes. I’m opinionated here because consistent patterns keep teams out of trouble.

1) Sequential awaits (clear, safe)

Use this when operation B depends on operation A.

import asyncio

async def read_feature_flags(account_id: str) -> dict:
    await asyncio.sleep(0.2)
    return {"beta_dashboard": True}

async def compute_dashboard(account_id: str, flags: dict) -> dict:
    await asyncio.sleep(0.2)
    return {"account_id": account_id, "beta": flags["beta_dashboard"]}

async def main():
    flags = await read_feature_flags("acct_912")
    dashboard = await compute_dashboard("acct_912", flags)
    print(dashboard)

asyncio.run(main())

Sequential await reads like normal code. I recommend starting here until you have a clear reason to overlap work.

2) Concurrent fan-out with asyncio.gather

Use this when you can start multiple independent operations at once.

import asyncio

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(1)
    return f"weather:{city}:sunny"

async def fetch_traffic(city: str) -> str:
    await asyncio.sleep(2)
    return f"traffic:{city}:light"

async def main():
    results = await asyncio.gather(
        fetch_weather("Seattle"),
        fetch_traffic("Seattle"),
    )
    print(results)

asyncio.run(main())

In this shape, both requests start “together,” and you await the combined result.

One subtle point: gather()’s error behavior can surprise people. By default, if one awaitable raises, gather() raises too, and other awaitables may be cancelled depending on timing. For “best effort” fan-out (collect what you can), pass return_exceptions=True and handle results carefully.
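A best-effort fan-out sketch (the two coroutines are stand-ins for real remote calls):

```python
import asyncio

async def ok(value: int) -> int:
    await asyncio.sleep(0.01)
    return value

async def broken() -> int:
    await asyncio.sleep(0.01)
    raise ValueError("upstream failed")

async def main() -> list[int]:
    # With return_exceptions=True, failures come back as values, not raises.
    results = await asyncio.gather(ok(1), broken(), ok(3), return_exceptions=True)
    for r in results:
        if isinstance(r, BaseException):
            print(f"skipped: {r!r}")
    return [r for r in results if not isinstance(r, BaseException)]

print(asyncio.run(main()))  # [1, 3]
```

The cost of this mode is that you must check every result's type; forgetting the isinstance check turns a buried exception into a confusing downstream bug.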

3) Structured concurrency with asyncio.TaskGroup

If you’re on Python 3.11+ (very common in 2026), TaskGroup is my preferred primitive for groups of related tasks. It gives you clearer lifecycle rules and makes cancellation behavior less mysterious.

import asyncio

async def fetch_pricing(sku: str) -> dict:
    await asyncio.sleep(0.3)
    return {"sku": sku, "price": 19.99}

async def fetch_inventory(sku: str) -> dict:
    await asyncio.sleep(0.2)
    return {"sku": sku, "available": 12}

async def main():
    results: dict[str, dict] = {}
    async with asyncio.TaskGroup() as tg:
        pricing_task = tg.create_task(fetch_pricing("SKU-483"))
        inventory_task = tg.create_task(fetch_inventory("SKU-483"))

    # Exiting the TaskGroup means tasks finished (or errors were handled/raised).
    results["pricing"] = pricing_task.result()
    results["inventory"] = inventory_task.result()
    print(results)

asyncio.run(main())

If you’re building request/response services, TaskGroup maps well to “for this request, start these child operations, and clean them up reliably if the request is cancelled.”

4) Pipelines and backpressure

The moment you move from “two calls at once” to “thousands of items,” you need limits. Otherwise you create 50,000 tasks and wonder why memory spikes.

A common pattern is a semaphore-limited worker function:

import asyncio

async def enrich_record(record_id: int, limit: asyncio.Semaphore) -> dict:
    async with limit:
        await asyncio.sleep(0.05)  # pretend this is an HTTP call
        return {"record_id": record_id, "status": "enriched"}

async def main():
    limit = asyncio.Semaphore(50)  # cap concurrency
    tasks = [enrich_record(i, limit) for i in range(1000)]
    results = await asyncio.gather(*tasks)
    print(len(results))

asyncio.run(main())

This keeps your system stable under load. I recommend choosing a concurrency cap based on downstream limits (DB connection pool size, API rate limits) rather than guessing.

Real-world I/O with await: HTTP, databases, files, and “don’t block the loop”

The most valuable async code is boring: it waits on I/O the right way, consistently.

HTTP calls: pick an async-native client

If you do HTTP inside async def, use a client designed for it (commonly httpx in async mode or aiohttp). The important bit is that the library must provide awaitable operations.

Even if you don’t memorize client APIs, keep the shape consistent:

  • create a client/session
  • await requests
  • close client/session (often via async context manager)

The detail that matters: connection pooling and timeouts. In production services, I always set timeouts explicitly so “await forever” doesn’t become your failure mode.

Databases: async driver + pool sizing

Async DB drivers (like asyncpg for Postgres) are great when requests spend time waiting on the database. But you still need to respect pool sizes. A service that can run 1,000 coroutines concurrently doesn’t magically get 1,000 database connections.

I recommend:

  • set a pool size that matches your DB capacity
  • gate concurrency at the application layer (semaphore, bounded worker pool)
  • measure query latency and connection wait time separately

Files: know what’s truly async

Local disk access is tricky. On many platforms, file I/O isn’t genuinely non-blocking in the same way sockets are. If you’re doing heavy file reads/writes in an async server, I often push that work into a thread:

import asyncio
from pathlib import Path

def read_text_sync(path: Path) -> str:
    return path.read_text(encoding="utf-8")

async def read_text_async(path: Path) -> str:
    # Runs the blocking file read in a worker thread.
    return await asyncio.to_thread(read_text_sync, path)

async def main():
    text = await read_text_async(Path("./README.md"))
    print(text[:80])

asyncio.run(main())

This is a pattern I use a lot: keep the async surface area, but move blocking calls off the event loop.

The “one blocking call” rule

If you’re building an async web service, assume this is true:

  • One accidental blocking call in a hot path can degrade tail latency dramatically.

Examples I flag immediately:

  • time.sleep() inside async def
  • synchronous HTTP clients
  • CPU-heavy parsing/compression/encryption inside the event loop (without offloading)

If you need CPU work, prefer asyncio.to_thread() for modest CPU tasks and a process pool for heavier compute.
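A sketch of both offloading options, using a small stand-in hash workload (the `digest` function is invented for illustration):

```python
import asyncio
import hashlib
from concurrent.futures import ProcessPoolExecutor

def digest(payload: bytes) -> str:
    # CPU-bound work: never run this directly on the event loop thread.
    return hashlib.sha256(payload).hexdigest()

async def main() -> tuple[str, str]:
    # Modest CPU work: a worker thread keeps the loop responsive.
    via_thread = await asyncio.to_thread(digest, b"hello")

    # Heavier compute: a process pool gets real parallelism across cores.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        via_process = await loop.run_in_executor(pool, digest, b"hello")

    return via_thread, via_process

if __name__ == "__main__":
    a, b = asyncio.run(main())
    print(a == b)  # True: same digest either way
```

Threads share the interpreter (so they mainly help when the work releases the GIL or is small); processes pay a serialization cost but scale across cores.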

Errors, cancellation, and timeouts: making async code survivable

Async code that “works on my laptop” often breaks in production because of error propagation and cancellation behavior. I treat these as first-class design concerns.

Exceptions propagate through await

If the awaited coroutine raises, await re-raises the exception at the await point. That’s good: it keeps normal try/except flow.

import asyncio

async def risky_lookup(key: str) -> str:
    await asyncio.sleep(0.1)
    raise RuntimeError(f"lookup failed for {key}")

async def main():
    try:
        value = await risky_lookup("account:missing")
        print(value)
    except RuntimeError as exc:
        print(f"handled error: {exc}")

asyncio.run(main())

In my experience, the trickier part is when you launch multiple tasks: errors can surface later, far from where they were created. That’s why I prefer structured concurrency (TaskGroup) for request-scoped fan-out.

Cancellation is normal in servers

In web servers, cancellation happens all the time:

  • the client disconnects
  • a timeout triggers
  • the system is shutting down

In asyncio, cancellation is represented by asyncio.CancelledError. You usually don’t swallow it; you let it propagate so the runtime can stop work quickly.

A safe pattern:

  • catch CancelledError only to do cleanup
  • re-raise it afterwards

import asyncio

async def write_audit_log(event: str) -> None:
    await asyncio.sleep(0.2)

async def handler():
    try:
        await asyncio.sleep(10)  # pretend this is long I/O
    except asyncio.CancelledError:
        # cleanup that must happen
        await write_audit_log("request_cancelled")
        raise

Timeouts: prefer scoped timeouts

In modern asyncio, I recommend scoped timeouts so you can see exactly what’s covered.

import asyncio

async def slow_remote_call():
    await asyncio.sleep(5)
    return "done"

async def main():
    try:
        async with asyncio.timeout(1.0):
            result = await slow_remote_call()
        print(result)
    except TimeoutError:
        print("remote call timed out")

asyncio.run(main())

Time limits are part of your API contract. If you don’t set them, the default becomes “forever,” and you learn about it during an incident.

Common mistakes I see (and how to avoid them)

These are the issues I most frequently fix for teams adopting async.

Mistake 1: Forgetting to await

Symptom: you see <coroutine object ...> reprs in logs, or you get warnings about coroutines never awaited.

Fix: trace the call chain. Any async def you call must be awaited (directly or indirectly) unless you intentionally schedule it as a background task.
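The symptom is easy to reproduce (a minimal sketch; `fetch_count` is a stand-in):

```python
import asyncio

async def fetch_count() -> int:
    await asyncio.sleep(0)
    return 42

async def main() -> None:
    broken = fetch_count()              # BUG: missing await
    print(asyncio.iscoroutine(broken))  # True -- a coroutine object, not 42
    print(await broken)                 # the fix: await it, which prints 42

asyncio.run(main())
```

In real code the bug usually hides behind a return statement: `return fetch_count()` hands the caller a coroutine where a value was expected.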

Mistake 2: Doing blocking work inside async def

Symptom: everything slows down under concurrency; one slow request makes others slower.

Fix: replace blocking libraries with async equivalents, or offload with asyncio.to_thread().

Mistake 3: Creating “fire-and-forget” tasks without supervision

Symptom: exceptions disappear, tasks leak, shutdown hangs.

Fix: keep a reference to tasks, add done callbacks or collect results, or use TaskGroup so lifecycle is explicit.
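One supervision sketch: keep strong references in a set and surface failures through a done callback (the `flaky` coroutine is a stand-in for real background work):

```python
import asyncio

background_tasks: set[asyncio.Task] = set()
failures: list[BaseException] = []

def supervise(task: asyncio.Task) -> None:
    # Keep a strong reference: the loop itself only holds weak refs to tasks.
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    task.add_done_callback(_record_failure)

def _record_failure(task: asyncio.Task) -> None:
    # Surface exceptions instead of letting them vanish with the task.
    if not task.cancelled() and task.exception() is not None:
        failures.append(task.exception())

async def flaky() -> None:
    await asyncio.sleep(0.01)
    raise RuntimeError("boom")

async def main() -> None:
    supervise(asyncio.create_task(flaky()))
    await asyncio.sleep(0.05)  # give the background task time to finish

asyncio.run(main())
print(failures)
```

In request-scoped code, prefer TaskGroup over this hand-rolled supervision; the set-plus-callback shape is for genuinely long-lived background tasks.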

Mistake 4: Confusing concurrency with parallelism

Symptom: CPU-bound endpoints don’t improve after “adding await.”

Fix: use multiprocessing, native extensions, vectorized libraries, or offload compute to worker systems. Async is for waiting, not for making CPU faster.

Mistake 5: Unlimited fan-out

Symptom: memory spikes, downstream rate limits trip, DB connection pool thrashes.

Fix: add bounded concurrency (Semaphore), batch work, and respect pool sizes.

Mistake 6: Mixing event loops

Symptom: “event loop is closed” or “cannot call asyncio.run from a running event loop.”

Fix: in libraries, avoid asyncio.run(); let the application own the event loop. In notebooks and some frameworks, you’re already inside a running loop.

Mistake 7: Poor observability

Symptom: async code becomes hard to debug.

Fix: add structured logging with request/task identifiers, propagate context (contextvars), and instrument outbound calls. In 2026, I expect teams to have tracing across async boundaries and to catch “slow await points” in profiles.
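contextvars values flow across await points and are copied per task, which is exactly what makes per-request identifiers workable. A minimal sketch:

```python
import asyncio
import contextvars

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")

def log(message: str) -> None:
    # Every log line carries the current request's identifier.
    print(f"[{request_id.get()}] {message}")

async def handle(rid: str) -> str:
    request_id.set(rid)
    await asyncio.sleep(0.01)  # the context survives the await
    log("done")
    return request_id.get()

async def main() -> list[str]:
    # Each task gets its own copy of the context, so ids don't bleed across requests.
    return list(await asyncio.gather(handle("req-1"), handle("req-2")))

print(asyncio.run(main()))  # ['req-1', 'req-2']
```

Because each Task snapshots the context at creation, two concurrent handlers can set the same ContextVar without interfering.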

What I want you to do next

When you adopt await, aim for one clear goal: keep the event loop free to make progress. Use async where you spend time waiting on networks and databases, keep CPU-heavy work out of the loop, and put hard boundaries around concurrency so load doesn’t turn into chaos.

If you’re converting an existing codebase, I recommend a staged approach:

  • Start with a single async entry point (one service endpoint, one worker pipeline).
  • Replace only the I/O libraries in that path with async-native equivalents.
  • Add timeouts everywhere you cross a network boundary.
  • Introduce bounded concurrency as soon as you fan out beyond a handful of calls.
  • Add tests that run under an async test runner and include cancellation/timeout cases.

Once you’re comfortable, await becomes less of a “feature” and more of a punctuation mark: it tells the runtime, “I’m waiting—go do something useful.” That’s the mindset that makes async code pleasant to work with and reliable at scale.
