PostgreSQL + Python: Querying Data with psycopg2 in 2026

When a Python service starts to feel slow or “mysteriously inconsistent,” I almost always find the same culprit: data access that grew without a plan. A single SELECT that looked harmless in a prototype now runs dozens of times per request, and nobody remembers which queries are safe to cache or batch. If you’re building on PostgreSQL, this is the moment to get intentional about how you query.

In this post, I’ll show you how I query PostgreSQL from Python using psycopg2. You’ll see solid connection patterns, practical query shapes, and how to fetch data predictably. I’ll also cover common pitfalls, performance considerations, and when psycopg2 is the wrong choice. My goal is simple: you should walk away with a playbook you can use today, and a mental model that keeps your data access clean as your system grows.

Why psycopg2 still matters in 2026

I keep psycopg2 in my toolkit because it is dependable, explicit, and battle-tested. Newer async drivers and higher-level ORMs are great for some teams, but psycopg2 stays relevant for three reasons.

First, it’s stable. When you need predictable behavior under load, you want a driver that is boring in the best way. psycopg2 has long-running production history and a large ecosystem of docs, examples, and troubleshooting threads.

Second, it keeps you close to SQL. PostgreSQL is powerful, and SQL is expressive. If you work close to the wire, you can take advantage of features like JSONB, CTEs, window functions, and server-side cursors without waiting for an ORM to catch up.

Third, it plays nicely with modern workflows. In 2026, I still build developer experiences around virtual environments, lockfiles, and reproducible containers. I also use AI-assisted workflows for query review and migration planning, but that does not replace a clear, explicit Python layer. psycopg2 fits that style: you see every query, you understand every parameter, and debugging is straightforward.

Install and connect with intent

I prefer to keep database drivers explicit in my dependency files and to install a binary wheel for local development. For production, I usually build psycopg2 against the system libpq to keep security updates centralized. For a quick start, you can install the binary package:

pip install psycopg2-binary

If you are building a production image, I recommend the source package with build dependencies:

pip install psycopg2

Here’s a minimal connection helper. I keep it tiny and explicit so it’s easy to test and easy to switch out when I move to connection pooling.

import os

import psycopg2

def get_connection():
    return psycopg2.connect(
        dbname=os.getenv("PGDATABASE", "appdb"),
        user=os.getenv("PGUSER", "appuser"),
        password=os.getenv("PGPASSWORD", "secret"),
        host=os.getenv("PGHOST", "127.0.0.1"),
        port=int(os.getenv("PGPORT", "5432")),
    )

Two practical notes:

  • Use environment variables so you can swap between local, staging, and production without editing code.
  • Keep credentials out of source control. If your team uses a secret manager, read them at startup and pass them in here.

If you prefer a context manager to ensure cleanup, wrap it like this:

from contextlib import contextmanager

import psycopg2

@contextmanager
def db_conn():
    conn = psycopg2.connect(
        dbname="appdb",
        user="appuser",
        password="secret",
        host="127.0.0.1",
        port=5432,
    )
    try:
        yield conn
    finally:
        conn.close()

This pattern keeps connections short-lived and avoids leaks in scripts or CLI tasks.

Querying data: fetchall, fetchone, and fetchmany

The core loop is simple: connect, create a cursor, execute a query, fetch results. The choice of fetch method shapes performance and memory use, so I choose deliberately based on the size of the result set.

fetchall: great for small result sets

If you know the result set is small and bounded, fetchall is clean and readable.

from typing import List, Tuple

def list_students():
    with db_conn() as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, full_name, grade FROM students ORDER BY id")
            rows: List[Tuple[int, str, str]] = cur.fetchall()
            return rows

I use fetchall when I expect under a few thousand rows. In my experience, most simple admin screens or small reference lookups fall here.

fetchone: move through results incrementally

fetchone pulls one row at a time from the cursor. It is useful for queries where you only need the first row, or when you want to iterate without loading everything at once.

def get_student(student_id: int):
    with db_conn() as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, full_name, grade FROM students WHERE id = %s",
                (student_id,),
            )
            row = cur.fetchone()
            return row

The cursor is stateful, so repeated fetchone calls keep moving forward. This is handy for large result sets or streaming a response.
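To make the incremental style concrete, here is a minimal sketch of that forward-moving loop wrapped as a generator. It works with any DB-API cursor that has already executed a query; the helper name is my own, not part of psycopg2:

```python
def iter_rows(cur):
    # fetchone advances the cursor one row per call and
    # returns None once the result set is exhausted.
    while True:
        row = cur.fetchone()
        if row is None:
            break
        yield row
```

You can feed it the cursor from any of the earlier examples and consume rows lazily, e.g. `for row in iter_rows(cur): ...`.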

fetchmany: control memory use

fetchmany is the sweet spot when you want batched processing. You tell psycopg2 how many rows per batch, and it yields chunks.

def export_students(batch_size: int = 500):
    with db_conn() as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, full_name, grade FROM students ORDER BY id")
            while True:
                batch = cur.fetchmany(batch_size)
                if not batch:
                    break
                for row in batch:
                    yield row

I use fetchmany for jobs that write to CSV, send results to another system, or build a cache. It keeps memory flat and avoids giant lists.

Parameterized queries and safety

If you take one thing from this section, make it this: always parameterize. It’s not just about security, it’s also about correct type handling and better query plans.

Here’s the wrong way:

cur.execute(f"SELECT id, full_name FROM students WHERE full_name = '{name}'")

Here’s the right way:

cur.execute(
    "SELECT id, full_name FROM students WHERE full_name = %s",
    (name,),
)

Notice the tuple with a single element. That comma matters.

A few more practical patterns:

  • Lists and IN clauses: use ANY and pass a list.

cur.execute(
    "SELECT id, full_name FROM students WHERE id = ANY(%s)",
    ([1, 3, 7],),
)

  • Date ranges: pass datetime objects directly.

from datetime import datetime, timedelta

start = datetime.utcnow() - timedelta(days=7)
end = datetime.utcnow()

cur.execute(
    "SELECT id, created_at FROM enrollments WHERE created_at BETWEEN %s AND %s",
    (start, end),
)

  • JSON fields: psycopg2 can adapt Python dicts when you use Json.

from psycopg2.extras import Json

cur.execute(
    "SELECT id FROM events WHERE payload @> %s",
    (Json({"type": "signup"}),),
)

These are the patterns I lean on in real systems, and they keep query code both safe and clear.

Real-world query patterns I rely on

Basic SELECT * queries get you started, but real systems need more. Here are patterns I use constantly, with reasoning and a sample for each.

Partial selects for smaller payloads

Always select only the columns you need. It improves response time, reduces network load, and keeps your code honest.

cur.execute(
    "SELECT id, full_name FROM students WHERE grade = %s ORDER BY full_name",
    ("A",),
)

Pagination you can trust

Offset-based pagination is easy but can drift with inserts. For stable pagination, I use keyset pagination.

def list_students_page(last_id: int, page_size: int = 50):
    with db_conn() as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, full_name, grade
                FROM students
                WHERE id > %s
                ORDER BY id
                LIMIT %s
                """,
                (last_id, page_size),
            )
            return cur.fetchall()

This stays consistent as new rows arrive. It also avoids the growing cost of large OFFSET values.

Aggregations for reporting

I prefer aggregation in SQL, not Python. It reduces round-trips and moves the heavy lifting to the database.

cur.execute(
    """
    SELECT grade, COUNT(*) AS student_count
    FROM students
    GROUP BY grade
    ORDER BY grade
    """
)

Use CTEs for clarity

When the logic is complex, I use a CTE to keep queries readable.

cur.execute(
    """
    WITH recent_enrollments AS (
        SELECT student_id, created_at
        FROM enrollments
        WHERE created_at >= NOW() - INTERVAL '30 days'
    )
    SELECT s.id, s.full_name, r.created_at
    FROM students s
    JOIN recent_enrollments r ON s.id = r.student_id
    ORDER BY r.created_at DESC
    """
)

CTEs are not just for advanced SQL. They’re a clarity tool when a single query starts to feel crowded.

Performance habits that pay off

You don’t need to be a database guru to get good performance. I follow a small set of habits that consistently reduce latency and prevent surprises.

1) Use indexes that match your access patterns

If you filter on grade and created_at together, create a composite index in that order. PostgreSQL uses indexes best when the leading columns match the filter. I often measure a drop from 50–120ms down to 10–25ms after an index that matches real usage.
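As a sketch, assuming the students table from the earlier examples is filtered by grade equality plus a created_at range, the matching composite index might look like this (the index name is an arbitrary choice):

```sql
-- Leading column matches the equality filter;
-- the second column supports the range condition.
CREATE INDEX idx_students_grade_created_at
    ON students (grade, created_at);
```

Verify the index is actually used with EXPLAIN before and after creating it.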

2) Keep result sets small

Even a modest JSON response can balloon under high traffic. I keep query payloads slim and paginate by default. This helps both the database and the API layer.

3) Avoid per-row queries

The most common performance issue I see is the N+1 query pattern. It looks like this: fetch a list, then run a query per row to fetch details. Instead, use a JOIN or an IN clause to fetch everything in one round-trip.
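To make the fix concrete, here is a hedged sketch of the batched alternative: collect the student IDs and fetch all their enrollments in one round-trip with ANY, then group in Python. The table and column names are assumptions carried over from the earlier examples:

```python
from collections import defaultdict

ENROLLMENTS_FOR_STUDENTS_SQL = """
    SELECT student_id, id, created_at
    FROM enrollments
    WHERE student_id = ANY(%s)
"""

def enrollments_by_student(cur, student_ids):
    # One query for the whole batch instead of one query per student.
    cur.execute(ENROLLMENTS_FOR_STUDENTS_SQL, (list(student_ids),))
    grouped = defaultdict(list)
    for student_id, enrollment_id, created_at in cur.fetchall():
        grouped[student_id].append((enrollment_id, created_at))
    return grouped
```

The caller does `enrollments_by_student(cur, [s[0] for s in students])` once, rather than issuing a query inside the loop over students.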

4) Reuse connections with pooling

Opening connections is expensive. In services, I use a pool rather than creating a new connection per request. psycopg2 offers psycopg2.pool as a starting point. Here is a small example using SimpleConnectionPool.

from psycopg2.pool import SimpleConnectionPool

pool = SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    dbname="appdb",
    user="appuser",
    password="secret",
    host="127.0.0.1",
    port=5432,
)

def get_conn_from_pool():
    return pool.getconn()

def release_conn_to_pool(conn):
    pool.putconn(conn)

def list_students_with_pool():
    conn = get_conn_from_pool()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, full_name FROM students ORDER BY id")
            return cur.fetchall()
    finally:
        release_conn_to_pool(conn)

Connection pooling keeps latency stable under load. If you run in a container platform, I also recommend a PgBouncer sidecar or a shared pooling layer.

5) Use server-side cursors for huge results

When you must fetch a large dataset, use a named cursor so the server streams results.

def stream_students():
    with db_conn() as conn:
        with conn.cursor(name="students_stream") as cur:
            cur.itersize = 1000
            cur.execute("SELECT id, full_name, grade FROM students ORDER BY id")
            for row in cur:
                yield row

Server-side cursors are a lifesaver for exports and batch jobs.

Common mistakes I see (and how to avoid them)

I’ve reviewed a lot of Python data layers, and the same mistakes show up repeatedly. Here are the ones you can fix quickly.

  • Missing commit: If you run inserts or updates without conn.commit(), nothing persists. I often wrap write operations in a context that commits on success and rolls back on error.
  • Swallowing exceptions: If you catch every error and return None, debugging becomes impossible. Log the exception and let it bubble when it should.
  • Hard-coded credentials: This is both a security issue and a deployment headache. Use environment variables or a secret manager.
  • Building SQL strings manually: This is a security risk and often breaks with quoting or type conversion. Always parameterize.
  • Fetching too much: SELECT * can be fine in scripts, but it is risky in services because schemas evolve. Select only what you need.
  • Forgetting to close cursors: Always use with conn.cursor() so cleanup is automatic.
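For the missing-commit problem in particular, a small wrapper helps. This is a sketch of a transaction context manager that commits on success and rolls back (then re-raises) on any exception; the name write_txn is my own:

```python
from contextlib import contextmanager

@contextmanager
def write_txn(conn):
    # Commit only if the block completes; roll back and re-raise otherwise.
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

Usage is `with write_txn(conn): cur.execute(...)`, which makes the commit/rollback boundary visible at the call site instead of scattered through the code.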

If you fix these six, you’ll remove most of the fragility from your data layer.

When psycopg2 is the wrong tool

I like psycopg2, but I don’t force it everywhere. Here’s when I choose something else.

  • You need async I/O: If your service is asyncio-based, asyncpg or psycopg 3 async support is a better fit.
  • You need automatic schema mapping: If your team prefers working at the model level, an ORM like SQLAlchemy can improve velocity and reduce boilerplate.
  • You want tight integration with async web frameworks: Some frameworks have strong ecosystem support for async drivers and connection pools that work naturally in the event loop.

Here’s a quick comparison that I use when advising teams:

| Approach | Traditional strength | Modern strength |
| --- | --- | --- |
| psycopg2 | Explicit SQL, simple control, stable behavior | Works well with pooling, easy observability, great for migrations and data jobs |
| SQLAlchemy ORM | Rapid CRUD, schema mapping, relationships | Mature async support, strong tooling around migrations and models |
| asyncpg | High throughput in async services | Low latency under concurrency, modern API surface |

If you’re building a small service or a data job, psycopg2 is a good default. If you are building an async API or want heavy model mapping, pick the tool that matches that reality.

Practical, runnable example: a tiny reporting endpoint

Let me tie everything together with a small example. This script connects to PostgreSQL, queries recent enrollments, aggregates counts, and prints a mini report. It uses parameterized queries, a context manager, and fetchall.

import os

from contextlib import contextmanager
from datetime import datetime, timedelta

import psycopg2

@contextmanager
def db_conn():
    conn = psycopg2.connect(
        dbname=os.getenv("PGDATABASE", "appdb"),
        user=os.getenv("PGUSER", "appuser"),
        password=os.getenv("PGPASSWORD", "secret"),
        host=os.getenv("PGHOST", "127.0.0.1"),
        port=int(os.getenv("PGPORT", "5432")),
    )
    try:
        yield conn
    finally:
        conn.close()

def weekly_enrollment_report(days: int = 7):
    end = datetime.utcnow()
    start = end - timedelta(days=days)
    with db_conn() as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT s.grade, COUNT(*)
                FROM enrollments e
                JOIN students s ON s.id = e.student_id
                WHERE e.created_at BETWEEN %s AND %s
                GROUP BY s.grade
                ORDER BY s.grade
                """,
                (start, end),
            )
            rows = cur.fetchall()
            return rows

if __name__ == "__main__":
    results = weekly_enrollment_report(7)
    for grade, count in results:
        print(f"Grade {grade}: {count} enrollments")

This is not glamorous, but it’s the kind of script that ships in real teams: simple, readable, and resilient.

Data access in modern workflows (2026 edition)

Querying is no longer just about writing SQL. I recommend a workflow that makes data access observable and reproducible.

  • CI checks for migrations and query linting. I often add a basic SQL linter and a migration consistency check to CI so schema drift is caught early.
  • Local environments that match production. Use containers or a dev database with realistic data volumes so you can observe query shape and performance.
  • AI-assisted review, but not blind trust. I use assistants to propose indexes, rewrite queries, and spot N+1 patterns, but I always validate with EXPLAIN and real metrics.
  • Structured logging around queries. Logging slow queries with timing and a sanitized query signature helps you find hotspots quickly.
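For the logging habit, one lightweight option is to wrap execute with a timer and emit a warning above a threshold. A minimal sketch; the threshold, logger name, and signature truncation length are arbitrary choices, not an established convention:

```python
import logging
import time

logger = logging.getLogger("db.slow")

def timed_execute(cur, sql, params=None, slow_ms=100.0):
    # Execute normally, then log a sanitized query signature
    # (whitespace collapsed, no parameter values) when slow.
    start = time.perf_counter()
    cur.execute(sql, params)
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms >= slow_ms:
        signature = " ".join(sql.split())[:120]
        logger.warning("slow query (%.1f ms): %s", elapsed_ms, signature)
    return elapsed_ms
```

Because only the SQL text is logged, never the parameters, this stays safe for queries touching personal data.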

These habits make your data layer durable, which matters more than any single library choice.

Edge cases you should plan for

Real systems hit odd corners. Here are the ones I plan around.

Null handling

If a column can be NULL, Python gets None. I handle that explicitly in code or with COALESCE in SQL. Don’t assume your schema tells the whole story.
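As a sketch of explicit None handling on the Python side, here is a tiny normalizer for rows where full_name may be NULL; the fallback label is an arbitrary choice. The SQL-side equivalent would be COALESCE(full_name, 'unknown') in the SELECT list.

```python
def display_name(full_name):
    # NULL in PostgreSQL arrives in Python as None; substitute explicitly
    # rather than letting None leak into templates or reports.
    return full_name if full_name is not None else "(unknown)"
```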

Time zones

If you store timestamps without time zones, you will have a bad day. I store timestamptz and always pass datetime objects that are timezone-aware.
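In practice that means constructing aware datetimes before passing them as query parameters. A minimal sketch:

```python
from datetime import datetime, timedelta, timezone

# An aware "now" in UTC; psycopg2 adapts aware datetimes
# cleanly to a timestamptz column.
now = datetime.now(timezone.utc)
week_ago = now - timedelta(days=7)

assert now.tzinfo is not None  # never pass naive datetimes to timestamptz
```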

Large text fields

If you store large JSON or text blobs, avoid selecting them in list views. Only fetch them when you need the detail.

Long-running queries

If a query can take seconds, consider moving it to a background job or using materialized views. Your API should not wait on heavy analytics queries.
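If the heavy query is a recurring report, a materialized view lets the API read precomputed results while a scheduled job refreshes them. A sketch, reusing the assumed students schema from the earlier examples:

```sql
CREATE MATERIALIZED VIEW enrollment_counts AS
    SELECT grade, COUNT(*) AS student_count
    FROM students
    GROUP BY grade;

-- Run from a background job on a schedule, not from the request path.
REFRESH MATERIALIZED VIEW enrollment_counts;
```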

Transaction boundaries

If you are doing multiple writes that must succeed together, wrap them in a transaction and handle exceptions cleanly. psycopg2 defaults to transactions, which is good, but you should be explicit when you need a rollback.

def transfer_credits(from_id: int, to_id: int, amount: int):
    with db_conn() as conn:
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE accounts SET credits = credits - %s WHERE id = %s",
                    (amount, from_id),
                )
                cur.execute(
                    "UPDATE accounts SET credits = credits + %s WHERE id = %s",
                    (amount, to_id),
                )
            conn.commit()
        except Exception:
            conn.rollback()
            raise

This pattern keeps your data consistent even when errors occur.

A short checklist I actually use

Here’s the quick list I run through whenever I add a new query:

  • Is the query parameterized? If not, I fix it.
  • Am I selecting only the columns I need?
  • Does this query need an index to match its filters?
  • Can I reduce round-trips by batching or joining?
  • What is the expected result size, and which fetch method matches that?

If I can answer those five, I feel confident the query will age well.

Closing: build a calm data layer

You don’t need a fancy stack to query PostgreSQL well. You need clear SQL, safe parameter handling, and an honest view of how your queries behave under real load. psycopg2 gives you a straightforward path to all of that.

If you’re just getting started, I recommend building a small connection helper, writing a few explicit queries, and learning the differences between fetchall, fetchone, and fetchmany. If you already have a working system, look for the silent problems: unbounded result sets, repeated per-row queries, and missing indexes. Fixing those tends to cut latency by large margins and makes your service feel smoother immediately.

Your next step depends on where you are today. If you’re prototyping, start with a clean, parameterized query layer and keep it small. If you’re in production, add pooling and start measuring slow queries. And if you’re scaling fast, treat your SQL as first-class code: review it, test it, and keep it readable. PostgreSQL is an amazing database. With psycopg2, you can meet it on its terms and keep your Python code in control.
