Introduction
In day-to-day database operations, a key responsibility of a DBA is to monitor database health, handle performance issues, and support customers in resolving problems efficiently. Database performance issues usually arise from workload patterns, inefficient queries, improper system or database configuration, or poor schema design.
To effectively troubleshoot performance problems, it is essential to analyze running queries, understand their execution paths, and identify why certain queries are slow. Once the root cause is identified, performance issues can be resolved through proper query tuning, indexing strategies, and configuration adjustments.
Performance optimization is not only about creating indexes; choosing the correct data type for each column is equally important. For example, if you have a college table with a column such as type, and the expected maximum length is around 200 characters, it is best to define it as VARCHAR(200) rather than using TEXT or VARCHAR(4000) without a clear requirement. Oversizing columns unnecessarily can increase storage usage, enlarge indexes, consume more memory, and hurt query performance through larger row sizes and additional I/O. Defining column data types strictly according to business requirements ensures better efficiency, improved scalability, and optimal overall database performance.
However, database performance is not solely determined by queries. It is influenced by multiple factors, including:
- Query design and execution behavior
- Table and index bloat
- Over-indexing or poorly designed indexes
- Memory and I/O configuration
- Proper table structure and data type selection
A structured approach is required to identify and fix slow queries. This approach focuses on the fundamentals:
- Identifying problematic SQL statements
- Reading and interpreting execution plans with purpose
- Understanding how the database engine processes queries internally
By combining query-level analysis with system-level awareness, DBAs can accurately identify and resolve performance bottlenecks, delivering stable and predictable database performance.
The Fundamentals: A Three-Step Approach
Effective query optimization follows a systematic process:
1. Identify Problematic SQL Statements:
Not all slow queries are created equal. The first step is identifying which queries deserve your attention. Look for patterns in execution time, resource consumption, and frequency of execution. A query that runs once a day and takes 5 seconds may be less critical than a query that runs 1,000 times per minute and takes 100 milliseconds.
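One way to put this step into practice is with the pg_stat_statements extension, which records aggregate statistics per normalized query. A minimal sketch, assuming the extension is installed and preloaded; the total_exec_time and mean_exec_time columns apply to PostgreSQL 13 and later (older releases use total_time and mean_time):

```sql
-- Top 10 queries by cumulative execution time.
-- Requires: CREATE EXTENSION pg_stat_statements; and
-- shared_preload_libraries = 'pg_stat_statements' in postgresql.conf.
SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by total time surfaces the frequent 100-millisecond query ahead of the once-a-day 5-second one, matching the prioritization described above.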
2. Read and Interpret Execution Plans with Purpose:
EXPLAIN ANALYZE is your window into how PostgreSQL processes a query. Understanding execution plans isn't about memorizing node types; it's about recognizing patterns and understanding the cost implications of different operations. Every scan type, join method, and filter operation tells a story about how your data is being accessed and combined.
3. Understand How the Database Engine Processes Queries Internally:
The PostgreSQL query planner makes decisions based on statistics, available indexes, and estimated costs. Understanding why the planner chose a sequential scan over an index scan, or a hash join over a nested loop, helps you make informed decisions about schema design, indexing strategies, and query rewrites.
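The planner's reasoning can be inspected directly. A hedged sketch using the customers table from the example below (the email value is a placeholder): ANALYZE refreshes the statistics the planner relies on, pg_stats exposes them, and the enable_seqscan session setting lets you compare the plan the planner rejected:

```sql
-- Refresh and inspect the statistics the planner bases its decisions on.
ANALYZE customers;

SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'customers';

-- Diagnosis only: discourage sequential scans in this session to see
-- what the alternative index-based plan would look like and cost.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM customers WHERE cu_email = 'user@example.com';  -- placeholder value
RESET enable_seqscan;
```

If the forced plan's estimated cost is far higher than the sequential scan's, the planner's choice was sound for the current statistics.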
Real-World Example: From Sequential Scans to Index Usage
Step 1: Use EXPLAIN to Identify Performance Issues
samdb=# EXPLAIN ANALYZE SELECT c.cus_id, c.cu_name,
c.cu_email, i.reg_name, i.salary, o.of_name,
o.off_address
FROM customers c, intership i, office o
WHERE c.cu_email = i.st_email
AND c.cu_email = o.off_email
AND i.salary > 5000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=67.90..146.99 rows=171 width=79) (actual time=13.913..17.384 rows=150 loops=1)
Hash Cond: (o.off_email = c.cu_email)
-> Seq Scan on office o (cost=0.00..66.37 rows=2937 width=49) (actual time=1.088..3.555 rows=2937 loops=1)
-> Hash (cost=65.77..65.77 rows=170 width=64) (actual time=12.731..12.734 rows=150 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 23kB
-> Hash Join (cost=6.83..65.77 rows=170 width=64) (actual time=9.081..12.617 rows=150 loops=1)
Hash Cond: (i.st_email = c.cu_email)
-> Seq Scan on intership i (cost=0.00..49.58 rows=2046 width=32) (actual time=0.606..4.628 rows=2046 loops=1)
Filter: (salary > 5000)
-> Hash (cost=4.70..4.70 rows=170 width=32) (actual time=7.303..7.304 rows=170 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
-> Seq Scan on customers c (cost=0.00..4.70 rows=170 width=32) (actual time=0.962..7.169 rows=170 loops=1)
Planning Time: 31.746 ms
Execution Time: 17.607 ms
(14 rows)
Explanation:
The query joins three tables (customers, intership, and office) on matching email columns and applies a filter condition salary > 5000.
1. Sequential Scans on All Tables
- PostgreSQL scanned the entire customers, intership, and office tables.
- No indexes were used.
- Since the tables are relatively small, the scans completed quickly.
2. Filter Condition Applied
- The condition salary > 5000 was applied during the sequential scan of the intership table.
- Rows not matching the condition were filtered out.
3. Hash Joins Used for Matching
- PostgreSQL used Hash Joins to combine:
- intership.st_email = customers.cu_email
- office.off_email = customers.cu_email
- Hash joins are efficient when joining moderately sized datasets without indexes.
4. Execution Time
- Total execution time was around 17 ms.
- Performance is acceptable for the current dataset size.
Performance Consideration
Although the query performs well now, it may become slow if the tables grow larger because:
- All tables are using Sequential Scans
- No indexes exist on join columns or filter columns
Recommended Optimization:
To improve scalability:
samdb=# CREATE INDEX ON customers(cu_email);
CREATE INDEX ON intership(st_email);
CREATE INDEX ON office(off_email);
CREATE INDEX ON intership(salary);
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
Step 2: After the Indexes Are Created
samdb=# EXPLAIN ANALYZE SELECT c.cus_id, c.cu_name,
c.cu_email, i.reg_name, i.salary, o.of_name,
o.off_address
FROM customers c, intership i, office o
WHERE c.cu_email = i.st_email
AND c.cu_email = o.off_email
AND i.salary > 5000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=7.11..136.42 rows=171 width=79) (actual time=0.380..3.304 rows=150 loops=1)
Join Filter: (c.cu_email = o.off_email)
-> Hash Join (cost=6.83..65.77 rows=170 width=64) (actual time=0.344..1.582 rows=150 loops=1)
Hash Cond: (i.st_email = c.cu_email)
-> Seq Scan on intership i (cost=0.00..49.58 rows=2046 width=32) (actual time=0.066..0.737 rows=2046 loops=1)
Filter: (salary > 5000)
-> Hash (cost=4.70..4.70 rows=170 width=32) (actual time=0.167..0.178 rows=170 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
-> Seq Scan on customers c (cost=0.00..4.70 rows=170 width=32) (actual time=0.015..0.080 rows=170 loops=1)
-> Index Scan using office_off_email_idx on office o (cost=0.28..0.40 rows=1 width=49) (actual time=0.010..0.010 rows=1 loops=150)
Index Cond: (off_email = i.st_email)
Planning Time: 2.426 ms
Execution Time: 3.427 ms
(13 rows)
Explanation:
After the indexes were created, the execution plan changed and performance improved significantly.
- Execution time reduced from ~17 ms to ~3.4 ms.
- PostgreSQL now uses a Nested Loop Join.
- An Index Scan is used on the office table instead of a full table scan.
- The intership table still uses a Sequential Scan because many rows match salary > 5000, so the planner finds it cheaper.
Key Improvement
The main performance gain comes from replacing the full scan of office with an index lookup, which reduces I/O and speeds up joins.
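Whether the new indexes keep earning their keep can be checked over time through the cumulative statistics views. A small sketch against the tables above; idx_scan counts how often each index has been used since statistics were last reset:

```sql
-- How often has each index on these tables actually been used?
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS scans
FROM pg_stat_user_indexes
WHERE relname IN ('customers', 'intership', 'office')
ORDER BY idx_scan DESC;
```

Indexes that stay at zero scans add write overhead without any read benefit and are candidates for removal.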
Performance Is Not Only About Indexes
While indexes greatly improve performance, schema design is equally important.
Data Type Selection Matters
Choosing appropriate column data types directly affects:
- Row size
- Index size
- Cache efficiency
- Query execution speed
Example: Poor Table Design
samdb=# CREATE TABLE college (id SERIAL PRIMARY KEY, name TEXT,
email TEXT,
type TEXT );
CREATE TABLE
Suppose the type column only stores small values such as:
UG , PG , Diploma
Example: Optimized Table Design
samdb=# CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(150),
type VARCHAR(20)
);
CREATE TABLE
Why This Impacts Performance
If you define very large column types (e.g., TEXT or VARCHAR(4000)) where only small values are needed, nothing stops oversized data from creeping in, and as it does:
- Row width increases
- Fewer rows fit into an 8KB data page
- More disk pages must be read
- Indexes become larger
- Cache efficiency decreases
As a result:
- I/O increases
- Memory usage increases
- Execution time increases
On tables with millions of rows, this design mistake can significantly impact performance.
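The effect of column sizing on row width can be measured rather than assumed. A sketch using the college table above; pg_column_size and pg_total_relation_size are built-in PostgreSQL functions:

```sql
-- Actual on-disk size, in bytes, of a few stored rows.
SELECT pg_column_size(c.*) AS row_bytes
FROM college c
LIMIT 5;

-- Total footprint of the table including its indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('college'));
```

Comparing these numbers before and after a schema change shows concretely how much row width and total footprint the redesign saves.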
Conclusion
Database performance depends on:
- Proper indexing
- Efficient query design
- Correct execution plan analysis
- System configuration
- Good schema and data type design
Performance tuning is not just about fixing slow queries; it starts at the design stage.
Well-designed tables, appropriate data types, and proper indexing together create a scalable and high-performing database system.
See this in action at PGConf India 2026 – "Key PostgreSQL Query Tuning foundation Every Database Developer Should Build", presented by Deepak Mahto.
