Introduction
In day-to-day database operations, a key responsibility of a DBA is to monitor database health, handle performance issues, and support customers in resolving problems efficiently. Database performance issues usually arise from workload patterns, inefficient queries, improper system or database configuration, or poor schema design.
To effectively troubleshoot performance problems, it is essential to analyze running queries, understand their execution paths, and identify why certain queries are slow. Once the root cause is identified, performance issues can be resolved through proper query tuning, indexing strategies, and configuration adjustments.
Performance optimization is not only about creating indexes; choosing the correct data type for each column is equally important. For example, if you have a college table with a column such as type, and the expected maximum length is around 200 characters, it is best to define it as VARCHAR(200) rather than using TEXT or VARCHAR(4000) without a clear requirement. Oversizing columns unnecessarily can increase storage usage, enlarge indexes, consume more memory, and hurt query performance through larger row sizes and additional I/O. Defining column data types strictly according to business requirements ensures better efficiency, improved scalability, and optimal overall database performance.
However, database performance is not solely determined by queries. It is influenced by multiple factors, including:
- Query design and execution behavior
- Table and index bloat
- Over-indexing or poorly designed indexes
- Memory and I/O configuration
- Proper table structure and data type selection
A structured approach is required to identify and fix slow queries. This approach focuses on the fundamentals:
- Identifying problematic SQL statements
- Reading and interpreting execution plans with purpose
- Understanding how the database engine processes queries internally
By combining query-level analysis with system-level awareness, DBAs can accurately identify and resolve performance bottlenecks, delivering stable and predictable database performance.
The Fundamentals: A Three-Step Approach
Effective query optimization follows a systematic process:
1. Identify Problematic SQL Statements:
Not all slow queries are created equal. The first step is identifying which queries deserve your attention. Look for patterns in execution time, resource consumption, and frequency of execution. A query that runs once a day and takes 5 seconds may be less critical than a query that runs 1,000 times per minute and takes 100 milliseconds.
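One way to put this step into practice is with the pg_stat_statements extension, which records aggregate statistics per normalized query. A minimal sketch, assuming the extension is installed and preloaded; the total_exec_time and mean_exec_time columns apply to PostgreSQL 13 and later (older releases use total_time and mean_time):

```sql
-- Top 10 queries by cumulative execution time.
-- Requires: CREATE EXTENSION pg_stat_statements; and
-- shared_preload_libraries = 'pg_stat_statements' in postgresql.conf.
SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by total time surfaces the frequent 100-millisecond query ahead of the once-a-day 5-second one, matching the prioritization described above.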
2. Read and Interpret Execution Plans with Purpose:
EXPLAIN ANALYZE is your window into how PostgreSQL processes a query. Understanding execution plans isn't about memorizing node types; it's about recognizing patterns and understanding the cost implications of different operations. Every scan type, join method, and filter operation tells a story about how your data is being accessed and combined.
3. Understand How the Database Engine Processes Queries Internally:
The PostgreSQL query planner makes decisions based on statistics, available indexes, and estimated costs. Understanding why the planner chose a sequential scan over an index scan, or a hash join over a nested loop, helps you make informed decisions about schema design, indexing strategies, and query rewrites.
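The planner's reasoning can be inspected directly. A hedged sketch using the customers table from the example below (the email value is a placeholder): ANALYZE refreshes the statistics the planner relies on, pg_stats exposes them, and the enable_seqscan session setting lets you compare the plan the planner rejected:

```sql
-- Refresh and inspect the statistics the planner bases its decisions on.
ANALYZE customers;

SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'customers';

-- Diagnosis only: discourage sequential scans in this session to see
-- what the alternative index-based plan would look like and cost.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM customers WHERE cu_email = 'user@example.com';  -- placeholder value
RESET enable_seqscan;
```

If the forced plan's estimated cost is far higher than the sequential scan's, the planner's choice was sound for the current statistics.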
Real-World Example: From Sequential Scans to Index Usage
Step 1: Use EXPLAIN to Identify Performance Issues
samdb=# EXPLAIN ANALYZE SELECT c.cus_id, c.cu_name,
c.cu_email, i.reg_name, i.salary, o.of_name,
o.off_address
FROM customers c, intership i, office o
WHERE c.cu_email = i.st_email
AND c.cu_email = o.off_email
AND i.salary > 5000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=67.90..146.99 rows=171 width=79) (actual time=13.913..17.384 rows=150 loops=1)
Hash Cond: (o.off_email = c.cu_email)
-> Seq Scan on office o (cost=0.00..66.37 rows=2937 width=49) (actual time=1.088..3.555 rows=2937 loops=1)
-> Hash (cost=65.77..65.77 rows=170 width=64) (actual time=12.731..12.734 rows=150 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 23kB
-> Hash Join (cost=6.83..65.77 rows=170 width=64) (actual time=9.081..12.617 rows=150 loops=1)
Hash Cond: (i.st_email = c.cu_email)
-> Seq Scan on intership i (cost=0.00..49.58 rows=2046 width=32) (actual time=0.606..4.628 rows=2046 loops=1)
Filter: (salary > 5000)
-> Hash (cost=4.70..4.70 rows=170 width=32) (actual time=7.303..7.304 rows=170 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
-> Seq Scan on customers c (cost=0.00..4.70 rows=170 width=32) (actual time=0.962..7.169 rows=170 loops=1)
Planning Time: 31.746 ms
Execution Time: 17.607 ms
(14 rows)
Explanation:
The query joins three tables (customers, intership, and office) on matching email columns and applies a filter condition salary > 5000.
1. Sequential Scans on All Tables
- PostgreSQL scanned the entire customers, intership, and office tables.
- No indexes were used.
- Since the tables are relatively small, the scans completed quickly.
2. Filter Condition Applied
- The condition salary > 5000 was applied during the sequential scan of the intership table.
- Rows not matching the condition were filtered out.
3. Hash Joins Used for Matching
- PostgreSQL used Hash Joins to combine:
- intership.st_email = customers.cu_email
- office.off_email = customers.cu_email
- Hash joins are efficient when joining moderately sized datasets without indexes.
4. Execution Time
- Total execution time was around 17 ms.
- Performance is acceptable for the current dataset size.
Performance Consideration
Although the query performs well now, it may become slow if the tables grow larger because:
- All tables are using Sequential Scans
- No indexes exist on join columns or filter columns
Recommended Optimization:
To improve scalability:
samdb=# CREATE INDEX ON customers(cu_email);
CREATE INDEX ON intership(st_email);
CREATE INDEX ON office(off_email);
CREATE INDEX ON intership(salary);
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
Step 2: After the Indexes Are Created
samdb=# EXPLAIN ANALYZE SELECT c.cus_id, c.cu_name,
c.cu_email, i.reg_name, i.salary, o.of_name,
o.off_address
FROM customers c, intership i, office o
WHERE c.cu_email = i.st_email
AND c.cu_email = o.off_email
AND i.salary > 5000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=7.11..136.42 rows=171 width=79) (actual time=0.380..3.304 rows=150 loops=1)
Join Filter: (c.cu_email = o.off_email)
-> Hash Join (cost=6.83..65.77 rows=170 width=64) (actual time=0.344..1.582 rows=150 loops=1)
Hash Cond: (i.st_email = c.cu_email)
-> Seq Scan on intership i (cost=0.00..49.58 rows=2046 width=32) (actual time=0.066..0.737 rows=2046 loops=1)
Filter: (salary > 5000)
-> Hash (cost=4.70..4.70 rows=170 width=32) (actual time=0.167..0.178 rows=170 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
-> Seq Scan on customers c (cost=0.00..4.70 rows=170 width=32) (actual time=0.015..0.080 rows=170 loops=1)
-> Index Scan using office_off_email_idx on office o (cost=0.28..0.40 rows=1 width=49) (actual time=0.010..0.010 rows=1 loops=150)
Index Cond: (off_email = i.st_email)
Planning Time: 2.426 ms
Execution Time: 3.427 ms
(13 rows)
Explanation:
After the indexes were created, the execution plan changed and performance improved significantly.
- Execution time reduced from ~17 ms to ~3.4 ms.
- PostgreSQL now uses a Nested Loop Join.
- An Index Scan is used on the office table instead of a full table scan.
- The intership table still uses a Sequential Scan because many rows match salary > 5000, so the planner finds it cheaper.
Key Improvement
The main performance gain comes from replacing the full scan of office with an index lookup, which reduces I/O and speeds up joins.
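Whether the new indexes keep earning their keep can be checked over time through the cumulative statistics views. A small sketch against the tables above; idx_scan counts how often each index has been used since statistics were last reset:

```sql
-- How often has each index on these tables actually been used?
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS scans
FROM pg_stat_user_indexes
WHERE relname IN ('customers', 'intership', 'office')
ORDER BY idx_scan DESC;
```

Indexes that stay at zero scans add write overhead without any read benefit and are candidates for removal.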
Performance Is Not Only About Indexes
While indexes greatly improve performance, schema design is equally important.
Data Type Selection Matters
Choosing appropriate column data types directly affects:
- Row size
- Index size
- Cache efficiency
- Query execution speed
Example: Poor Table Design
samdb=# CREATE TABLE college (id SERIAL PRIMARY KEY, name TEXT,
email TEXT,
type TEXT );
CREATE TABLE
Suppose the type column only stores small values such as:
UG , PG , Diploma
Example: Optimized Table Design
samdb=# CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(150),
type VARCHAR(20)
);
CREATE TABLE
Why This Impacts Performance
If you define very large column types (e.g., TEXT or VARCHAR(4000)) where only small values are needed, nothing stops oversized data from creeping in, and as it does:
- Row width increases
- Fewer rows fit into an 8KB data page
- More disk pages must be read
- Indexes become larger
- Cache efficiency decreases
As a result:
- I/O increases
- Memory usage increases
- Execution time increases
On tables with millions of rows, this design mistake can significantly impact performance.
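The effect of column sizing on row width can be measured rather than assumed. A sketch using the college table above; pg_column_size and pg_total_relation_size are built-in PostgreSQL functions:

```sql
-- Actual on-disk size, in bytes, of a few stored rows.
SELECT pg_column_size(c.*) AS row_bytes
FROM college c
LIMIT 5;

-- Total footprint of the table including its indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('college'));
```

Comparing these numbers before and after a schema change shows concretely how much row width and total footprint the redesign saves.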
Conclusion
Database performance depends on:
- Proper indexing
- Efficient query design
- Correct execution plan analysis
- System configuration
- Good schema and data type design
Performance tuning is not just about fixing slow queries; it starts at the design stage.
Well-designed tables, appropriate data types, and proper indexing together create a scalable and high-performing database system.
See this in action at PGConf India 2026 – "Key PostgreSQL Query Tuning foundation Every Database Developer Should Build", presented by Deepak Mahto.
